dom96 / jester

A sinatra-like web framework for Nim.

Files are not streamed, loaded directly into memory #181

Open jivank opened 5 years ago

jivank commented 5 years ago

On a 512 MB RAM VPS, files are being served from the static directory.

root@avocado:~/ubuntu# ../simplefilemanager
INFO Jester is making jokes at http://0.0.0.0:5000
Starting 1 threads
DEBUG GET /
DEBUG   200 OK {"content-type": @["text/html;charset=utf-8"]}
DEBUG GET /ubuntu-18.04.1-desktop-amd64.iso
DEBUG   200 OK {"content-length": @["1953349632"], "content-type": @["application/x-iso9660-image"]}
Killed

I can see memory starting to be allocated by the process, and the download dialog never comes up since the process gets killed. In comparison, a download dialog comes up immediately with python -m SimpleHTTPServer.

dom96 commented 5 years ago

It should get streamed though: https://github.com/dom96/jester/blob/master/jester.nim#L203.

Perhaps it's a leak in Nim's GC/stdlib.

jivank commented 5 years ago

Correct me if I am wrong.

        # Let `readToStream` write file data into fileStream in the
        # background.
        asyncCheck file.readToStream(fileStream)
        # The `writeFromStream` proc will complete once all the data in the
        # `bodyStream` has been written to the file.

will read from disk asynchronously and populate the FutureStream. But the FutureStream isn't being consumed there, as the readToStream implementation shows:

https://github.com/nim-lang/Nim/blob/master/lib/pure/asyncfile.nim#L522

proc readToStream*(f: AsyncFile, fs: FutureStream[string]) {.async.} =
  ## Writes data to the specified future stream as the file is read.
  while true:
    let data = await read(f, 4000)
    if data.len == 0:
      break
    await fs.write(data)
  fs.complete()
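
Note that fs.write completes as soon as the chunk is queued inside the FutureStream, so nothing in that loop ever waits for a consumer. A minimal standalone sketch of the unbounded growth (my own code, not from jester; the ~2 GB total just mirrors the ISO above):

import asyncdispatch, strutils

proc producer(fs: FutureStream[string]) {.async.} =
  ## Mimics readToStream: writes 4000-byte chunks as fast as possible.
  for _ in 0 ..< 500_000:            # ~2 GB in total
    # Each write future completes immediately; the chunk is merely
    # queued, so with no consumer the queue grows without bound.
    await fs.write("x".repeat(4000))
  fs.complete()

let fs = newFutureStream[string]("producer")
waitFor producer(fs)   # runs to completion with ~2 GB buffered, unread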

So I wrote a snippet that seems to use at most 6 MB of memory. Instead of writing to the stream, I write directly to the socket.

import asynchttpserver, asyncdispatch, asyncnet, asyncfile, strutils, ospaths, strformat

var server = newAsyncHttpServer()
proc cb(req: Request) {.async.} =
  let pathToFile = "/Users/jivan/Downloads/somelargefile.zip"
  let fileName = pathToFile.extractFilename()
  var file = openAsync(pathToFile, fmRead)
  await req.client.send("HTTP/1.1 200 OK\r\n")
  await req.client.send("Content-Length: " & $file.getFileSize() & "\r\n")
  await req.client.send("Content-Type: application/force-download\r\n")
  await req.client.send(fmt"""Content-Disposition: attachment; filename="{fileName}"""" & "\r\n\r\n")
  while true:
    let data = await read(file, 10000)
    if data.len == 0:
      break
    await req.client.send(data)

  req.client.close()
  file.close()

waitFor server.serve(Port(8080), cb)
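
Note the backpressure this gives for free: await req.client.send(data) doesn't complete until the chunk has been handed off to the socket, so a slow client throttles the read loop to one chunk in flight at a time.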

So again, correct me if I am wrong, but it appears that if you have a 2 GB file, for example, fileStream will be fully loaded with 2 GB of content before the stream is read.

dom96 commented 5 years ago

The code immediately below the line you've quoted reads from the file stream. The readToStream operation isn't awaited, so it does its thing concurrently.

Note that the async future streams do have some issues and I do plan to reimplement them, see here for more details: https://github.com/nim-lang/Nim/issues/7126

jivank commented 5 years ago

Alright, correct me if I'm wrong or if I misunderstood something. Even if it is concurrent, with a fast disk and a slow client, readToStream doesn't wait for the client to catch up. Wouldn't you want the stream to have a maximum size limit, so that readToStream pauses until something consumes the stream?
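
Nim's FutureStream has no such limit today, but the idea is easy to sketch. Everything below is hypothetical hand-rolled code for a single producer and a single consumer, not an existing API:

import asyncdispatch, deques

type
  BoundedStream = ref object
    ## Hypothetical bounded stream: write() pauses once maxLen chunks
    ## are buffered and resumes only as the consumer drains them.
    queue: Deque[string]
    maxLen: int
    finished: bool
    spaceFree: Future[void]   # completed by read() when room opens up
    dataReady: Future[void]   # completed by write() when a chunk arrives

proc newBoundedStream(maxLen: int): BoundedStream =
  BoundedStream(queue: initDeque[string](), maxLen: maxLen)

proc write(bs: BoundedStream, data: string) {.async.} =
  while bs.queue.len >= bs.maxLen:
    bs.spaceFree = newFuture[void]("BoundedStream.write")
    await bs.spaceFree               # backpressure: producer parks here
  bs.queue.addLast(data)
  if not bs.dataReady.isNil and not bs.dataReady.finished:
    bs.dataReady.complete()

proc read(bs: BoundedStream): Future[string] {.async.} =
  while bs.queue.len == 0:
    if bs.finished: return ""        # empty string marks end of stream
    bs.dataReady = newFuture[void]("BoundedStream.read")
    await bs.dataReady
  result = bs.queue.popFirst()
  if not bs.spaceFree.isNil and not bs.spaceFree.finished:
    bs.spaceFree.complete()          # wake a parked producer

proc complete(bs: BoundedStream) =
  bs.finished = true
  if not bs.dataReady.isNil and not bs.dataReady.finished:
    bs.dataReady.complete()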

dom96 commented 5 years ago

Yeah, I think you're right. That might be the problem here.

Araq commented 5 years ago

> Perhaps it's a leak in Nim's GC/stdlib.

There was a codegen bug but it is fixed in devel.

jivank commented 5 years ago

For reference, Rosencrantz handles this with the following implementation https://github.com/andreaferretti/rosencrantz/blob/master/rosencrantz/staticsupport.nim#L33

al6x commented 3 years ago

+1, the async file I/O should apply some sort of backpressure and stop reading chunks of data from disk when the socket is slower to consume them.

P.S.

Why are files < 10 MB read synchronously? Wouldn't it be better to serve everything with async?
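
Presumably because a single blocking readFile is cheap for a small file, while chunked async reads add scheduling overhead per chunk. A sketch of that dispatch pattern (the serveFile name, the send callback, and the 10 MB cutoff are illustrative, not jester's actual code):

import asyncdispatch, asyncfile, os

const sizeThreshold = 10_000_000   # illustrative 10 MB cutoff

proc serveFile(path: string,
               send: proc (data: string): Future[void]) {.async.} =
  ## `send` stands in for whatever writes to the client socket.
  if getFileSize(path) < sizeThreshold:
    # Small file: a single blocking read stalls the event loop only
    # briefly and needs no streaming machinery.
    await send(readFile(path))
  else:
    # Large file: stream fixed-size chunks; because each send is
    # awaited, the loop never reads ahead of the client by more
    # than one chunk.
    let f = openAsync(path, fmRead)
    while true:
      let chunk = await f.read(10_000)
      if chunk.len == 0: break
      await send(chunk)
    f.close()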