encode / starlette

The little ASGI framework that shines. 🌟
https://www.starlette.io/
BSD 3-Clause "New" or "Revised" License
10.18k stars 920 forks source link

UploadFile hogs RAM #579

Closed haizaar closed 5 years ago

haizaar commented 5 years ago

Good day,

We are using FastAPI to access large (1GB) files through starlette UploadFile and we have noticed that all the file is loaded into RAM.

Looking at the code I see that starlette uses SpooledTemporaryFile, however it initializes it [1] with default parameters, namely max_size=0. The thing is that SpooledTemporaryFile.write() methods calls self._check() to switch to disk IO, however with max_size=0 it never does it [2] and hence the whole file is loaded to RAM.

I suggest to starlette's UploadFile to:

What do you think?

[1] https://github.com/encode/starlette/blob/594b95c1da8c4ed2aff5629ab5ce128a18b6931b/starlette/datastructures.py#L428

[2] https://github.com/python/cpython/blob/master/Lib/tempfile.py#L650

tomchristie commented 5 years ago

Looking at the code I see that starlette uses SpooledTemporaryFile, however it initializes it [1] with default parameters, namely max_size=0.

Gotcha - good digging.

Let's start out by putting a sensible max_size figure on that. Perhaps 20MB or something. Ideally take a look at Flask or Django and see what defaults they've gone for.

We can consider further API for customizing that at a later date - doesn't really need to happy now so long as we start out with a sensible default.

Thanks!

haizaar commented 5 years ago

I was thinking about something like 100k :) Anyhow, any buffer would help tremendously.

Indeed it will be great to have a way to configure it, even on some starlette-global level, because currently there are now clean methods to adjust it. Though we found couple of ugly ones here: https://github.com/tiangolo/fastapi/issues/390#issuecomment-511766926

tomchristie commented 5 years ago

Django's default is 2.5MB.

https://docs.djangoproject.com/en/2.2/ref/settings/#file-upload-max-memory-size

Given that, and your comment I'd suggest we just go for 1MB as a nice simple figure. Want to issue a pull request adding that?

haizaar commented 5 years ago

Flask (werkzeug) uses 500k by default it seems: https://github.com/pallets/werkzeug/blob/master/src/werkzeug/formparser.py#L56

I can submit a one-line PR, but creating a test for it a bit challenging... Will you accept a PR without test for this?

I can think of creating an upload, interrupting it half-way and then checking process RAM compared to the moment I start upload, but it's far less trivial than the fix itself...

tomchristie commented 5 years ago

Flask (werkzeug) uses 500k by default it seems

Okay, so 1MB would be a good midway point between that and Django's default.

Will you accept a PR without test for this?

Yup for sure - it's just exercising a standard bit of Python stdlib, so not an issue at all.

haizaar commented 5 years ago

PR ready.

gofmannir commented 1 month ago

Does it still relevant issue? because now in the docs I see it users BinaryIO, is it still initialized to the SpooledTemporaryFile class ?