Add a max_size parameter to save and _copy_file

bottlepy / bottle

bottle.py is a fast and simple micro-framework for python web-applications.

http://bottlepy.org/

MIT License

8.33k stars 1.46k forks source link

Add a max_size parameter to save and _copy_file #1326

Open 06180339 opened 3 years ago

06180339 commented 3 years ago

This is a proposal inspired by a requirement I had myself and by the following comment on Stack Overflow: “Bottle stores file uploads in request.files as FileUpload instances, along with some metadata about the upload. FileUpload has a very handy save method that perform the "file slurping" for you. It does not check the maximum file upload though, but IMHO it's better to have nginx in front to perform this kind of limitation check.”

If you don’t use nginx or so and want to have an upper limit for the file size, it would be nice if the save method had an additional parameter max_size to set a maximum size beyond which the writing is stopped.

The changes in this branch hopefully enable that and I wanna propose to merge them.

Jwink3101 commented 3 years ago

I like this idea.

It's super minor, but why use a keyword argument on a private method? It certainly doesn't hurt though and, like I said, this is minor.

06180339 commented 3 years ago

Thank you. That is a good point – the function signature of a private method can be controlled within this library. OTOH, if max_size in _copy_file is made an arg instead of a kwarg, …

either it has to come before chunk_size
or chunk_size itself is also made an arg instead of a kwarg (which it is now).

The former has the downside that then the order in the signature differs from the order in the signature of save. And save itself should probably not be changed.

The latter might have the downside that then the default value of 2 ** 16 has to go – that may be insignificant, but maybe the creator of this method preferred to have a default value ingrained here also, not only in save itself.

So, I am obviously fine with anything that seems acceptable to the maintainers.

06180339 commented 3 years ago

Further possible points of critique may be (in no prioritizing order):

Performance: The difference to the version so far would be the line if max_size is not None per every chunk. That is not nothing, but is probably rather negligeable – if there are a lot of chunks, the time should depend on IO anyway, I guess. A compromise is to have no comparatively costly function call in this line – otherwise if isinstance(max_size, int) comes to mind.
if isinstance(max_size, int) instead of if max_size is not None? See above. Or is it significantly more robust.
If max_size is exceeded, an IOError is thrown – that is also the error for the “File exists”-case. The errors are distinguished by their message each; but it might be better to have a stronger distinction. I am not sure, however, which other Error could or maybe should be used for exceeding the maximum size.

knro commented 2 years ago

This is somewhat related but I couldn't find information on whether there is some hard limitation to file upload size? Can this method support upload for 2GB+ files? This appears to fail in my case on Raspberry PI. So is the current method using RAM? if not, why would it fail on raspberry PI?

defnull commented 2 years ago

Probably because your TMPDIR is an in-memory file system (e.g. tmpfs) and thus limited to your available RAM.

knro commented 2 years ago

It's not, this is on Raspberry PI 16GB SD card and /tmp is just not size restricted as far as I know.

Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/root       14989948 6395892   7963776  45% /
devtmpfs         3787940       0   3787940   0% /dev
tmpfs            3952804    8356   3944448   1% /dev/shm
tmpfs            3952804   34904   3917900   1% /run
tmpfs               5120       4      5116   1% /run/lock
tmpfs            3952804       0   3952804   0% /sys/fs/cgroup
/dev/mmcblk0p1    258095   49336    208760  20% /boot
tmpfs             790560       8    790552   1% /run/user/1000

knro commented 2 years ago

oh I see, which of the tmpfs above is used by the Python/Bottle framework? is it the last entry which 790MB available size? this is why it fails? What's the workaround for such a case? Thanks!!

defnull commented 2 years ago

Bottle uses cgi.FieldStorage which uses tempfile.TemporaryFile which in turn stores temp files in the directory returned by https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir

Bottle does not do any in-memory buffering of file uploads. The whole body is buffered, but only for requests smaller than BaseRequest.MEMFILE_MAX which is only 100kb by default. If your temp directory is not size-constrained and you are copying the uploaded file chunk by chunk and not in one step, it should not fail. Please open a new issue with a stack traces.

knro commented 2 years ago

Thanks. Under Linux, gettempdir is indeed /tmp. /tmp does not appear to be mounted to any tmpfs as per the df -h command.