Using email module to parse multipart instead of the deprecated cgi module

bottlepy / bottle

bottle.py is a fast and simple micro-framework for python web-applications.

http://bottlepy.org/

MIT License


Using email module to parse multipart instead of the deprecated cgi module #1437

Status: Open. aisk opened this pull request 7 months ago.

aisk commented 7 months ago

fix: #1403

Since cgi will be removed, the Python changelog recommends using email.message or the PyPI package multipart instead. Bottle does not allow external dependencies, and vendoring multipart is not good practice, so I think the email package is the better option.
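To make the proposal concrete, here is a minimal sketch of parsing a multipart/form-data body with the standard library's email package (BytesParser with email.policy.default). The function name parse_form_data and the synthetic Content-Type header it prepends are illustrative assumptions, not code from this pull request. Note that every field, including file uploads, ends up fully in memory.

```python
from email import policy
from email.parser import BytesParser


def parse_form_data(body: bytes, content_type: str) -> dict:
    """Parse a multipart/form-data request body with the stdlib email parser.

    Returns {field_name: (filename_or_None, payload_bytes)}; a repeated
    field name overwrites the earlier value. Everything stays in memory.
    """
    # The email parser expects a complete message, so prepend a synthetic
    # header block carrying the request's Content-Type (and thus its boundary).
    header = b"Content-Type: " + content_type.encode("latin-1") + b"\r\n\r\n"
    msg = BytesParser(policy=policy.default).parsebytes(header + body)

    fields = {}
    for part in msg.iter_parts():
        disposition = part.get("Content-Disposition")
        if disposition is None:
            continue  # not a form field; skip it
        name = disposition.params.get("name")
        filename = disposition.params.get("filename")
        # decode=True undoes any Content-Transfer-Encoding and returns bytes.
        fields[name] = (filename, part.get_payload(decode=True))
    return fields


body = (
    b"--XyZ\r\n"
    b'Content-Disposition: form-data; name="title"\r\n'
    b"\r\n"
    b"hello\r\n"
    b"--XyZ\r\n"
    b'Content-Disposition: form-data; name="upload"; filename="a.bin"\r\n'
    b"Content-Type: application/octet-stream\r\n"
    b"\r\n"
    b"\x00\xff\x10binary\r\n"
    b"--XyZ--\r\n"
)
print(parse_form_data(body, "multipart/form-data; boundary=XyZ"))
```

The parser strips the CRLF that precedes each boundary, so the text field comes back as b"hello" and the binary upload is returned byte-for-byte.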

I haven't checked compatibility thoroughly; if a maintainer thinks this approach is okay, I'll invest more time in it. That said, test_multipart passes on my local machine (some other tests fail, but only because I'm on Windows; they fail on the master branch too).

defnull commented 7 months ago

Unfortunately, all data parsed by email.parser.FeedParser ends up in memory-buffered Message objects. Uploading large files (or many small ones) would likely trigger a MemoryError on a busy server. To be useful in a web context, the parser needs a way to offload large file uploads into temporary files. I'm not sure whether the email.parser module supports that use case.
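The offload-to-temporary-file requirement can be sketched as a small sink that keeps an upload in memory until it crosses a size threshold, then moves it to an anonymous temporary file. SpillingBuffer and its max_memory threshold are hypothetical names for illustration; the standard library's tempfile.SpooledTemporaryFile(max_size=...) provides the same keep-small-in-memory, roll-over-to-disk behaviour.

```python
import io
import tempfile


class SpillingBuffer:
    """Hypothetical upload sink: RAM for small uploads, a temp file for big ones.

    tempfile.SpooledTemporaryFile implements the same idea; this version
    exposes `on_disk` so the switch is easy to observe.
    """

    def __init__(self, max_memory: int = 100 * 1024):  # arbitrary default
        self.max_memory = max_memory
        self.size = 0
        self.on_disk = False
        self._file = io.BytesIO()

    def write(self, chunk: bytes) -> None:
        if not self.on_disk and self.size + len(chunk) > self.max_memory:
            # Threshold crossed: copy what we have into an anonymous temp
            # file and keep writing there from now on.
            disk = tempfile.TemporaryFile()
            disk.write(self._file.getvalue())
            self._file = disk
            self.on_disk = True
        self._file.write(chunk)
        self.size += len(chunk)

    def getvalue(self) -> bytes:
        # Rewind and read everything back (fine for a demo; real code would
        # hand the file object to the application instead).
        self._file.seek(0)
        return self._file.read()

    def close(self) -> None:
        self._file.close()


buf = SpillingBuffer(max_memory=10)
buf.write(b"abc")          # 3 bytes: stays in memory
buf.write(b"defghijk")     # 11 bytes total: spills to disk
print(buf.on_disk, buf.getvalue())
buf.close()
```

A multipart parser would need to call something like write() per chunk of a file part as the data arrives; only then does the memory ceiling actually hold.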

aisk commented 7 months ago

email.parser.FeedParser has an optional _factory argument, which specifies the Message class used for the parsed result. So we could subclass email.message.Message and override set_payload (and any other methods that need it) to offload large files to disk.
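A minimal sketch of the _factory hook, assuming BytesFeedParser (the bytes-accepting variant of FeedParser) and a hypothetical RecordingMessage subclass. Instead of spilling to disk, it only records what set_payload receives, which shows when the part's data actually reaches the subclass even though the input is fed in 64-byte chunks.

```python
from email.feedparser import BytesFeedParser
from email.message import Message

# (type name, length) of every payload handed to set_payload().
seen_payloads = []


class RecordingMessage(Message):
    """Hypothetical Message subclass plugged in via FeedParser's _factory.

    A real version would spill large payloads to disk here; this one only
    records what it receives.
    """

    def set_payload(self, payload, charset=None):
        seen_payloads.append((type(payload).__name__, len(payload)))
        super().set_payload(payload, charset)


def parse_in_chunks(raw: bytes, chunk_size: int = 64) -> Message:
    # Feed the message incrementally, as a server reading a socket would.
    parser = BytesFeedParser(_factory=RecordingMessage)
    for start in range(0, len(raw), chunk_size):
        parser.feed(raw[start:start + chunk_size])
    return parser.close()


raw = (
    b"Content-Type: multipart/form-data; boundary=XyZ\r\n\r\n"
    + b"--XyZ\r\n"
    + b'Content-Disposition: form-data; name="upload"; filename="big.bin"\r\n'
    + b"\r\n"
    + b"x" * 10_000
    + b"\r\n"
    + b"--XyZ--\r\n"
)
msg = parse_in_chunks(raw)
print(seen_payloads)
```

Even with chunked input, set_payload is handed the upload as one complete str of more than 10,000 characters. By then the parser has already buffered the whole part in memory.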

I haven't looked closely at whether this would work. If you haven't checked it either, I'd like to investigate it.

defnull commented 7 months ago

That does not really help for large uploads, as those are still collected as a list of strings in memory before set_payload is even called. That behavior is hard-coded in the parser. The parser is also string-based: binary data is passed in as data.decode('ascii', 'surrogateescape') and copied multiple times. It was designed for email (where you need an error-tolerant, lax parser), not for the internet (where you need a fast, strict parser that bails out immediately if it sees something fishy). I would love to use that parser; it would be my first choice if it were an option. But I do not think it is suitable for this use case.
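Two of the properties mentioned above can be observed with a short script. First, binary input is turned into a str via surrogateescape; on CPython each byte at or above 0x80 becomes a code point in U+DC80..U+DCFF, so the text copy takes roughly twice the memory of the bytes (the size check is CPython-specific). Second, the default policy records structural problems such as a missing closing boundary as defects instead of failing. email.policy.strict (raise_on_defect=True) does raise, but that only changes error handling; it does not make the parser streaming or faster. The helper names lax_defects and strict_rejects are illustrative.

```python
import sys
from email import errors, policy
from email.parser import BytesParser

# 1) The parser works on str: bytes are decoded with surrogateescape first.
raw = bytes(range(256)) * 4096  # 1 MiB of arbitrary binary data
text = raw.decode("ascii", "surrogateescape")
assert text.encode("ascii", "surrogateescape") == raw  # lossless round trip
# CPython-specific: bytes >= 0x80 map to U+DC80..U+DCFF, so the str needs
# 2 bytes per character, roughly doubling the memory for the same data.
print(sys.getsizeof(raw), sys.getsizeof(text))

# 2) A multipart body whose closing "--XyZ--" boundary line is missing.
truncated = (
    b"Content-Type: multipart/form-data; boundary=XyZ\r\n\r\n"
    b"--XyZ\r\n"
    b'Content-Disposition: form-data; name="a"\r\n\r\n'
    b"hello\r\n"
)


def lax_defects(data: bytes) -> list:
    # Default policy: problems are recorded on the messages, not raised.
    msg = BytesParser(policy=policy.default).parsebytes(data)
    return [type(d).__name__ for part in msg.walk() for d in part.defects]


def strict_rejects(data: bytes) -> bool:
    # policy.strict sets raise_on_defect=True, so the first defect raises.
    try:
        BytesParser(policy=policy.strict).parsebytes(data)
    except errors.MessageDefect:
        return True
    return False


print(lax_defects(truncated), strict_rejects(truncated))
```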

aisk commented 7 months ago

Thanks for the kind reply, I get the point!