defnull / multipart

A fast multipart/form-data parser for python
https://multipart.readthedocs.io/
MIT License
135 stars 33 forks source link

Allow stream to contain data before first boundary #25

Closed cjwatson closed 4 years ago

cjwatson commented 4 years ago

RFC 2046 section 5.1.1 (incorporated by reference into RFC 7578, and not overridden) says:

There appears to be room for additional information prior to the first boundary delimiter line and following the final boundary delimiter line. These areas should generally be left blank, and implementations must ignore anything that appears before the first boundary delimiter line or after the last one.

I haven't so far found it a major practical problem for multipart to forbid data after the final boundary, because one can always run parse_form_data with its default of strict=False. However, this tactic doesn't work for data before the first boundary, because MultipartParser._iterparse raises MultipartError before doing any parsing.

I've run into this problem in a real application, namely when testing Launchpad with my modified zope.publisher that uses multipart. Launchpad's official Python API client library (launchpadlib) uses wadllib to construct multipart/form-data representations of its requests; this originally used email.mime to assemble the message, and while it now uses its own implementation instead, it still inherits an oddity from email.mime.multipart.MIMEMultipart in that it writes what look like MIME headers before the first boundary. As far as I can tell from RFC 2046, these aren't actually parsed as headers and are just an ignored preamble, and indeed cgi.FieldStorage treats them as such, but multipart rejects them.

While I intend to fix wadllib to not write this useless preamble, it's widely deployed in the wild and so Launchpad is going to need to support old versions of it for some time; it's hard for any layer above the multipart parser to work around this, because any such layer would have to reimplement parts of the parser itself and mangle the input stream somehow. It would therefore be very helpful for multipart to permit such preambles.

cjwatson commented 4 years ago

If you approve of this PR, it would also be very helpful if there might be some way to release a multipart 0.1.1 with a backport of this fix; zope.publisher still supports Python 2, and my application (Launchpad) is unfortunately still on Python 2 (porting to multipart is part of our long dependency chain for porting to Python 3, which is why I care about this). I realise maintaining multiple branches is extra work; I'd be happy to provide a similar PR if there were a 0.1 maintenance branch, to try to help make it as easy as possible.

cjwatson commented 4 years ago

@defnull Hi - have you had a chance to look at this at all? I'd love to be able to move forward with the zope.publisher work, and this is a blocker for that (along with the py2 maintenance branch question I asked).

defnull commented 4 years ago

Would you be interested in maintaining this project with me? I am obviously not very responsive at the moment and your proposals are all very well thought out.

cjwatson commented 4 years ago

Sure, I'm happy to help out if it would be useful.

cjwatson commented 4 years ago

... ping? :-) Happy to help out, but I guess I'd need permissions of some kind.