defnull / multipart

Multipart parser for Python 3

Allow stream to contain data before first boundary #28

Closed cjwatson closed 3 years ago

cjwatson commented 3 years ago

RFC 2046 section 5.1.1 (incorporated by reference into RFC 7578, and not overridden) says:

"There appears to be room for additional information prior to the first boundary delimiter line and following the final boundary delimiter line. These areas should generally be left blank, and implementations must ignore anything that appears before the first boundary delimiter line or after the last one."

So far I haven't found it a major practical problem that multipart forbids data after the final boundary, because one can always run parse_form_data with its default of strict=False. However, this tactic doesn't work for data before the first boundary, because MultipartParser._iterparse raises MultipartError before doing any parsing.
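To make the requested behaviour concrete, here is a minimal sketch of what "ignore anything before the first boundary delimiter line" could look like. This is not multipart's actual code: skip_preamble and the sample body are hypothetical names and data made up for illustration, and a real parser would fold this into its own line-reading loop.

```python
import io


def skip_preamble(stream, boundary):
    """Consume lines up to and including the first boundary delimiter line.

    Hypothetical helper, not part of multipart. Returns the discarded
    preamble bytes and leaves the stream positioned at the first part's
    headers. Raises ValueError if no delimiter line is found.
    """
    delimiter = b"--" + boundary
    preamble = []
    for line in iter(stream.readline, b""):
        # Strip the line ending, then any trailing whitespace padding.
        if line.rstrip(b"\r\n").rstrip(b" \t") == delimiter:
            return b"".join(preamble)
        preamble.append(line)
    raise ValueError("no boundary delimiter line found")


# Made-up request body with header-like text before the first boundary.
body = (
    b"MIME-Version: 1.0\r\n"
    b"\r\n"
    b"--XYZ\r\n"
    b'Content-Disposition: form-data; name="field"\r\n'
    b"\r\n"
    b"value\r\n"
    b"--XYZ--\r\n"
)
stream = io.BytesIO(body)
discarded = skip_preamble(stream, b"XYZ")
first_header_line = stream.readline()
```

After the call, the stream is positioned at the Content-Disposition line of the first part, so normal part parsing could continue from there.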

I've run into this problem in a real application: testing Launchpad with my modified zope.publisher, which uses multipart. Launchpad's official Python API client library (launchpadlib) uses wadllib to construct multipart/form-data representations of its requests. wadllib originally used email.mime to assemble the message, and while it now uses its own implementation instead, it still inherits an oddity from email.mime.multipart.MIMEMultipart: it writes what look like MIME headers before the first boundary. As far as I can tell from RFC 2046, these aren't actually parsed as headers and are just an ignored preamble. Indeed, cgi.FieldStorage treats them as such, but multipart rejects them.
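As a rough illustration of the preamble reading, the standard library's email parser can be fed a message shaped like this. The byte string below is made up for the example and is not actual wadllib output; the boundary and field name are arbitrary.

```python
from email import message_from_bytes

# Made-up message: the "MIME-Version" line sits after the real headers
# and before the first boundary, so it belongs to the preamble.
raw = (
    b"Content-Type: multipart/form-data; boundary=XYZ\r\n"
    b"\r\n"
    b"MIME-Version: 1.0\r\n"
    b"\r\n"
    b"--XYZ\r\n"
    b'Content-Disposition: form-data; name="field"\r\n'
    b"\r\n"
    b"value\r\n"
    b"--XYZ--\r\n"
)

msg = message_from_bytes(raw)
parts = msg.get_payload()  # list of sub-messages for a multipart body

# The header-like text is kept aside as the preamble, not parsed as headers.
preamble = msg.preamble
field_name = parts[0].get_param("name", header="content-disposition")
field_value = parts[0].get_payload()
```

Here the text before the first boundary ends up in msg.preamble, and the single form field is still parsed normally.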

While I intend to fix wadllib to not write this useless preamble, it's widely deployed in the wild, so Launchpad is going to need to support old versions of it for some time. It's hard for any layer above the multipart parser to work around this, because any such layer would have to reimplement parts of the parser itself and mangle the input stream somehow. It would therefore be very helpful for multipart to permit such preambles.

This is much the same as #25, but against a Python 2 maintenance branch.