RFC 2046 section 5.1.1 (incorporated by reference into RFC 7578, and not
overridden) says:
There appears to be room for additional information prior to the
first boundary delimiter line and following the final boundary
delimiter line. These areas should generally be left blank, and
implementations must ignore anything that appears before the first
boundary delimiter line or after the last one.
I haven't so far found it a major practical problem for multipart to
forbid data after the final boundary, because one can always run
parse_form_data with its default of strict=False. However, this tactic
doesn't work for data before the first boundary, because
MultipartParser._iterparse raises MultipartError before doing any
parsing.
I've run into this problem in a real application, namely when testing
Launchpad with my modified zope.publisher that uses multipart.
Launchpad's official Python API client library (launchpadlib) uses
wadllib to construct multipart/form-data representations of its
requests; this originally used email.mime to assemble the message, and
while it now uses its own implementation instead, it still inherits an
oddity from email.mime.multipart.MIMEMultipart in that it writes what
look like MIME headers before the first boundary. As far as I can tell
from RFC 2046, these aren't actually parsed as headers and are just an
ignored preamble, and indeed cgi.FieldStorage treats them as such, but
multipart rejects them.
While I intend to fix wadllib to not write this useless preamble, it's
widely deployed in the wild and so Launchpad is going to need to support
old versions of it for some time; it's hard for any layer above the
multipart parser to work around this, because any such layer would have
to reimplement parts of the parser itself and mangle the input stream
somehow. It would therefore be very helpful for multipart to permit
such preambles.
This is much the same as #25, but against a Python 2 maintenance branch.
RFC 2046 section 5.1.1 (incorporated by reference into RFC 7578, and not overridden) says:
There appears to be room for additional information prior to the first boundary delimiter line and following the final boundary delimiter line. These areas should generally be left blank, and implementations must ignore anything that appears before the first boundary delimiter line or after the last one.
I haven't so far found it a major practical problem for multipart to forbid data after the final boundary, because one can always run parse_form_data with its default of strict=False. However, this tactic doesn't work for data before the first boundary, because MultipartParser._iterparse raises MultipartError before doing any parsing.
I've run into this problem in a real application, namely when testing Launchpad with my modified zope.publisher that uses multipart. Launchpad's official Python API client library (launchpadlib) uses wadllib to construct multipart/form-data representations of its requests; this originally used email.mime to assemble the message, and while it now uses its own implementation instead, it still inherits an oddity from email.mime.multipart.MIMEMultipart in that it writes what look like MIME headers before the first boundary. As far as I can tell from RFC 2046, these aren't actually parsed as headers and are just an ignored preamble, and indeed cgi.FieldStorage treats them as such, but multipart rejects them.
While I intend to fix wadllib to not write this useless preamble, it's widely deployed in the wild and so Launchpad is going to need to support old versions of it for some time; it's hard for any layer above the multipart parser to work around this, because any such layer would have to reimplement parts of the parser itself and mangle the input stream somehow. It would therefore be very helpful for multipart to permit such preambles.
This is much the same as #25, but against a Python 2 maintenance branch.