apache / incubator-ponymail

Apache Pony Mail (Incubating) - Email for Ponies & People
http://ponymail.incubator.apache.org/

Bug: email parser mishandles old-style boundaries #519

Open sebbASF opened 4 years ago

sebbASF commented 4 years ago

The Python email library code that parses boundary strings strips a surrounding pair of angle brackets (<>). This breaks parsing of some messages, for example the unit test corpus file tomcat-ancient-boundary.mbox, which has the following boundary:

Content-Type: multipart/mixed; boundary="<<001-3e1dcd5a-119e>>"

Once parsed, the boundary becomes "<001-3e1dcd5a-119e>", which does not match the delimiter lines in the message body.
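The stripping can be reproduced in isolation with `email.utils.unquote`, which removes one layer of either double quotes or angle brackets. The sketch below assumes (from my reading of the standard library source, not verified against every Python version) that the boundary value ends up passing through this helper twice: once when the parameter is read and once more inside `collapse_rfc2231_value`.

```python
from email.utils import unquote

# Boundary parameter exactly as it appears in the Content-Type header.
raw_value = '"<<001-3e1dcd5a-119e>>"'

# First pass removes the surrounding double quotes, which is correct.
once = unquote(raw_value)

# A second pass treats the outer <> as quoting and removes it too.
twice = unquote(once)

print(once)   # <<001-3e1dcd5a-119e>>
print(twice)  # <001-3e1dcd5a-119e>
```

The second result is the value reported above, which no longer matches the `--<<001-3e1dcd5a-119e>>` delimiter lines.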

There are two Python bug reports for this: https://bugs.python.org/issue28945 and https://bugs.python.org/issue29020, but unfortunately there is no fix in sight.

It's possible to monkey-patch the library by providing a replacement copy of the function email.utils.collapse_rfc2231_value.
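A minimal sketch of such a monkey-patch follows. It is not the code from the attached zip; the replacement name `collapse_keep_brackets` and the sample message are made up for illustration. It assumes that `Message.get_boundary()` looks up `collapse_rfc2231_value` on the `email.utils` module at call time, and that plain string values have already been unquoted before they reach this function, so skipping the extra unquote is safe.

```python
import email
import email.utils

# Keep a reference to the stock implementation for RFC 2231 encoded values.
_original_collapse = email.utils.collapse_rfc2231_value


def collapse_keep_brackets(value, errors="replace", fallback_charset="us-ascii"):
    """Hypothetical replacement: decode RFC 2231 tuples as before, but
    return plain strings unchanged instead of unquoting them again."""
    if isinstance(value, tuple) and len(value) == 3:
        return _original_collapse(value, errors, fallback_charset)
    return value


# Process-wide patch: affects every caller of this function.
email.utils.collapse_rfc2231_value = collapse_keep_brackets

# Small sample message using the boundary from the corpus file.
RAW = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="<<001-3e1dcd5a-119e>>"\n'
    "\n"
    "--<<001-3e1dcd5a-119e>>\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
    "--<<001-3e1dcd5a-119e>>--\n"
)

msg = email.message_from_string(RAW)
boundary = msg.get_boundary()
parts = msg.get_payload() if msg.is_multipart() else []

print(boundary)
print(len(parts))
```

Because the patch is global, it also changes how other parameters such as attachment filenames are collapsed, which is one reason to keep it behind an option.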

It might make sense to add this as an option (at least initially) for the importer so that missing messages could be imported.

Attached is some test code to demonstrate the fix.

parse_email.py.zip