Closed exander77 closed 3 years ago
This is the kind of thing that we typically just try to make work anyway, despite technically not being RFC compliant, if we can--the whole "be relaxed in what you accept and strict in what you send" mentality.
Yep, I'm able to reproduce with a message exported from Tutanota.
The error happens inside the go-message library's function here. The method is private to the library, so any fixes would have to be upstreamed or made in a forked version of the library.
We might be able to do some preprocessing on the parsed header values, decoding any encoded words in things like attachment filenames before making calls to the library methods.
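To illustrate the decoding half of that idea, here is a minimal sketch that assumes the raw filename value has already been extracted from the header. It uses the standard library's mime.WordDecoder rather than whatever decoder Bridge actually uses; decodeFilename and the sample value are hypothetical.

```go
package main

import (
	"fmt"
	"mime"
)

// decodeFilename decodes any RFC 2047 encoded-words found in a raw
// filename value. If decoding fails, it falls back to the raw value so
// the attachment still ends up with some name.
// (Hypothetical helper, not part of Bridge or go-message.)
func decodeFilename(raw string) string {
	dec := new(mime.WordDecoder)
	decoded, err := dec.DecodeHeader(raw)
	if err != nil {
		return raw
	}
	return decoded
}

func main() {
	// Made-up sample values, not taken from a real message.
	fmt.Println(decodeFilename("=?UTF-8?Q?r=C3=A9sum=C3=A9.pdf?="))
	fmt.Println(decodeFilename("plain.pdf"))
}
```

The stock decoder only knows a few charsets out of the box, so a real fix would probably need a CharsetReader or the project's own decoder.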
Preprocessing is also difficult. The Go method mime.ParseMediaType(...) is unable to handle these encoded media type parameters at all and gives up when it sees them. It still returns the media type itself (in this case, attachment/pdf and attachment) despite the error, but doesn't return any of the media type parameters (in this case, name and filename), meaning we can't go and decode the encoded filename ourselves.
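A small sketch of that behaviour, for anyone who wants to check it locally. The header value is a made-up example with an unquoted encoded-word as the filename (an assumption about what triggers the failure, not a copy of the real message), and probe is a hypothetical helper.

```go
package main

import (
	"errors"
	"fmt"
	"mime"
)

// probe reports what mime.ParseMediaType yields for a header value:
// the media type, how many parameters came back, and whether the call
// failed with mime.ErrInvalidMediaParameter.
// (Hypothetical helper for illustration only.)
func probe(value string) (string, int, bool) {
	mediaType, params, err := mime.ParseMediaType(value)
	return mediaType, len(params), errors.Is(err, mime.ErrInvalidMediaParameter)
}

func main() {
	// Made-up value: the filename is an unquoted RFC 2047 encoded-word.
	mt, n, invalid := probe("attachment; filename==?UTF-8?Q?report.pdf?=")
	fmt.Printf("type=%q params=%d invalidParam=%v\n", mt, n, invalid)

	// A conventional quoted filename for comparison.
	mt, n, invalid = probe(`attachment; filename="report.pdf"`)
	fmt.Printf("type=%q params=%d invalidParam=%v\n", mt, n, invalid)
}
```

In the first call the media type should still come back while the parameter map stays empty, which is exactly why the filename can't be recovered afterwards.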
Options:
mime.ParseMediaType(...)
mime.ParseMediaType(...) returns the media type despite the error)
go-message's methods to get the content type/disposition

@bartbutler @jameshoulahan
I am processing around 150k more emails and I have found more of this. So this is definitely out there. Probably some services and some email clients do this. The server probably handles it already as I think I have received emails from Tutanota without any problems.
Yes, it is deep in the libs, and it is not easily fixable there either, as you probably want to use the WordDecoder you have.
This is related to:
Content-Type in attachment: there is a workaround for the message header there, but it is not used for attachments; this charset=binary; charset=UTF-8; causes a duplicate parameter name error if it is in an attachment header,
changeEncodingAndKeepLastParamDefinition hack for handling: ParseMediaType from MIME doesn't support RFC2231 for non-ASCII / UTF-8 encodings so we have to pre-parse it.

So maybe writing a parser for Content-Type and Content-Disposition would solve all of these.
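A rough sketch of what such a forgiving parser could look like, just to make the idea concrete. parseLenient is hypothetical and deliberately simplified: it does not handle ';' inside quoted strings or RFC 2231 continuations, and it uses the standard mime.WordDecoder instead of the project's own decoder.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// parseLenient is a hypothetical, forgiving parser for Content-Type and
// Content-Disposition values. It splits on ';', keeps the last definition
// of a duplicated parameter, and decodes RFC 2047 encoded-words in values.
func parseLenient(value string) (string, map[string]string) {
	parts := strings.Split(value, ";")
	mediaType := strings.ToLower(strings.TrimSpace(parts[0]))
	params := make(map[string]string)
	dec := new(mime.WordDecoder)

	for _, part := range parts[1:] {
		key, val, ok := strings.Cut(part, "=")
		if !ok {
			continue // skip empty or malformed segments, e.g. a trailing ';'
		}
		key = strings.ToLower(strings.TrimSpace(key))
		val = strings.Trim(strings.TrimSpace(val), `"`)
		// Decode encoded-words if present; keep the raw value on failure.
		if decoded, err := dec.DecodeHeader(val); err == nil {
			val = decoded
		}
		if key != "" {
			params[key] = val // a later duplicate overwrites an earlier one
		}
	}
	return mediaType, params
}

func main() {
	// Made-up header values for illustration.
	fmt.Println(parseLenient("attachment; filename==?UTF-8?Q?report.pdf?="))
	fmt.Println(parseLenient("text/plain; charset=binary; charset=UTF-8;"))
}
```

Keeping the last definition mirrors what the changeEncodingAndKeepLastParamDefinition name suggests; whether that is the right policy for attachments is a separate question.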
With the new version of Bridge (1.8.5), the majority of message parsing is delegated to the backend, which now handles this properly. As for the I-E app, we won't be rewriting the parser at this stage; Import Assistant will do the job.
@andrzejsza Should I try to re-migrate my e-mails with Import Assistant?
Yes, I'd suggest you do. Bridge is now using, for the most part, the same parser as the IA (for the imports).
Just to clarify what @andrzejsza means: when importing messages, Bridge now simply iterates through the MIME structure, encrypting each body in place, before handing it over to the same server-side parser that is used by the Import Assistant and to parse incoming mail. So the promise of e2ee is still kept, as all bodies are locally encrypted. The server-side parser is in general much more forgiving when it comes to weird edge cases (like we have here, with RFC 2047-encoded attachment filenames).
So I finally tracked down the issue of invalid media parameter. At least one instance is caused by:
This is completely against RFC2047:
But it, for example, covers all messages sent from Tutanota which include an attachment. So currently it means that people can't migrate from Tutanota to ProtonMail at all, which is not really good from a business standpoint, as Tutanota is one of ProtonMail's biggest competitors.
@jameshoulahan Any thoughts?