Closed Spiral90210 closed 2 years ago
Well, I made one anyway!
Guessing there's not much appetite for this, so I'm going to submit another that will allow registering custom decoders, similar to how charsets can be mapped. This way the custom decoder can live in the consuming application, and we can handle broken input however we need to, so this repo can stay opinionated.
Nah, sorry, I don't like this approach.
That's fair enough. It doesn't exactly give me good feelings either, but email being email, there is all sorts of non-spec stuff flying around that is a nightmare to handle (I found some Microsoft stuff the other day that has a charset of 'unicode', which turns out to be little-endian UTF-16). Do you think registering custom decoders would be alright? I'm just about to start testing it now to see if it works OK.
I mean that I don't like the custom decoder approach. I prefer https://github.com/emersion/go-message/pull/152.
Oh dang it, I got confused between PRs and issues 🙄. Sorry I've not replied to that one in a while; I've been using it from my fork and it's been working away, so I had to switch to other stuff for a while.
The reason I was even suggesting decoders is that I've found another piece of awful content that needs to be handled, and it's not as clear-cut as the b64 one. I have some HTML content (technically malformed HTML) where the writer seems to have forgotten it's writing QP encoding, terminating the line with = \r\n (space between the = and CRLF). I was taking a similar approach by modifying the existing decoder to be more tolerant of this particular error, but although it's working OK in my testing, it's a bit hacky and I've not the time to do it properly. I figured this way I'd be able to keep awful code out of the repo!
It's not that they do this on every new line; it's like they just... stopped writing their HTML? It's weird: it literally looks like ght=3D"2" width=3D"2" alt= \r\n, which doesn't close the tag it started, just the outer one. Nightmare.
Honestly, I'm surprised to see this. Got a really unusual error about being unable to decode a base64 transfer encoded part today. The part in question was an image that was wrapped at 998 chars long but indented with a continuation white space, which breaks the base64 decoder.
This is obviously a bug on the part of whoever authored the mail part. HOWEVER, I went and read the spec, RFC 2045 section 6.8, and the following is stated:
So, that's 2 counts on which the part is breaking the spec (line length and invalid characters), BUT I read this paragraph as saying that we should be able to ignore this padding white space and decode the part.
I believe this can be achieved by imitating what the base64 decoder does internally with its newlineFilteringReader: we can wrap the raw reader in one that does exactly the same, but also includes the FWS (folding white space) characters, space and tab, in the list. Would a PR for this be acceptable?
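Roughly what I have in mind, as a sketch only: a reader that mirrors the filtering loop of the standard library's internal newlineFilteringReader but also drops space and tab. The names whitespaceFilteringReader and decodeLenientBase64 are hypothetical and only for illustration; this uses the standard encoding/base64 package and is not go-message code.

```go
package main

import (
	"encoding/base64"
	"io"
	"strings"
)

// whitespaceFilteringReader (hypothetical name) removes CR, LF, space and tab
// from the wrapped stream before the base64 decoder sees it.
type whitespaceFilteringReader struct {
	wrapped io.Reader
}

func (r *whitespaceFilteringReader) Read(p []byte) (int, error) {
	n, err := r.wrapped.Read(p)
	for n > 0 {
		// Compact the bytes we keep towards the front of p.
		offset := 0
		for _, b := range p[:n] {
			if b != ' ' && b != '\t' && b != '\r' && b != '\n' {
				p[offset] = b
				offset++
			}
		}
		if offset > 0 || err != nil {
			return offset, err
		}
		// The chunk was entirely white space; read again.
		n, err = r.wrapped.Read(p)
	}
	return n, err
}

// decodeLenientBase64 decodes s, ignoring line breaks and folding white space.
func decodeLenientBase64(s string) (string, error) {
	filtered := &whitespaceFilteringReader{wrapped: strings.NewReader(s)}
	dec := base64.NewDecoder(base64.StdEncoding, filtered)
	out, err := io.ReadAll(dec)
	return string(out), err
}
```

With this, a body whose continuation lines start with a space or tab decodes the same as one without the indentation. Doing the filtering in a wrapper keeps the change small, since the decoder itself is unmodified.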