djcb / mu

maildir indexer/searcher + emacs mail client + guile bindings
http://www.djcbsoftware.nl/code/mu
GNU General Public License v3.0

Work around broken encodings in received messages #2700

Closed flexibeast closed 2 months ago

flexibeast commented 5 months ago

Context: mu+mu4e 1.12.4 running on Emacs 29.3, both installed via Portage (net-mail/mu and app-editors/emacs) on Gentoo.

i recently received an email whose raw subject line is:

Subject: =?ISO-8859-1?Q?We=92ve_reconnected_=96_and_next_steps?=

This subject line is correctly displayed in mu4e:view mode, as:

We’ve reconnected – and next steps

but in mu4e:headers mode, it's displayed as:

We\222ve reconnected \226 and next steps

i.e. the byte sequence:

We’ve reconnected – and next steps

This appears to be the result of the rfc2047-decode-region function, or some equivalent, not being run on the text; running that function on it results in correct display.

djcb commented 5 months ago

Can you attach a message file (anonymized as needed) where this happens? Thanks.

flexibeast commented 5 months ago

The only such email i have is the one i received today, which contains personal health information. i've redacted the bodies (i.e. the two MIME parts) basically in their entirety, and also redacted various bits of header content in a minimal way, hopefully still leaving it usable.

email.txt

djcb commented 5 months ago

Emacs is just showing what it gets from the mu-server; it doesn't decode anything in the headers buffer. Looking in a message (where it's shown as expected), with M-x describe-char I get:

   character: ’ (displayed as ’) (codepoint 8217, #o20031, #x2019)
              charset: windows-1252 (WINDOWS-1252 (Latin I))

so the problem seems to be that the original message uses the windows-1252 charset, but claims to be ISO-8859-1:

Subject: =?ISO-8859-1?Q?We=92ve_reconnected_=96_and_next_steps?=

You can see this if you change the subject to

Subject: =?WINDOWS-1252?Q?We=92ve_reconnected_=96_and_next_steps?=

it will show correctly (after re-indexing etc.).
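
To make the mismatch concrete, here is a minimal sketch in Python (not how mu or GMime handle it, just the standard library's `email.header` used for illustration). It undoes the Q-encoding and then decodes the same bytes once with the declared charset and once with windows-1252. The variable names are made up for the example.

```python
from email.header import decode_header

raw_subject = "=?ISO-8859-1?Q?We=92ve_reconnected_=96_and_next_steps?="

# decode_header only undoes the Q-encoding; it returns the raw bytes plus
# the charset label the header declared, and leaves decoding to the caller.
[(payload, declared_charset)] = decode_header(raw_subject)

# Decoding as declared maps 0x92 and 0x96 to the C1 control characters
# U+0092 and U+0096, which Emacs shows as the octal escapes \222 and \226.
as_declared = payload.decode("iso-8859-1")

# Decoding as windows-1252 maps the same bytes to U+2019 and U+2013.
as_intended = payload.decode("cp1252")

print(declared_charset)
print(repr(as_declared))
print(as_intended)
```

The bytes are identical in both cases; only the charset label decides whether they become control characters or punctuation.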

djcb commented 5 months ago

Now, while it's the message's sender that's misbehaving, that won't help us very much.

mu can't easily do what Gnus mail does (we're bound by GMime), but I'll turn this into an RFE ticket and see if we can find a work-around.

flexibeast commented 5 months ago

Ah, great analysis, thank you. i think i had indeed run describe-char and noticed that 1252 was mentioned, but it didn't click that this was not what the "Subject" header was claiming ....

Thanks for converting this to an RFE. 👍 i'm going to try emailing postmaster@salesforce about this, which is probably unlikely to result in any change, but at least i'll have tried. 😛

flexibeast commented 4 months ago

It's just come to my attention that the HTML5 spec says that "ISO-8859-1" is to be interpreted as Windows-1252. So presumably what's happening in this email is that it's assumed it will be read in a Web-based client - which, to be fair, is more than likely the case - such that the HTML5 spec is applicable. Which, fwiw, feels incorrect to me: even if the email body contains only a text/html MIME part, the headers are certainly not HTML.
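
For illustration only, a work-around in that spirit could look like the following Python sketch. It treats a few ISO-8859-1 labels as windows-1252 when decoding a header, and falls back to the declared charset for the handful of bytes windows-1252 leaves undefined in Python's codec. The function name and the label set are hypothetical, and nothing here reflects what mu or GMime actually do.

```python
from email.header import decode_header

# Hypothetical, deliberately short set of labels to reinterpret as
# windows-1252 (an assumption for this sketch, not a complete list).
LATIN1_LABELS = {"iso-8859-1", "iso8859-1", "latin1", "l1"}


def decode_subject_lenient(raw: str) -> str:
    """Decode an RFC 2047 header, reading ISO-8859-1 as windows-1252."""
    parts = []
    for payload, charset in decode_header(raw):
        if isinstance(payload, str):
            # A header without encoded-words comes back as str already.
            parts.append(payload)
            continue
        charset = (charset or "ascii").lower()
        if charset in LATIN1_LABELS:
            try:
                parts.append(payload.decode("cp1252"))
                continue
            except UnicodeDecodeError:
                # Python's cp1252 rejects 0x81, 0x8D, 0x8F, 0x90 and 0x9D,
                # so fall back to the charset the header declared.
                charset = "iso-8859-1"
        try:
            parts.append(payload.decode(charset, errors="replace"))
        except LookupError:
            # Unknown charset label: latin-1 can decode any byte.
            parts.append(payload.decode("iso-8859-1"))
    return "".join(parts)


print(decode_subject_lenient(
    "=?ISO-8859-1?Q?We=92ve_reconnected_=96_and_next_steps?="))
```

The trade-off is that a genuine ISO-8859-1 header containing C1 control characters would be displayed differently, which is presumably rare in mail subjects.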

djcb commented 2 months ago

I'm moving this to the IDEAS.org file and will close it here shortly... it would probably best be solved at the GMime level.