PlanBCode / hypha

1 stars 8 forks source link

Ensures safe e-mail subject #346

Closed laurensmartina closed 3 years ago

laurensmartina commented 3 years ago

Some e-mail providers reject e-mail messages with non-ASCII characters in the subjects.

Fixes #202

matthijskooijman commented 3 years ago

Looks good at first glance. I think it would be good to also link to the relevant spec or RFC in a coment, making it easier to verify the implementation later. Also, did you consider using the more readable quoted-printable encoding, as suggested here? https://stackoverflow.com/a/54688095/740048

laurensmartina commented 3 years ago

Also, did you consider using the more readable quoted-printable encoding, as suggested here? https://stackoverflow.com/a/54688095/740048

Implemented the quoted-printable fix

matthijskooijman commented 3 years ago

This is defined in https://tools.ietf.org/rfc/rfc2047.html

It seems like the current implementation looks conformant, except that the quoted_printable_encode PHP function actually implements the RFC2045 version of quoted-printable, which is a bit more complicated (it handles line breaks differently, but there might be other differences that we should technically correct for). Also, RFC2047 poses a requirement on maximum (line) lengths:

While there is no limit to the length of a multiple-line header field, each line of a header field that contains one or more 'encoded-word's is limited to 76 characters.

Which corresponds to the max line length requirement of RFC2822, which requires breaking on whitespace in the original text.

I wonder if we should just not use quoted_printable_encode and do our own encoding (which should be simple enough, though I'm not yet sure how to guarantee the max line length).

matthijskooijman commented 3 years ago

I would like to get a solution for this merged. We currently have the older base64-based solution on the Stadsbron server, to at least fix this issue, but having an unmerged commit on the live server is a hassle, so let's move this forward.

It's been a while since I looked at this, but I remember that doing the quoted-printable properly involves implementing it ourselves which is not terribly complicated, but needs a thorough reading of the relevant specs. Also, there was some unclarity about maximum lengths for lines and encoded entities, but I can't recall what exactly (I think there is a badly specified rule that any whitespace between quoted-printable elements is ignored, which allows cutting long encoded elements up into multiple ones with a dummy newline in between, to satisfy the pre-quoted-printable-standard line length limits). In any case, this is more work than I'm prepared to invest right now.

So, I would suggest a compromise:

This hurts the plaintext-reasability of such subjects, but e-mail clients should just handle this transparently, and having subjects that are harder to read outside of a proper mailclient is probaly better than not having mail delivered at all.

I've implemented this proposal in an added commit. @laurensmartina could you see if this looks ok to you? I added a TODO for maybe later implementing the quoted printable, with some references.

If this is ok, I'll squash-merge this PR.