gen-smtp / gen_smtp

The extensible Erlang SMTP client and server library.
Other
683 stars 266 forks source link

Bare LF #253

Closed maennchen closed 3 years ago

maennchen commented 3 years ago

When sending emails to some SMTP servers, a result like this is replied:

{:error, :send, {:temporary_failure, 'cbx.virusfree.cz', "451 See http://pobox.com/~djb/docs/smtplf.html.\r\n"}}

You can't get mail through to msn.com and thousands of other systems around the Internet. Your mailer is violating 822bis section 2.3, which specifically prohibits all bare LFs.

I'll try to find out exactly what message is causing this behavior. This is however not that easy since I do not get the message logged together with the error at the moment.

seriyps commented 3 years ago

do you generate email payload with mimemail, or it comes from somewhere else?

maennchen commented 3 years ago

@seriyps I generate the email with mimemail.

:mimemail.encode(
  {"text", "plain",
    [
      {"Subject", Changeset.fetch_field!(changeset, :subject)},
      {"From", :smtp_util.combine_rfc822_addresses([{from_name, from_email}])},
      {"To",
      :smtp_util.combine_rfc822_addresses([
        {to_name, Changeset.fetch_field!(changeset, :recipient)}
      ])},
      {"Auto-submitted", "yes"},
      {"X-Auto-Response-Suppress", "All"},
      {"Precedence", "auto_reply"},
      {"Date", Calendar.strftime(DateTime.utc_now(), "%a, %-d %b %Y %X %z")},
      {"Message-ID", Changeset.fetch_field!(changeset, :uuid)},
      {"Content-Type", "text/plain; charset=utf-8"},
      {"Content-Transfer-Encoding", "7bit"}
    ], [], Changeset.fetch_field!(changeset, :body)},
  encoding_options
)
seriyps commented 3 years ago

Interesting. I did some extra testing in #247 and I see that mimemail indeed is able to generate results with "bare newlines" (\n without leading \r). Few examples:

{<<"multipart">>,<<"mixed">>,
 [{<<"From">>,<<"test@example.com">>}],
 #{},
 [{<<"text">>,<<"html">>,[],
   #{content_type_params => [{<<"charset">>,<<"utf-8">>}],
     disposition => <<"inline">>},
   <<"<!doctype html><html><body><p>\n</p></body></html>">>}]}.

{<<"text">>,<<"plain">>,
 [{<<"From">>,<<"test@example.com">>}],
 #{},<<"\n">>}

{<<"multipart">>,<<"mixed">>,
 [{<<"From">>,<<"test@example.com">>}],
 #{},
 [{<<"text">>,<<"plain">>,[],#{},<<"\n">>}]}

I'm not sure however if it's the task of mimemail or gen_smtp_client to escape them. Does anyone know if there are any RFC about this?

mworrell commented 3 years ago

We are in good company allowing bare line feeds in emails

https://docs.microsoft.com/en-us/exchange/mail-flow-best-practices/non-delivery-reports-in-exchange-online/fix-error-code-550-5-6-11-in-exchange-online

As messages can be signed with DKIM we should not change the content of the email before sending it.

I have to check my email routines, but I don't think we filter these newlines. And in the millions of emails sent we never encountered any problem.

Microsoft suggests that the email server complaining about the bare line feeds should be upgraded. Which they better do soon, as apparently Microsoft started sending bare line feeds as well.

mworrell commented 3 years ago

On a related note, here is the BDAT RFC.

We should support using BDAT when sending email. I am not at my machine right now, so can't check the code.

https://tools.ietf.org/html/rfc3030

seriyps commented 3 years ago

https://tools.ietf.org/html/rfc2045#section-2.7 here it's mentioned that bare \n is not allowed in 7bit and 8bit "content-transfer-encoding" formats; here it mentions bare \n s as well: https://tools.ietf.org/html/rfc2045#section-6.7

seriyps commented 3 years ago

@mworrell it seems @maennchen is trying to send email using gen_smtp_client and receiving server rejects this email because it contains \n without preceeding \r (so, it matches this regexp: re:run(BinMime, "[^\r]\n")). And seems RFC-2045 forbids such payloads in DATA

mworrell commented 3 years ago

Just checked the Zotonic source code, and there we make sure that all lone-LF and CR characters are made into CR LF. After that the message is signed with DKIM and sent as quoted-printable (if I remember correctly).

seriyps commented 3 years ago

I don't think we filter these newlines.

it might be that we don't have to filter them in gen_smtp_client, but rather tweak "content-transfer-encoding" to encode \n with quoted-printable or maybe base64. Right now we would pass bare \n as is, if the whole payload (actualy, if first 200 bytes) are all ascii and \r or \n:

https://github.com/gen-smtp/gen_smtp/blob/410557a4b52cbabc99062e2196fa0ec16c3c03a7/src/mimemail.erl#L743-L750

seriyps commented 3 years ago

Yep, it seems mimemail only produces bare \ns when it chooses 7bit as encoding type. Because mimemail:encode_quoted_printable/1 encodes \n as =0A always; base64 encodes everything always.

So, I guess what happens is:

All first 200 characters of email body are in 31-127 + \r + \n range, so, mimemail chooses 7bit encoding for them (which means send it as is). And here I see 2 problems:

  1. What if we have characters out of 7bit ascii after the first 200 characters?
  2. As mentioned, bare \n are not allowed by RFC-2045, so we should not send them as 7bit encoding

What I propose: only choose 7bit encoding when WHOLE body is in allowed range, not just 1st 200 characters. And this allowed range should exclude bare newline \n or bare carriage return \r.

And inside mimemail all this encoding of course happens BEFORE the DKIM signing. You sign already "7bit"/"quoted-printable"/"base64"-encoded data.

2.7. 7bit Data

"7bit data" refers to data that is all represented as relatively short lines with 998 octets or less between CRLF line separation sequences [RFC-821]. No octets with decimal values greater than 127 are allowed and neither are NULs (octets with decimal value 0). CR (decimal value 13) and LF (decimal value 10) octets only occur as part of CRLF line separation sequences.

seriyps commented 3 years ago

I'll make a fix

seriyps commented 3 years ago

Oh, it happened to be not that easy! I keep finding some tricky cases. Will try more tomorrow.

mworrell commented 3 years ago

In Zotonic we have this routine fixing the lone CR and LF characters:

% Make sure that loose \n characters are expanded to \r\n
expand_cr(B) -> expand_cr(B, <<>>).

expand_cr(<<>>, Acc) -> Acc;
expand_cr(<<13, 10, R/binary>>, Acc) -> expand_cr(R, <<Acc/binary, 13, 10>>);
expand_cr(<<10, R/binary>>, Acc) -> expand_cr(R, <<Acc/binary, 13, 10>>);
expand_cr(<<13, R/binary>>, Acc) -> expand_cr(R, <<Acc/binary, 13, 10>>);
expand_cr(<<C, R/binary>>, Acc) -> expand_cr(R, <<Acc/binary, C>>).

The parts with the text and/or HTML have the following basic headers:

        {<<"content-type-params">>, [ {<<"charset">>, <<"utf-8">>} ]},
        {<<"disposition">>, <<"inline">>},
        {<<"transfer-encoding">>, <<"quoted-printable">>},
        {<<"disposition-params">>, []}

After all parts are collected we encode the email, passing the DKIM settings:

mimemail:encode({<<"multipart">>, <<"alternative">>, Headers1, [], Parts1}, opt_dkim(Context))
seriyps commented 3 years ago

@mworrell yes, that makes quite a lot of sense, but you probably don't want to run expand_cr over, say, some binary attachment. So, I'm trying to build some fully automatic way by tweaking guess_best_encoding function that would work equally well for, say, ASCII and UTF-8 plaintext as well as binary attachments. But found some corner cases in multipart messages. I'll try to cover them bit later today.

mworrell commented 3 years ago

Attachments are base64 encoded - as they should be. So the CR/LF expand is only needed for text parts (text and html parts).

I think in general clients should send attachments as-is en base64 encoded. Only the readable text of the message should be in text.

seriyps commented 3 years ago

@maennchen this, hopefully, should be fixed now.

maennchen commented 3 years ago

@seriyps Thanks a lot for your work :heart: