DockYard / elixir-mail

Build composable mail messages

Parsing removes newline(\n) characters #138

Closed martin-jahn closed 2 years ago

martin-jahn commented 2 years ago

Version

0.2.3

Test Case

eml = "Delivered-To: company@example.com\nReceived: by 2002:a05:6520:444e:b0:157:ec88:5685 with SMTP id r14csp744771lkv;\n        Fri, 3 Dec 2021 04:42:53 -0800 (PST)\nX-Smtp-Source: ABdhPJymCb004FLGPspwKq6mQcFTkQI+6hALJlF13shK+ntIeX8h0SWQqagautTMbkvoX79lLC3q\nX-Received: by 2002:a17:907:3d9e:: with SMTP id he30mr23373073ejc.177.1638535373767;\n        Fri, 03 Dec 2021 04:42:53 -0800 (PST)\nARC-Seal: i=2; a=rsa-sha256; t=1638535373; cv=pass;\n        d=example.com; s=arc-20160816;\n        b=USf28Lo91ATsFeazTbuMiNCB3WbhZEw8x9MTs4+siqtT2zySPFK6O4ToLy5cbp8WCO\n         4D4dXOj8DPnh/NIlPCMw3l5Id9cPWIQmKsIUT9/8uw95dJ4FFprAC5Jkl8aw4m9YRs/V\n         ys1g05LSCEgA7YLqTWSIJ2Ppm7W4QoZLQ8ceVPnphji9ebfScow6P4/JCz62Zvi8ctEp\n         fESH67rTdvicZdOLNUt4XtfZrgX91SFbQdJv3aJPvxqb41IJNBbTjNzK81sbcRSzkR4n\n         +gMEt+JWRR8yqJwzsdYZoBgucuF3t/02X0wiz6+EPfdgZfXxhfppc+e43eMFq6uJvPzz\n         TFCQ==\nARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=example.com; s=arc-20160816;\n        h=date:message-id:to:from:subject:mime-version:dkim-signature;\n        bh=j4xQykyD/StFXkM8uzUR3OggqUVrMAzwjuBrPmXIV+Q=;\n        b=QqaNZkGYgwYl8fWD+cgJj0kpQ0m+LufCCh9qEl+uEUQZQp1JNqxfdvqj96Hpi/oFAG\n         qUgY/Vn8jaw2E6opLDayuBXdLPrU7Rwe/r/+YwNlG5nKWlFAN7QBKf3FpJn/p58p8GTZ\n         mjGc5bi409SWtqYbt2Y5GrmbtGo88IrBUFBT/7WtcBoUXJplx/89w8RtHyiYVCzVU47J\n         xUYgMdcSLkEny1Nt7k6fXjACCWKABQPRunF8gRITbpTiCfGqdeVPbbVaH0ZY9khgI8QS\n         TICsIuB7nLRd/I4jKbhs38/QVP5Kc2NyGjRT8/kWHTRoHCLrmy0eLIitPBYaJzydJ31l\n         /T9g==\nARC-Authentication-Results: i=2; mx.example.com;\n       dkim=pass header.i=@example.com header.s=zoho header.b=DXQqHBHo;\n       arc=pass (i=1 spf=pass spfdomain=example.com dkim=pass dkdomain=example.com dmarc=pass fromdomain=example.com>);\n       spf=pass (example.com: domain of customer@example.com designates 31.186.226.225 as permitted sender) smtp.mailfrom=customer@example.com\nReturn-Path: <customer@example.com>\nReceived: from sender11-op-o11.zoho.eu 
(sender11-op-o11.zoho.eu. [31.186.226.225])\n        by mx.example.com with ESMTPS id h15si5379724ede.136.2021.12.03.04.42.53\n        for <company@example.com>\n        (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);\n        Fri, 03 Dec 2021 04:42:53 -0800 (PST)\nReceived-SPF: pass (example.com: domain of customer@example.com designates 31.186.226.225 as permitted sender) client-ip=31.186.226.225;\nAuthentication-Results: mx.example.com;\n       dkim=pass header.i=@example.com header.s=zoho header.b=DXQqHBHo;\n       arc=pass (i=1 spf=pass spfdomain=example.com dkim=pass dkdomain=example.com dmarc=pass fromdomain=example.com>);\n       spf=pass (example.com: domain of customer@example.com designates 31.186.226.225 as permitted sender) smtp.mailfrom=customer@example.com\nARC-Seal: i=1; a=rsa-sha256; t=1638535373; cv=none; \n\td=zohomail.eu; s=zohoarc; \n\tb=g14LDqq+UFBerNOKaKqcDxFLTLU3WdJgmJiBt4rSR5b1kj940eMOwkavflRdJztgi+Exc3pbDleEcPnS4W+bb7vDsJsVPnU1yOx5UsgQI1Nq+StNMNgV0bGaVegRW0G79YWOeJuGC0+YOz68cLOU7fAexOF9s1rxmuHKbqfZQVI=\nARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.eu; s=zohoarc; \n\tt=1638535373; h=Content-Type:Date:From:MIME-Version:Message-ID:Subject:To; \n\tbh=j4xQykyD/StFXkM8uzUR3OggqUVrMAzwjuBrPmXIV+Q=; \n\tb=jen7TS3hQnIbAHgUP+iqROwzOwteVov4mcNDJWln0tGSPQPa+gShAcR3hYW3kAimwzOKpHeSpm20COVg54fcgM6KRcB2AAz0mvH4j7E6KZbz7hR7JNzTqrd8KJ5ID6rfJJGGFNEsnoXysHL7TRlC7+QJceyG+IPmcbSnDhUA7l4=\nARC-Authentication-Results: i=1; mx.zohomail.eu;\n\tdkim=pass  header.i=example.com;\n\tspf=pass  smtp.mailfrom=customer@example.com;\n\tdmarc=pass header.from=<customer@example.com>\nDKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; t=1638535373;\n\ts=zoho; d=example.com; 
i=customer@example.com;\n\th=Content-Type:MIME-Version:Subject:From:To:Message-ID:Date;\n\tbh=j4xQykyD/StFXkM8uzUR3OggqUVrMAzwjuBrPmXIV+Q=;\n\tb=DXQqHBHovFQJQDtWcmjSPIm/j6SEgTMuJ5ber2toXh87MCLo2YOczMN95BmrPndO\n\t4fq5Kf8spohIrmlVtN/Lhhut9+WmvsQO2OUkNyEl6aXL+dMbLxUzhJ9y3YRX93Jv6WB\n\t1KTlh6hECpM/CHRZF2jr100/rjAtOTrjuVkLnRxo=\nReceived: from [127.0.1.1] (vsb47.miramo.cz [93.95.33.47]) by mx.zoho.eu\n\twith SMTPS id 1638535371878118.3658187522883; Fri, 3 Dec 2021 13:42:51 +0100 (CET)\nContent-Type: multipart/alternative; boundary="===============1926786547036323748=="\nMIME-Version: 1.0\nSubject: Stuff\nFrom: customer@example.com\nTo: company@example.com\nMessage-ID: <17d80519c68.584f50b21056913974.6286387870772540920@zoho.eu>\nDate: Fri, 3 Dec 2021 13:42:51 +0100 (CET)\nX-ZohoMailClient: External\n\n--===============1926786547036323748==\nContent-Type: text/plain; charset="us-ascii"\nMIME-Version: 1.0\nContent-Transfer-Encoding: 7bit\n\nHello Joe\nThis is a receipt from ACME incorporated\n\nDate: 10/21/2021\n\nTotal: $128.53\n\nPayment Method\n------------------------------------------------------\nCredit Card\n\nVisa 7139\n\n\nThanks for your purchase Joe\n\nACME inc.\n\n--===============1926786547036323748==--\n"
parsed = Mail.Parsers.RFC2822.parse(eml)
Mail.get_text(parsed)

Steps to reproduce

The email was created by a Python script. There is nothing unusual about it; any online service with a Python backend could produce the same message. Parsing this email loses the newline characters. I tried sending the email with both \r\n and \n line endings, and the result was the same in both cases. Since I used only the Python standard library, my guess is that the issue is in your library.

The email content was downloaded from both the Gmail and Thunderbird interfaces, and both copies cause the same error. The email displays correctly in Gmail, Thunderbird, and the Zoho web interface.
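
For comparison, here is a minimal sketch of reading the plain-text part of a similar message with Python's standard library `email` package. The shortened message, the `XYZ` boundary, and the helper name `plain_text_part` are assumptions made for illustration; the sketch shows only how another parser treats the body and says nothing about how elixir-mail is implemented.

```python
from email import message_from_string

# Shortened stand-in for the message in the test case (hypothetical boundary).
raw = (
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    "MIME-Version: 1.0\n"
    "Subject: Stuff\n"
    "\n"
    "--XYZ\n"
    'Content-Type: text/plain; charset="us-ascii"\n'
    "Content-Transfer-Encoding: 7bit\n"
    "\n"
    "Hello Joe\n"
    "This is a receipt from ACME incorporated\n"
    "\n"
    "--XYZ--\n"
)


def plain_text_part(source: str) -> str:
    """Return the payload of the first text/plain part in the message."""
    message = message_from_string(source)
    for part in message.walk():
        if part.get_content_type() == "text/plain":
            # With decode left at its default, the payload comes back as a str.
            return part.get_payload()
    raise ValueError("no text/plain part found")


text = plain_text_part(raw)
print(repr(text))
```

The printed representation makes it easy to see whether the line breaks between "Hello Joe" and the following line are still present.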

Expected Behavior

%Mail.Message{
  body: <String with newline(\n) characters>
 ...
}

Actual Behavior

%Mail.Message{
  body: <String without newline(\n) characters>
 ...
}

bcardarella commented 2 years ago

@martin-jahn are you able to PR a failing test case? One of the DY engineers just tried to reproduce and could not.

martin-jahn commented 2 years ago

Sure, I'll write that later today.

andrewtimberlake commented 2 years ago

@martin-jahn #139 passes against the master branch. Is it not meant to be a failing test case?

martin-jahn commented 2 years ago

After I made another commit, it now passes, although the test "decode removes <CR><LF> pairs" (Mail.Encoders.SevenBitTest) is failing. I don't understand why it works that way, so I'd be glad to know the reasoning behind removing newline characters from messages. To me, that test looks like it expects the exact opposite of what the code is supposed to do. I know emails are weird, so please don't take this as an attack on this project in any way.

bcardarella commented 2 years ago

@martin-jahn are you on Windows or using emails generated from Windows?

martin-jahn commented 2 years ago

I'm not on Windows; this is on Ubuntu 21.04. I've tested Elixir 1.10-1.13 and Erlang 22-24, and the result is the same every time. You literally have this in your tests as expected behaviour. Below is a snippet from this project's tests. I can't imagine why this would be a good idea, because I would expect the result to be "This is a \ntest\n".

  test "decode removes <CR><LF> pairs" do
    message = "This is a \r\ntest\r\n"
    assert Mail.Encoders.SevenBit.decode(message) == "This is a test"
  end

Just for clarification, I downloaded the email straight from Gmail and Thunderbird. Like every sufficiently old internet standard, email uses \r\n for new lines, and the file in the PR contains those as well. The email was generated in Python 3, sent to Zoho, and delivered to Gmail. Nowhere along the way was there any problem with how it was displayed. I also have a problem with an email sent by a customer from Outlook in production, but that email can't be shared for obvious reasons.

Just to make sure you know, I've modified #139 so that my test passes, but that breaks the test in the snippet above.
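
The two behaviours being contrasted in this comment can be sketched in a few lines of Python. The function names `strip_crlf` and `crlf_to_lf` are hypothetical stand-ins invented for this illustration; neither is part of elixir-mail, and the sketch does not reproduce the library's actual code.

```python
def strip_crlf(body: str) -> str:
    """Drop every CRLF pair, joining the lines together (what the quoted test asserts)."""
    return body.replace("\r\n", "")


def crlf_to_lf(body: str) -> str:
    """Turn every CRLF pair into a single LF (what the reporter expects)."""
    return body.replace("\r\n", "\n")


message = "This is a \r\ntest\r\n"
print(repr(strip_crlf(message)))  # 'This is a test'
print(repr(crlf_to_lf(message)))  # 'This is a \ntest\n'
```

The first result matches the assertion in the quoted test, and the second matches the string the reporter says they would expect.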

martin-jahn commented 2 years ago

I'm sorry, I forgot to attach the Python script that can be used to generate your own emails. Please run it under Python 3. I used 3.9.5, but my guess is that even 3.5 should work just as well.

import smtplib

from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# sender == my email address
# recipient == recipient's email address
sender = "customer@example.com"
recipient = "company@example.org"

# Create message container - the correct MIME type is multipart/alternative.
msg = MIMEMultipart('alternative')
msg['Subject'] = "Stuff"
msg['From'] = sender
msg['To'] = recipient

# Create the body of the message (plain-text version only).
text = "Hello Joe\nThis is a receipt from ACME incorporated\n\nDate: 10/21/2021\n\nTotal: $128.53\n\nPayment Method\n------------------------------------------------------\nCredit Card\n\nVisa 7139\n\n\nThanks for your purchase Joe\n\nACME inc.\n"

part1 = MIMEText(text, 'plain')

msg.attach(part1)
# Send the message via the Zoho SMTP server over SSL.
mail = smtplib.SMTP_SSL('smtppro.zoho.eu', 465)

mail.ehlo()

mail.login('customer@example.com', 'secret password')
mail.sendmail(sender, recipient, msg.as_string())
mail.quit()

andrewtimberlake commented 2 years ago

@martin-jahn RFC 2045 §2.7 explicitly states that \r\n is used to separate longer lines; i.e. in decoding, they are removed and the longer lines are joined together. The code that removes the \r\n is explicit and can be seen here: https://github.com/DockYard/elixir-mail/blob/fa9ddf50ddc94cbb19e581873d4d7991ea5fdac0/lib/mail/encoders/seven_bit.ex#L49 I don’t believe there is a problem with how we are handling this situation.
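
For readers who want to check an input against the section cited above, here is a minimal Python sketch of the constraints RFC 2045 §2.7 places on "7bit data": at most 998 octets between CRLF sequences, no NUL octets, no octets above 127, and CR and LF appearing only as part of CRLF. The function name `is_7bit_data` is an assumption made for illustration, and the sketch reflects one reading of that section. It only classifies input and does not decode anything.

```python
def is_7bit_data(data: bytes, max_line_octets: int = 998) -> bool:
    """Check data against the RFC 2045 section 2.7 description of 7bit data."""
    # NUL octets and octets above 127 are not allowed.
    if any(octet == 0 or octet > 127 for octet in data):
        return False
    for line in data.split(b"\r\n"):
        # After splitting on CRLF, any CR or LF left over is a bare one.
        if b"\r" in line or b"\n" in line:
            return False
        # No more than 998 octets may sit between CRLF sequences.
        if len(line) > max_line_octets:
            return False
    return True


print(is_7bit_data(b"This is a \r\ntest\r\n"))
print(is_7bit_data(b"bare\nnewline"))
```

The first call uses the string from the test quoted earlier in the thread; the second uses a bare LF, which this reading of the section does not allow in 7bit data.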

bcardarella commented 2 years ago

@martin-jahn considering @andrewtimberlake's comment, I'll close for now. If you feel this is still an issue, please let us know and I can reopen to discuss.