DockYard / elixir-mail

Build composable mail messages

7bit encoded email with line break #151

Open htulipe opened 1 year ago

htulipe commented 1 year ago

Hello. Maybe a dumb question, but I can't see how the lib can successfully parse a 7bit encoded email that contains line breaks. By successfully I mean without losing line breaks.

Version

mail 0.2.3
Erlang/OTP 24 [erts-12.3.2.6] [source] [64-bit] [smp:5:5] [ds:5:5:10] [async-threads:1]
Elixir 1.14.0 (compiled with Erlang/OTP 24)

Test Case

Using the parse_email function defined in parser test:

parse_email("""
    To: user@example.com
    From: me@example.com
    Subject: Test Email
    Content-Transfer-Encoding: 7bit

    This is the body!
    It has more than one line
    """)

Steps to reproduce

Run the above code

Expected Behavior

The returned body should have some sort of line breaks: This is the body!\nIt has more than one line

Actual Behavior

The returned body no longer has line breaks: This is the body!It has more than one line

Reading the code, I see that the lib joins the body lines using \r\n but the SevenBit parser called just after drops them. Am I missing something?

Joining the body lines with \n instead of \r\n seems to fix the issue.
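
To illustrate the mechanism described above, here is a minimal sketch. It does not use the library's code: `SevenBitSketch.strip_all_crlf/1` is a hypothetical stand-in for a decoding step that removes every CRLF pair, which is the behavior reported in this issue.

```elixir
defmodule SevenBitSketch do
  # Hypothetical stand-in (not the library's code): removes every CRLF pair.
  def strip_all_crlf(body), do: String.replace(body, "\r\n", "")
end

lines = ["This is the body!", "It has more than one line"]

# Joined with CRLF, the stripping step erases the line boundary.
lines |> Enum.join("\r\n") |> SevenBitSketch.strip_all_crlf() |> IO.inspect()

# Joined with a bare LF, nothing matches "\r\n", so the break survives.
lines |> Enum.join("\n") |> SevenBitSketch.strip_all_crlf() |> IO.inspect()
```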

Thanks in advance

PS: I saw the previous issue on the matter but could not find an answer there, so I took the liberty of opening a new issue.

bcardarella commented 1 year ago

Is @andrewtimberlake's answer in #138 not sufficient?

htulipe commented 1 year ago

I agree with Andrew's reading of the RFC, but the direct conclusion is that we can't send multi-line emails with this encoding. That can't be right; I must be missing something.

May I add that Python's email module parses the same email without losing line breaks.

bcardarella commented 1 year ago

@htulipe so your issue is not with the parsing but the compilation from the data structure into an email?

htulipe commented 1 year ago

My goal is to read an EML file and transform it into a data structure that my frontend can then display.

SergeyMosin commented 1 month ago

Is @andrewtimberlake's answer in #138 not sufficient? (https://github.com/DockYard/elixir-mail/issues/138#issuecomment-1103530953)

First of all, thank you for your work on this module. However, I have the following question...

What should be the expected parsed message body for the following code according to RFC 2045 §2.7?

IO.inspect(
  Mail.parse([
    "From: a@b.tld",
    "To: c@d.tld",
    "Subject: test",
    "Content-Transfer-Encoding: 7bit", # or 8bit
    "",
    "line1",
    "line2"
  ])
)

Option A: line1\r\nline2

  1. ✔ Data that is all represented as relatively short lines with 998 octets or less between CRLF line separation sequences.
  2. ✔ No octets with decimal values greater than 127 are allowed and neither are NULs (octets with decimal value 0).
  3. ✔ CR (decimal value 13) and LF (decimal value 10) octets only occur as part of CRLF line separation sequences.

Option B: line1line2

  1. ❌ Data that is all represented as relatively short lines with 998 octets or less between CRLF line separation sequences.
  2. ✔ No octets with decimal values greater than 127 are allowed and neither are NULs (octets with decimal value 0).
  3. ❌ CR (decimal value 13) and LF (decimal value 10) octets only occur as part of CRLF line separation sequences.

I personally lean towards Option A, but the Mail.parse function currently outputs Option B, which seems to diverge from the RFC on points 1 and 3 because the "CRLF line separation sequence" is missing.
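
One way to see the practical difference between the two options, as a sketch that does not call the library: split each candidate body on CRLF and compare the result with the lines that were passed to Mail.parse. `BodyLines.recover/1` is a hypothetical helper introduced only for this illustration.

```elixir
defmodule BodyLines do
  # Hypothetical helper: split a decoded body on the CRLF line separator.
  def recover(body), do: String.split(body, "\r\n")
end

input_lines = ["line1", "line2"]

# Option A keeps the separator, so the original lines can be recovered.
IO.inspect(BodyLines.recover("line1\r\nline2") == input_lines)

# Option B has no separator left, so the two lines collapse into one.
IO.inspect(BodyLines.recover("line1line2") == input_lines)
```

With Option B there is no way to tell where line1 ended, which is the information loss this issue is about.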

andrewtimberlake commented 1 month ago

I subsequently found out that 7bit decoding was removing line breaks indiscriminately, when it should only be removing those used to wrap lines exceeding the maximum length of 1000 characters. I have merged a fix in #164.

SergeyMosin commented 4 weeks ago

Thank you for the quick fix. I think the same problem affects the 8bit encoding as well. Example:

IO.inspect(
  Mail.parse([
    "From: a@b.tld",
    "To: c@d.tld",
    "Subject: test",
    "Content-Type: text/plain; charset=UTF-8",
    "Content-Transfer-Encoding: 8bit",
    "",
    "lïne1",
    "lïne2"
  ])
)

outputs this:

%Mail.Message{
  headers: %{
    "content-transfer-encoding" => "8bit",
    "content-type" => ["text/plain", {"charset", "UTF-8"}],
    "from" => "a@b.tld",
    "subject" => "test",
    "to" => ["c@d.tld"]
  },
  body: "lïne1lïne2",
  parts: [],
  multipart: false
}

There is no \r\n in the body.

andrewtimberlake commented 4 weeks ago

Thanks, great catch. Fixed in #166