golang / go


net/mail: AddressList doesn't decode rfc2047 encoded words inside quotes #23140

Open sfilargi opened 6 years ago

sfilargi commented 6 years ago

net/mail AddressList doesn't decode rfc2047 encoded words if they are inside quotes.

RFC 2047 (section 5) says: "An 'encoded-word' MUST NOT appear within a 'quoted-string'."

Now before you close this bug, saying it is working as intended, let me try to convince you otherwise.

A lot of clients break the rule above, and most services/libraries are programmed to work around it.

For example Gmail will happily decode the string even if it is inside quotes.

The way I see it, there are two paths we can follow:

  1. Stick to RFC
  2. Try to be compatible with the majority of the libraries/services out there.

We can be strict and choose (1), in which case the library will not be of much use, since users of software built on it will complain.

Or we can be pragmatic and choose (2). There is not much risk in decoding Q-encoded words inside quotes.

I hope you choose (2).

Repro:

https://play.golang.org/p/etkJkTfs3Q

What version of Go are you using (go version)?

1.9

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

darwin/amd64

What did you do?

https://play.golang.org/p/etkJkTfs3Q

What did you expect to see?

Decoded name

What did you see instead?

Undecoded name
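Since playground links can go stale, here is a minimal sketch of the kind of repro described above. The header value and the helper name `parseNames` are made up for illustration and are not taken from the linked snippet. It only reports whether the display name still contains an encoded-word marker, because the result may differ between Go versions.

```go
package main

import (
	"fmt"
	"net/mail"
	"strings"
)

// quotedHeader is a hypothetical header value whose display name is an
// RFC 2047 encoded-word wrapped in double quotes (which RFC 2047 forbids).
const quotedHeader = `"=?UTF-8?Q?J=C3=B6rg?=" <joerg@example.com>`

// parseNames (hypothetical helper) returns the display names that
// mail.ParseAddressList reports for a header value.
func parseNames(header string) ([]string, error) {
	addrs, err := mail.ParseAddressList(header)
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(addrs))
	for _, a := range addrs {
		names = append(names, a.Name)
	}
	return names, nil
}

func main() {
	names, err := parseNames(quotedHeader)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, n := range names {
		// A name that still contains "=?" was left undecoded.
		fmt.Printf("%q undecoded=%v\n", n, strings.Contains(n, "=?"))
	}
}
```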

minaevmike commented 6 years ago

Thunderbird does the same as Gmail.

gopherbot commented 6 years ago

Change https://golang.org/cl/139177 mentions this issue: net/mail: Decode RFC 2047 encoded strings within quotes.

RalphCorderoy commented 6 years ago

Gmail deciding to consciously (hopefully) violate the RFC for input isn't as important as whether Gmail, being a major player in sending emails, produces corrupt output that violates the RFC. I'm assuming not?

What are the producers of the corrupt emails? If it were a major producer of emails, e.g. a 'MailChimp', or an open-source library, then it can probably be persuaded to fix things for future emails. (I know of several successes in this area for mail RFC violations.) Some 'spam assassins' use RFC violations as one measure; it catches PHP scripters but by letting a kosher mail producer off the hook, this signal is being weakened.

Continuing to stick with the RFC doesn't stop Go processing the email, e.g. receiving and sending, so I don't think Go violating the RFC, and encouraging others to do so, is a good idea. Postel might have been right then, but would be wrong in the modern era. https://en.wikipedia.org/wiki/Postel%27s_law#Criticism has references, including https://tools.ietf.org/html/draft-thomson-postel-was-wrong-02

sfilargi commented 6 years ago

> Some 'spam assassins' use RFC violations as one measure; it catches PHP scripters but by letting a kosher mail producer off the hook, this signal is being weakened.

It doesn't matter. The suggestion is not to make Go send emails that break RFC, but to correctly parse them when it receives them, even if they break the RFC.

> Continuing to stick with the RFC doesn't stop Go processing the email, e.g. receiving and sending, so I don't think Go violating the RFC, and encouraging others to do so, is a good idea.

No, it doesn't stop sending or receiving, but it looks terrible for the end user, so this library will be useless for those cases where there is end-user interaction. Not fixing it in Go is kind of conformism, since pretty much every other MTA out there breaks this rule.

RalphCorderoy commented 6 years ago

> > Some 'spam assassins' use RFC violations as one measure; it catches PHP scripters but by letting a kosher mail producer off the hook, this signal is being weakened.
>
> It doesn't matter. The suggestion is not to make Go send emails that break RFC, but to correctly parse them when it receives them, even if they break the RFC.

Go would not be 'correctly parsing them'. It does that now. Go would be weakening the signal for spam detectors by no longer discouraging buggy email producers.

> No, it doesn't stop sending or receiving, but it looks terrible for the end user, so this library will be useless for those cases where there is end-user interaction. Not fixing it in Go is kind of conformism, since pretty much every other MTA out there breaks this rule.

But Go is not an MTA; this is the stdlib. Working around buggy emails is a policy decision, not a mechanism one. It is for the MTA written in Go to make and implement, not for the stdlib to impose on all callers. (And as a user of that MTA, I'd want it off by default so I get to see the true email that's been sent.)

dmitshur commented 5 years ago

As a data point, I got a notification mail from Gerrit with the following From header:

Subject: [go] time: fix parse month error message
From: "=?UTF-8?Q?=E7=B4=98=E5=A3=AB_=E5=85=AB=E5=B7=BB_=28Gerrit=29?=" <noreply-gerritcodereview-abcdef123456@google.com>

I'm not sure if that's violating the RFC, but if so, I plan to report it to them.

Removing the quotes seems to make it parse with net/mail:

https://play.golang.org/p/0j4_QXh0EeK
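For reference, a small sketch of the unquoted case, in case the playground link is unavailable. It is not the linked snippet. The helper name `displayName` is hypothetical, and the header value is the From line above with only the double quotes removed.

```go
package main

import (
	"fmt"
	"net/mail"
)

// unquotedFrom is the From value above with the double quotes removed, so the
// display name is a bare encoded-word rather than a quoted-string.
const unquotedFrom = `=?UTF-8?Q?=E7=B4=98=E5=A3=AB_=E5=85=AB=E5=B7=BB_=28Gerrit=29?= <noreply-gerritcodereview-abcdef123456@google.com>`

// displayName (hypothetical helper) parses a single address and returns the
// display name as reported by net/mail.
func displayName(header string) (string, error) {
	addr, err := mail.ParseAddress(header)
	if err != nil {
		return "", err
	}
	return addr.Name, nil
}

func main() {
	name, err := displayName(unquotedFrom)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(name)
}
```

With the quotes gone, net/mail treats the name as an encoded-word and should print the decoded name, 紘士 八巻 (Gerrit).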

stavrospen commented 5 years ago

> As a data point, I got a notification mail from Gerrit with the following From header:
>
> Subject: [go] time: fix parse month error message
> From: "=?UTF-8?Q?=E7=B4=98=E5=A3=AB_=E5=85=AB=E5=B7=BB_=28Gerrit=29?=" <noreply-gerritcodereview-abcdef123456@google.com>
>
> I'm not sure if that's violating the RFC, but if so, I plan to report it to them.
>
> Removing the quotes seems to make it parse with net/mail:
>
> https://play.golang.org/p/0j4_QXh0EeK

They may be violating the RFC, but so many clients out there do it that you cannot just take a hard stance if you are building a product for end users.

At the end of the day our end-users will see that OUR product is not working, while other products have no problem.

Go is a pragmatic language and I was hoping the pragmatic approach would have been followed here.

But this is just academic for me at the moment, as I switched to another language because of this.

dmitshur commented 5 years ago

One approach may be to have two packages. The RFC can be followed strictly in the standard library package, but another package can implement more lax parsing outside of the standard library.

That way, users who are looking for strict RFC behavior can continue to use net/mail, but those interested in building an email product for end users can implement custom behavior for their needs.
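As a rough illustration of what such lax parsing outside the standard library could look like, here is a minimal sketch. `ParseAddressLax` is a hypothetical name, not an existing API. It assumes the caller accepts decoding encoded-words wherever they appear in the display name, which deliberately goes against RFC 2047 section 5. It relies only on `mail.ParseAddress` and `mime.WordDecoder.DecodeHeader` from the standard library.

```go
package main

import (
	"fmt"
	"mime"
	"net/mail"
)

// ParseAddressLax is a hypothetical wrapper: it parses strictly with
// net/mail, then makes a second pass over the display name to decode any
// RFC 2047 encoded-words that were left as-is (for example because they
// were inside a quoted-string).
func ParseAddressLax(header string) (*mail.Address, error) {
	addr, err := mail.ParseAddress(header)
	if err != nil {
		return nil, err
	}
	dec := new(mime.WordDecoder)
	// If decoding fails (e.g. unknown charset), keep the name as parsed.
	if name, derr := dec.DecodeHeader(addr.Name); derr == nil {
		addr.Name = name
	}
	return addr, nil
}

func main() {
	const from = `"=?UTF-8?Q?=E7=B4=98=E5=A3=AB_=E5=85=AB=E5=B7=BB_=28Gerrit=29?=" <noreply-gerritcodereview-abcdef123456@google.com>`
	addr, err := ParseAddressLax(from)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(addr.Name, addr.Address)
}
```

A name with no encoded-words passes through the second pass unchanged, so the wrapper gives the same result whether or not net/mail has already decoded the name. The trade-off is that a quoted name which is meant literally but happens to look like an encoded-word gets decoded as well.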

RalphCorderoy commented 5 years ago

Hi @dmitshur, yes, that header in the Gerrit email is faulty, so please do report it to the creator, probably Gerrit or a library they use. It's https://tools.ietf.org/html/rfc2047#section-5 that says "An 'encoded-word' MUST NOT appear within a 'quoted-string'." Their shouting, not mine. :-)

dmitshur commented 5 years ago

I've reported it to Gerrit at https://bugs.chromium.org/p/gerrit/issues/detail?id=10519.