gen-smtp / gen_smtp

The extensible Erlang SMTP client and server library.
Other
683 stars 265 forks source link

TLS + SMTP Error when using OTP/26 #785 #328

Open fekle opened 1 year ago

fekle commented 1 year ago

Describe the bug

Hi there - I already opened this issue in https://github.com/swoosh/swoosh/issues/785 but it looks like it is related to gen_smtp, so I'll open it here as well.

I recently updated my Projects to Elixir 1.15 and OTP/26, and everything but sending emails via swoosh and gen_smtp works fine. I am sending E-Mails via SMTP, using a local business mail provider as a relay.

Swoosh tries sending the email multiple times via gen_smtp and finally fails with the following error:

{:error,
 {:retries_exceeded, {:temporary_failure, ~c"XXX.XXX.XXX.XXX", :tls_failed}}}

Here is my configuration:

  config :redacted, REDACTED.Mailer,
    adapter: Swoosh.Adapters.SMTP,
    relay: System.get_env("SMTP_HOST") || raise("missing SMTP_HOST"),
    username: System.get_env("SMTP_USERNAME") || raise("missing SMTP_USERNAME"),
    password: System.get_env("SMTP_PASSWORD") || raise("missing SMTP_PASSWORD"),
    tls: :always,
    ssl: false,
    auth: :always,
    port: 587,
    retries: 15

The provider only supports STARTTLS via port 587, so that's why ssl is set to false.

With Elixir 1.15 on OTP/25 it worked fine, with OTP/26 the issue appears.

I checked the Erlang documentation, but there don't seem to be any breaking changes related to TLS apart from SSL certificate validation being on by default, but that should be fine? I know this is not much to go on, but unfortunately swoosh does not give me any more information apart from :tls_failed which makes this hard to debug.

Steps to Reproduce the Bug or Issue

Send an email via Swoosh's SMTP adapter and the aforementioned config with OTP/26.

Expected behavior

The email sends without issues.

Your reproducible example

No response

Screenshots or Videos

No response

Platform

Additional context

No response

seriyps commented 1 year ago

Well, the TLS certificate validation could easily be the root cause. Wihout looking into details, may you try to add following options to gen_smtp client?

[{tls_options, [{verify, verify_none}]}]
fekle commented 1 year ago

@seriyps Thanks, that worked - now the emails send again. Maybe gen_smtp could report verify issues explicitly?

Also do you have an idea why that happens, given that the SMTP server's certificate is valid? "Normal" email clients like Outlook or Apple Mail have no issues, and openssl also says the certificate is valid.

> openssl s_client -connect REDACTED:587 -starttls smtp                                                                        16:45:52
CONNECTED(00000005)
[...]
---
Server certificate
-----BEGIN CERTIFICATE-----
[...]
-----END CERTIFICATE-----
subject=CN = REDACTED ( same as in the config )
[...]
---
No client certificate CA names sent
Peer signing digest: SHA512
Peer signature type: RSA
Server Temp Key: ECDH, prime256v1, 256 bits
---
SSL handshake has read 5939 bytes and written 473 bytes
Verification: OK
---
New, TLSv1.2, Cipher is ECDHE-RSA-AES256-GCM-SHA384
Server public key is 4096 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
[...]

Or am I misunderstanding something here?

seriyps commented 1 year ago

I think this document describes this topic quite well https://erlef.github.io/security-wg/secure_coding_and_deployment_hardening/ssl

In short, even when SMTP server has a certificate that is signed by a trusted issuer, you should tell Erlang's ssl which set of root certificates it should consider as trusted. In OTP-25 public_key:cacerts_get() function was added which detects the set of trusted root certificates from OS, which is a good default choice, but you still have to provide it explicitly.

fekle commented 1 year ago

Thanks for the link, I checked it out - however trying public_key:cacerts_get(), :ceritify and CaStore modules all take me right back to the original error - :tls_failed. Is there a way to find out what exactly about tls is failing here?

For more detail:

Adding the cacerts or cacertfile (in the case of using :certifi) option causes it to log the following notice:

[notice] TLS :client: In state :certify at ssl_handshake.erl:2140 generated CLIENT ALERT: Fatal - Handshake Failure
 - {:bad_cert, :hostname_check_failed}

Now that's something I am familiar with, and in the past i used the :ssl_verify_fun module to help with this by adding verify_fun: &:ssl_verify_hostname.verify_fun/3 to the options. This removes the :hostname_check_failed error, taking me right back to just :tls_failed with no other information. I'm feeling a bit lost here, so any help in finding the underlying issue would be greatly appreciated :)

PS: this issue occurs on both macOS 13 and Debian 11. I also tried sending via gmail (smtp.gmail.com) and get the same issues - so I doubt it has anything to do with the smtp server I'm using.

seriyps commented 1 year ago

Sorry, I don't have a good tips for you. gen_smtp tends to swallow some of the error details, because it tries to connect to each SMTP server returned by MX DNS lookup and only returns an error when there is no more servers left.

You can try to enable debug logging for OTP ssl app, but I don't remember all the options you should tweak to do that... I think smth like

logger:set_primary_config(level, debug).
ssl:connect("google.com", 443, [{log_level, debug}]).
fekle commented 1 year ago

No worries, thanks anyways for your quick replies! I tried the setting the log level or using ssl:connect, but the log level doesn't change anything ( I already have the global log level set to debug) and ssl:connect doesnt show the same errors.

I concede that for now STARTTLS doesn't seem to work with OTP/26 unless you set verify: :verify_none in the tls_options (granted, that was the OTP/25 default). Of course validating the server would be a good security best practice, so maybe someone more experienced with OTP and its TLS implementation can find a solution here :-)

seriyps commented 1 year ago

granted, that was the OTP/25 default

nope, it was just non-fatal error prior to OTP-26

fekle commented 1 year ago

I did some more digging in the documentation and created a quick reproduction script. It turns out that when enabling SNI with the same domain as the relay, it works! For example:

Mix.install([
  {:gen_smtp, "~> 1.2"},
  {:certifi, "~> 2.11"},
  {:eiconv, "~> 1.0"}
])

require Logger

:application.ensure_all_started(:ssl)

Logger.configure(level: :debug)
:logger.set_application_level(:ssl, :debug)

Logger.debug("Starting SMTP TLS test")

{:ok, pid} =
  :gen_smtp_client.send(
    {"test@test.test", ["test@test.test"],
     "Subject: testing\r\nFrom: Test <test@test.test>\r\nTo: Test <test@test.test>\r\n\r\nThis is the email body"},
    relay: "smtp.gmail.com",
    retries: 0,
    tls: :always,
    ssl: false,
    auth: :always,
    port: 587,
    username: "test@test.test",
    password: "test",
    tls_options: [
      verify: :verify_peer,
      cacerts: :certifi.cacerts(), # <--- this is required!
      server_name_indication: ~c"smtp.gmail.com", # <--- this is required!
      depth: 99,
      log_level: :debug
    ]
  )

Process.monitor(pid)

receive do
  _ ->
    :ok
end

Now returns ** (EXIT from #PID<0.98.0>) {:error, :no_more_hosts, {:permanent_failure, ~c"142.250.147.109", :auth_failed}} as expected - using correct credentials now works.

The important parts are:

      cacerts: :certifi.cacerts(), # <--- this is required!
      server_name_indication: ~c"smtp.gmail.com", # <--- this is required!

I'm using certifi here to be independent from the system ca certs. I know my example code is elixir, but the same should apply for erlang.

@seriyps what do you think - maybe we should use those options per default, or otherwise maybe mention this in the documentation? I'd be happy to open a PR for this!

seriyps commented 1 year ago

I think mentioning it in the docs should be the solution - pre-selecting the list of trusted root certificates might be not very secure plus only works without dependencies on OTP-25+

@mworrell WDYT?

mworrell commented 1 year ago

I am actually ok with a default using the list from tls_certificate_check:options/1 if there are no other ssl options overruling the ssl options.

A lot of projects started to include this app, and it is quite good maintained. If we have a loose dependency, like:

{tls_certificate_check, "~> 1.18"},

Then the project using gen_smtp can always use a newer version.

Example of the usage:

(zotonic@PoToi)17> tls_certificate_check:options("smtp.gmail.com").
[{verify,verify_peer},
 {depth,100},
 {cacerts,[<<48,130,3,117,48,130,2,93,160,3,2,1,2,2,11,4,0,
             0,0,0,1,21,75,...>>,
           <<48,130,4,42,48,130,3,18,160,3,2,1,2,2,4,56,99,222,248,
             48,13,6,...>>,
           <<48,130,3,119,48,130,2,95,160,3,2,1,2,2,4,2,0,0,185,48,
             13,...>>,
           <<48,130,4,145,48,130,3,121,160,3,2,1,2,2,4,69,107,80,84,
             48,...>>,
           <<48,130,4,50,48,130,3,26,160,3,2,1,2,2,1,1,48,13,6,...>>,
           <<48,130,5,183,48,130,3,159,160,3,2,1,2,2,2,5,9,48,...>>,
           <<48,130,6,157,48,130,4,133,160,3,2,1,2,2,2,5,198,...>>,
           <<48,130,3,90,48,130,2,66,160,3,2,1,2,2,1,0,...>>,
           <<48,130,4,48,48,130,3,24,160,3,2,1,2,2,16,...>>,
           <<48,130,4,0,48,130,2,232,160,3,2,1,2,2,...>>,
           <<48,130,4,15,48,130,2,247,160,3,2,1,2,...>>,
           <<48,130,3,183,48,130,2,159,160,3,2,1,...>>,
           <<48,130,3,175,48,130,2,151,160,3,2,...>>,
           <<48,130,3,197,48,130,2,173,160,3,...>>,
           <<48,130,5,186,48,130,3,162,160,...>>,
           <<48,130,5,189,48,130,3,165,...>>,
           <<48,130,3,184,48,130,2,...>>,
           <<48,130,3,188,48,130,...>>,
           <<48,130,4,29,48,...>>,
           <<48,130,2,137,...>>,
           <<48,130,3,...>>,
           <<48,130,...>>,
           <<"0"...>>,<<...>>|...]},
 {verify_fun,{fun ssl_verify_hostname:verify_fun/3,
              [{check_hostname,"smtp.gmail.com"}]}},
 {partial_chain,fun tls_certificate_check_shared_state:find_trusted_authority/1},
 {customize_hostname_check,[{match_fun,#Fun<public_key.6.103305546>}]},
 {server_name_indication,"smtp.gmail.com"}]
fekle commented 1 year ago

I agree with both points - including this by default would make it easier for most people to use, however I also understand that, as gen_smtp delegates TLS/SSL to the ssl module, it is not really the duty of gen_smtp to choose which CA Certificates should be used. (If you ask me, erlang's ssl should use the system CA certificates by default unless otherwise specified, as that is how most software does it and I suspect this would be the behaviour that most developers expect - but that is a different discussion unrelated to gen_smtp :-) ).

I do however think that gen_smtp should set the SNI option to the domain specified in the relay option, as otherwise the server may not know which certificate to choose (as gen_smtp looks up the IP via DNS and passes the IP to ssl as i understand) - this would make it so that certificate and host validation succeeds once the user specifies which certificates to use. Or at least document that developers need to set the server_name_indication in addition to the relay option for TLS/SSL to work with certificate validation.

gmile commented 1 year ago

I did some more digging in the documentation and created a quick reproduction script.

@fekle thanks for posting a workaround with explicitly specified options!

It didn't work for me initially, as I decided not to specify the depth (since your comment didn't mention is as critical).

Before specifying the depth parameter, I get a few of the notices below, then a failure as a result:

15:54:17.877 [notice] TLS :client: In state :certify at ssl_handshake.erl:2140 generated CLIENT ALERT: Fatal - Handshake Failure
 - {:bad_cert, :max_path_length_reached}

# ...

{:error, :retries_exceeded, {:temporary_failure, '52.57.139.126', :tls_failed}}

After specifying the depth parameter, all worked as expected, and a "provider queue" string was returned:

"Ok: queued as mgmql-u1Q8aRoiwOkOYDmA\r\n"

My final code also doesn't depend on :certifi (to/from emails are changed):

Mix.install([:gen_smtp])

:gen_smtp_client.send_blocking(
  {"server@example.com", ["client@example.com"],
   "Subject: Emails are working\r\nFrom: Eugene's App <server@example.com>\r\nTo: Eugene <client@example.com>\r\n\r\nLooks like emails are working"},
  auth: :always,
  relay: "smtp.sendgrid.net",
  port: 587,
  tls: :always,
  username: "apikey",
  password: "SG.YDst...",
  tls_options: [
    verify: :verify_peer,
    cacerts: :public_key.cacerts_get(),
    server_name_indication: 'smtp.sendgrid.net',
    depth: 99
  ]
)
tosie commented 1 year ago

Having spent some time with this I stumbled upon an interesting problem. We are using gen_smtp to send emails via Microsoft365's SMTP server. According to their documentation the server is available at smtp.office365.com. Using this value as relay works, but host verification still fails even when server_name_indication is set.

The server's certificate specifies *.office365.com in its alternative names field, but this seems to not be accepted by the host name verification function ({:bad_cert, : hostname_check_failed).

Setting SNI (server_name_indication) to "office365.com" (another one of the alternative name entries) solves the problem.

Here is the full example for Microsoft365:

Mix.install([:gen_smtp])

:gen_smtp_client.send_blocking(
  {"server@example.com", ["client@example.com"],
   "Subject: Emails are working\r\nFrom: Eugene's App <server@example.com>\r\nTo: Eugene <client@example.com>\r\n\r\nLooks like emails are working"},
  auth: :always,
  relay: "smtp.office365.com",
  port: 587,
  auth: :always,
  ssl: false,
  tls: :always,
  username: "...",
  password: "...",
  tls_options: [
    verify: :verify_peer,
    versions: ["tlsv1.3"],
    cacerts: public_key:cacerts_get(),
    server_name_indication: "office365.com",
    depth: 99
  ]
)
fekle commented 1 year ago

@gmile you're welcome! :-) I didn't mark depth, as it wasn't required for the server I used.

I ended up using what @mworrell showed, that works great with all checks:

[...]
 tls_options: :tls_certificate_check.options(System.get_env("SMTP_HOST"))
[...]

Before we close this issue - should we now change the defaults in gen_smtp or just mention it in the documentation, and leave it up to upstream projects like swoosh to handle the issue with thinks like :tls_certificate_check.options ?

tosie commented 1 year ago

Very nice, did not know about :tls_certificate_check 👍👍

Since OTP changed its defaults maybe it would be a good idea to change the defaults of gen_smtp to something sensible that works and it safe to use, too 🤷‍♂️

neilberkman commented 1 year ago

In case anyone needs to get this working for AWS SES, this seems to be the minimal config...

    smtp_relay = System.get_env("AWS_SMTP_RELAY") # e.g. "email-smtp.us-east-1.amazonaws.com"
    config = [
      relay: smtp_relay,
      port: 587,
      username: System.get_env("AWS_SMTP_USERNAME"),
      password: System.get_env("AWS_SMTP_PASSWORD"),
      tls: :always,
      tls_options: :tls_certificate_check.options(smtp_relay)
    ]
aeros281 commented 7 months ago

I'm using bamboo_smtp (with patched logic to forward tls_options to gen-smtp) right now to send email via AWS SES. Even with tls_options set to :tls_certificate_check.options(smtp_relay) the issue is not fixed. Debug logs were not useful at all. I resolved it with:

tls_options: [
    ...
    versions: ["tlsv1.2"],
    server_name_indication: smtp_server |> to_charlist(),
    ...
 ]

The SES endpoint only support TLS v1.2, I think.

linges commented 7 months ago

AWS SES worked for me like this in elixir:

...
  {:tls_options, [
           versions: [:"tlsv1.2"]
         ] ++ :tls_certificate_check.options(server)},
...
gmile commented 6 months ago

Had to come back to this once more while upgrading an older app. My previous answer didn't immediately work anymore.

ℹ️ It did worked after changing exactly two things:

  1. changed value of SMTP relay from an IP to using a hostname instead,
  2. added customize_hostname_check to tls_options

The final code looks like this:

Mix.install([:gen_smtp])

:gen_smtp_client.send_blocking(
  {"server@example.com", ["client@example.com"],
   "Subject: Emails are working\r\nFrom: Eugene's App <server@example.com>\r\nTo: Eugene <client@example.com>\r\n\r\nLooks like emails are working"},
  auth: :always,
  relay: System.get_env("SMTP_RELAY"),
  port: 587,
  tls: :always,
  username: System.get_env("SMTP_USERNAME"),
  password: System.get_env("SMTP_PASSWORD"),
  tls_options: [
    verify: :verify_peer,
    cacerts: :public_key.cacerts_get(),
    depth: 99,
    server_name_indication: ~c"#{System.get_env("SMTP_RELAY")}",
    customize_hostname_check: [
      match_fun: :public_key.pkix_verify_hostname_match_fun(:https)
    ]
  ]
)

As a side note, this code doesn't rely on tls_certificate_check. In the app code am configuring the :gen_smtp_client using config/runtime.exs, and calling :tls_certificate_check.options/1 there was failing for me with some error coming from the depths of :tls_certificate_check sources. I think it could be due to tls_certificate_check app not starting when config/runtime.exs is evaluated, though I am not sure.

On another separate note, this older app I was upgrading is using a regional, lesser known SMTP provider, so until I realized I needed to change from using IP to using a hostname, I was getting the :retries_exceeded error from OP's message and started to doubt if the provider even supports TLS cert verification. This tool proved helpful to dispel the doubts I had about the server's support of TLS: https://www.checktls.com/TestReceiver