golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

crypto/tls: interoperability problems between go tls server and microsoft/outlook.com tls (smtp starttls) client #70232

Open mjl- opened 1 week ago

mjl- commented 1 week ago

Go version

go1.23.2 linux/amd64

Output of go env in your module/workspace:

n/a

What did you do?

Deploy mox, a mail server, and successfully get incoming email message deliveries from microsoft (outlook.com, both office365 and personal/free accounts) to mox over SMTP with STARTTLS (crypto/tls server).

What did you see happen?

On October 24 I started receiving "TLS reporting" errors with a "validation failure" error in the "sts" (MTA-STS) section. Up to and including October 23 I received TLS reports with only successful delivery attempts. I investigated, but couldn't find anything wrong. Yesterday I learned that message deliveries from microsoft (outlook.com servers) to mox were failing. The TLS reporting error message wasn't precise/clear, but there's a good chance it was about these failing delivery attempts.

The symptoms: I would see an incoming smtp connection, the "starttls" command, and an abrupt close of the connection by remote. Debugging revealed the connection was closed by remote after reading the server-side response to the TLS client hello message, without the remote writing anything in response (EOF while trying to read the first bytes looking for the "client finished" message). During more debugging, I noticed the Go TLS server code sends a session ticket message as part of its response to the client hello message. Setting tls.Config.SessionTicketsDisabled = true prevents the new session ticket from being sent, and makes the Microsoft SMTP STARTTLS command, and delivery of messages, succeed.
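For readers who want to try the same workaround, here is a minimal sketch of a server config with session tickets disabled. The helper name newSTARTTLSConfig is hypothetical and not part of mox or crypto/tls; only the SessionTicketsDisabled field comes from the description above.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// newSTARTTLSConfig is a hypothetical helper that builds a server-side
// TLS config for an SMTP STARTTLS listener.
func newSTARTTLSConfig(certs []tls.Certificate) *tls.Config {
	return &tls.Config{
		Certificates: certs,
		// Workaround from the report: with tickets disabled the server
		// does not send a NewSessionTicket message. The trade-off is
		// that clients cannot resume sessions using tickets.
		SessionTicketsDisabled: true,
	}
}

func main() {
	// A real server would pass its loaded certificates here.
	cfg := newSTARTTLSConfig(nil)
	fmt.Println("session tickets disabled:", cfg.SessionTicketsDisabled)
}
```

The resulting config would be passed to tls.Server (or an equivalent wrapper) when upgrading the connection after the STARTTLS command.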

At https://datatracker.ietf.org/doc/html/rfc8446#section-4.6.1 I noticed:

At any time after the server has received the client Finished message, it MAY send a NewSessionTicket message.

One theory: The Go TLS server is sending the NewSessionTicket message too soon, and Microsoft changed their implementation to be more strict about when it allows certain messages.

This isn't specific to mox. Maddy, another mail server written in Go is also seeing TLS interoperability issues with Microsoft/outlook.com. More details:

https://github.com/mjl-/mox/issues/237 https://github.com/foxcpp/maddy/issues/730

What did you expect to see?

The Go TLS session ticket may come too early for some other TLS clients. I did not try changing the crypto/tls code to only send a new session ticket message after having read the client finished message. May be worth trying, to see if that will result in a successful TLS session or sees the same abrupt connection close.

seankhliao commented 1 week ago

cc @golang/security

ianlancetaylor commented 1 week ago

Possibly a case of https://tldr.fail.

FiloSottile commented 1 week ago

At that stage in the handshake tldr.fail is unlikely. We did have an issue I can’t find at the moment about the size of our tickets with MS stacks, but it’s mostly client certificates that make those grow.

nabeken commented 6 days ago

@mjl- Thank you for your report. We were also about to report this at https://github.com/golang/go/issues/61721 but I found this issue. So I decided to redirect our report here.

We (@stupoid and I) are investigating a TLS 1.3 handshake issue with connections from Outlook (Exchange Online). Let us share our findings.

In short: Microsoft's TLS 1.3 implementation seems to terminate a TLS connection during the handshake depending on when it receives the NewSessionTicket message.

If you are experiencing similar issues with TLS 1.3 handshakes and find that setting SessionTicketsDisabled: true resolves the problem, you might be impacted by this issue.

Below are the details of our findings.

TLS 1.3 Handshake Interoperability Issue

Our server Setup

# go version
go version go1.22.7 linux/amd64

Problem

As @stupoid describes at https://github.com/golang/go/issues/61721#issuecomment-2451168692, we were encountering EOF errors during TLS 1.3 handshakes on connections from outlook.com (Exchange Online). The problem went away after disabling TLS 1.3 for them.

Observation

We compared how the TLS 1.3 handshake went on a Go 1.22.7 server with two clients: an ordinary SMTP client and Exchange Online (outlook.com).

The following is a dump of a successful handshake with SMTP client (openssl s_client -tls1_3 -starttls smtp -connect x.x.x.x:25).

13  0.012573    148.109.19.178  10.0.102.69 TLSv1.3 350 Client Hello
14  0.019012    10.0.102.69 148.109.19.178  TLSv1.3 1514    Server Hello, Change Cipher Spec, Encrypted Extensions, Certificate, Certificate Verify, Finished
15  0.019017    10.0.102.69 148.109.19.178  TLSv1.3 207 New Session Ticket
16  0.022057    148.109.19.178  10.0.102.69 TCP 66  47586 → 25 [ACK] Seq=318 Ack=1557 Win=130176 Len=0 TSval=3073220876 TSecr=799896659
17  0.022058    148.109.19.178  10.0.102.69 TCP 66  47586 → 25 [ACK] Seq=318 Ack=1698 Win=130048 Len=0 TSval=3073220876 TSecr=799896659
18  0.023597    148.109.19.178  10.0.102.69 TLSv1.3 130 Change Cipher Spec, Finished
19  0.072053    10.0.102.69 148.109.19.178  TCP 66  25 → 47586 [ACK] Seq=1698 Ack=382 Win=62464 Len=0 TSval=799896713 TSecr=3073220877

The following is a dump of a failed handshake with Exchange Online.

10  0.043841    104.47.23.113   10.0.102.69 TLSv1.3 361 Client Hello
11  0.058433    10.0.102.69 104.47.23.113   TLSv1.3 1514    Server Hello, Change Cipher Spec, Encrypted Extensions, Certificate, Certificate Verify
12  0.058439    10.0.102.69 104.47.23.113   TLSv1.3 248 Finished, New Session Ticket
13  0.068171    104.47.23.113   10.0.102.69 TCP 60  55106 → 25 [ACK] Seq=370 Ack=1792 Win=525568 Len=0
14  0.070915    104.47.23.113   10.0.102.69 TCP 60  55106 → 25 [FIN, ACK] Seq=370 Ack=1792 Win=525568 Len=0
15  0.071059    10.0.102.69 104.47.23.113   SMTP    101 S: 454 TLS not available due to temporary reason

The handshake didn't finish because the client sent a FIN packet right after receiving the server's Finished and New Session Ticket. tls.Conn.Handshake returned an EOF at frame 15.

The key difference is the timing of the New Session Ticket message: with openssl s_client it arrives in a separate packet after the server's Finished (frames 14 and 15), whereas with Exchange Online the server's Finished and the New Session Ticket arrive together in a single packet (frame 12).

On a side note, we then tested this with Postfix + OpenSSL (openssl-3.0.8-1.amzn2023.0.16.x86_64) and it seems to work fine, but it uses a different flow where OpenSSL (Postfix) sends New Session Ticket after receiving the Finished message from the client (outlook.com).

To verify the assumption that the New Session Ticket message might cause the problem in Microsoft's TLS implementation, we tried Go with SessionTicketsDisabled: true and confirmed the handshake went well:

10  0.017334    104.47.23.169   10.0.102.69 TLSv1.3 361 Client Hello
11  0.031968    10.0.102.69 104.47.23.169   TLSv1.3 1514    Server Hello, Change Cipher Spec, Application Data, Application Data, Application Data
12  0.031973    10.0.102.69 104.47.23.169   TLSv1.3 104 Application Data
13  0.035080    104.47.23.169   10.0.102.69 TCP 60  42783 → 25 [ACK] Seq=370 Ack=1648 Win=525568 Len=0
14  0.037309    104.47.23.169   10.0.102.69 TLSv1.3 118 Change Cipher Spec, Application Data
15  0.079227    10.0.102.69 104.47.23.169   TCP 54  25 → 42783 [ACK] Seq=1648 Ack=434 Win=62592 Len=0
16  0.082362    104.47.23.169   10.0.102.69 TLSv1.3 128 Application Data
17  0.082402    10.0.102.69 104.47.23.169   TCP 54  25 → 42783 [ACK] Seq=1648 Ack=508 Win=62592 Len=0
18  0.082849    10.0.102.69 104.47.23.169   TLSv1.3 149 Application Data
19  0.095041    104.47.23.169   10.0.102.69 TLSv1.3 82  Application Data
20  0.095141    104.47.23.169   10.0.102.69 TCP 60  42783 → 25 [RST, ACK] Seq=536 Ack=1743 Win=0 Len=0
21  22.653077   40.93.73.24 10.0.102.69 TCP 66  60619 → 25 [SYN] Seq=0 Win=64240 Len=0 MSS=1398 WS=256 SACK_PERM
22  22.653108   10.0.102.69 40.93.73.24 TCP 66  25 → 60619 [SYN, ACK] Seq=0 Ack=1 Win=62727 Len=0 MSS=8961 SACK_PERM WS=128
23  22.656931   40.93.73.24 10.0.102.69 TCP 60  60619 → 25 [ACK] Seq=1 Ack=1 Win=524288 Len=0
24  22.657119   10.0.102.69 40.93.73.24 SMTP    80  S: 220 mx.example.com ESMTP

Analysis

While I'm not an expert in TLS implementation, I reviewed the spec and found the following:

https://datatracker.ietf.org/doc/html/rfc8446#section-4.6.1 says:

At any time after the server has received the client Finished message, it MAY send a NewSessionTicket message.

and

Note: Although the resumption master secret depends on the client's second flight, a server which does not request client authentication MAY compute the remainder of the transcript independently and then send a NewSessionTicket immediately upon sending its Finished rather than waiting for the client Finished.

I think Go's TLS stack follows the second case because the server doesn't request client authentication.

On the other hand, Microsoft's TLS stack might expect to receive the server's Finished first and the NewSessionTicket message in a separate flight. Go's TLS stack buffers Finished and NewSessionTicket and flushes them together, rather than flushing Finished first and then sending NewSessionTicket.

To verify this hypothesis, I made a small modification to Go's handshake code so that it flushes the buffer first and sends NewSessionTicket only after the flush.

Here is the patch I tested with:

--- src/crypto/tls/handshake_server_tls13.go.orig   2024-11-07 04:28:50.967023405 +0000
+++ src/crypto/tls/handshake_server_tls13.go    2024-11-07 05:02:21.053073557 +0000
@@ -75,9 +75,17 @@
    if _, err := c.flush(); err != nil {
        return err
    }
+
    if err := hs.readClientCertificate(); err != nil {
        return err
    }
+
+   if !hs.requestClientCert() {
+       if err := hs.sendSessionTickets(); err != nil {
+           return err
+       }
+   }
+
    if err := hs.readClientFinished(); err != nil {
        return err
    }
@@ -777,11 +785,11 @@
    // If we did not request client certificates, at this point we can
    // precompute the client finished and roll the transcript forward to send
    // session tickets in our first flight.
-   if !hs.requestClientCert() {
-       if err := hs.sendSessionTickets(); err != nil {
-           return err
-       }
-   }
+   //if !hs.requestClientCert() {
+   //  if err := hs.sendSessionTickets(); err != nil {
+   //      return err
+   //  }
+   //}

    return nil
 }

It seemed to work.

10  0.044304    104.47.23.112   10.0.102.69 TLSv1.3 361 Client Hello
11  0.051822    10.0.102.69 104.47.23.112   TLSv1.3 1514    Server Hello, Change Cipher Spec, Encrypted Extensions, Certificate, Certificate Verify
12  0.051827    10.0.102.69 104.47.23.112   TLSv1.3 104 Finished
13  0.051889    10.0.102.69 104.47.23.112   TLSv1.3 198 New Session Ticket
14  0.061689    104.47.23.112   10.0.102.69 TCP 60  55773 → 25 [ACK] Seq=370 Ack=1648 Win=525568 Len=0
15  0.063710    104.47.23.112   10.0.102.69 TLSv1.3 118 Change Cipher Spec, Finished
16  0.107821    10.0.102.69 104.47.23.112   TCP 54  25 → 55773 [ACK] Seq=1792 Ack=434 Win=62592 Len=0
17  0.117695    104.47.23.112   10.0.102.69 SMTP    128 C: EHLO JPN01-OS0-obe.outbound.protection.outlook.com

Questions...

@FiloSottile, as the author of this code almost 6 years ago, what do you think about this issue? Given these findings, should Go adjust its handshake behavior, or should Microsoft update their TLS 1.3 implementation for better interoperability?

stupoid commented 6 days ago

I did not try changing the crypto/tls code to only send a new session ticket message after having read the client finished message. May be worth trying, to see if that will result in a successful TLS session or sees the same abrupt connection close.

@mjl-

Just to add to this for anyone looking to sidestep this issue.

We encountered very similar issues and also tried what you mentioned by changing tls.Config.ClientAuth to the following 2 modes to see if it would work. Both seem to work fine without issues.

Dump of interaction with Exchange Online (outlook.com) with ClientAuth set to RequestClientCert

1   2.991447    40.93.130.3 10.0.15.74  TLSv1.3 361 Client Hello
2   2.992553    10.0.15.74  40.93.130.3 TLSv1.3 1527    Server Hello, Change Cipher Spec, Encrypted Extensions, Certificate Request, Certificate, Certificate Verify, Finished
3   3.002707    40.93.130.3 10.0.15.74  TCP 60  11164 → 25 [ACK] Seq=369 Ack=1610 Win=524288 Len=0
4   3.006255    40.93.130.3 10.0.15.74  TLSv1.3 4125    Change Cipher Spec, Certificate, Certificate Verify, Finished
5   3.006322    10.0.15.74  40.93.130.3 TCP 54  25 → 11164 [ACK] Seq=1610 Ack=4440 Win=58624 Len=0
6   3.006673    10.0.15.74  40.93.130.3 TCP 2850    25 → 11164 [PSH, ACK] Seq=1610 Ack=4440 Win=58624 Len=2796 [TCP PDU reassembled in 51]
7   3.006688    10.0.15.74  40.93.130.3 TLSv1.3 1137    New Session Ticket

Dump of interaction with Exchange Online (outlook.com) with ClientAuth set to RequireAndVerifyClientCert

1   2.157838    40.93.130.1 10.0.15.74  TLSv1.3 361 Client Hello
2   2.159897    10.0.15.74  40.93.130.1 TLSv1.3 1526    Server Hello, Change Cipher Spec, Encrypted Extensions, Certificate Request, Certificate, Certificate Verify, Finished
3   2.170862    40.93.130.1 10.0.15.74  TCP 60  15428 → 25 [ACK] Seq=369 Ack=1609 Win=524288 Len=0
4   2.174135    40.93.130.1 10.0.15.74  TLSv1.3 4125    Change Cipher Spec, Certificate, Certificate Verify, Finished
5   2.174193    10.0.15.74  40.93.130.1 TCP 54  25 → 15428 [ACK] Seq=1609 Ack=4440 Win=58624 Len=0
6   2.174901    10.0.15.74  40.93.130.1 TCP 2850    25 → 15428 [PSH, ACK] Seq=1609 Ack=4440 Win=58624 Len=2796 [TCP PDU reassembled in 53]
7   2.174923    10.0.15.74  40.93.130.1 TCP 2850    25 → 15428 [PSH, ACK] Seq=4405 Ack=4440 Win=58624 Len=2796 [TCP PDU reassembled in 53]
8   2.175234    10.0.15.74  40.93.130.1 TLSv1.3 555 New Session Ticket