mjl- opened this issue 2 weeks ago.
cc @golang/security
Possibly a case of https://tldr.fail.
At that stage in the handshake, tldr.fail is unlikely. We did have an issue, which I can’t find at the moment, about the size of our tickets with MS stacks, but it’s mostly client certificates that make those grow.
@mjl- Thank you for your report. We were also about to report this at https://github.com/golang/go/issues/61721, but I found this issue, so I decided to redirect our report here.
We (@stupoid and I) are investigating a TLS 1.3 handshake issue with connections from Outlook (Exchange Online). Let us share our findings.
In short: Microsoft's TLS 1.3 implementation seems to terminate the TLS connection during the handshake, depending on when it receives the NewSessionTicket message.
If you are experiencing similar issues with TLS 1.3 handshakes and find that setting SessionTicketsDisabled: true resolves the problem, you might be affected by this issue.
Below are the details of our findings.
# go version
go version go1.22.7 linux/amd64
As @stupoid describes at https://github.com/golang/go/issues/61721#issuecomment-2451168692, we were encountering EOF errors during TLS 1.3 handshakes on connections from outlook.com (Exchange Online). The problem went away after we disabled TLS 1.3 for those connections.
We compared how the TLS 1.3 handshake with our Go 1.22.7 server went for a plain SMTP client and for Exchange Online (outlook.com).
The following is a dump of a successful handshake with an SMTP client (openssl s_client -tls1_3 -starttls smtp -connect x.x.x.x:25):
13 0.012573 148.109.19.178 10.0.102.69 TLSv1.3 350 Client Hello
14 0.019012 10.0.102.69 148.109.19.178 TLSv1.3 1514 Server Hello, Change Cipher Spec, Encrypted Extensions, Certificate, Certificate Verify, Finished
15 0.019017 10.0.102.69 148.109.19.178 TLSv1.3 207 New Session Ticket
16 0.022057 148.109.19.178 10.0.102.69 TCP 66 47586 → 25 [ACK] Seq=318 Ack=1557 Win=130176 Len=0 TSval=3073220876 TSecr=799896659
17 0.022058 148.109.19.178 10.0.102.69 TCP 66 47586 → 25 [ACK] Seq=318 Ack=1698 Win=130048 Len=0 TSval=3073220876 TSecr=799896659
18 0.023597 148.109.19.178 10.0.102.69 TLSv1.3 130 Change Cipher Spec, Finished
19 0.072053 10.0.102.69 148.109.19.178 TCP 66 25 → 47586 [ACK] Seq=1698 Ack=382 Win=62464 Len=0 TSval=799896713 TSecr=3073220877
The following is a dump of a failed handshake with Exchange Online.
10 0.043841 104.47.23.113 10.0.102.69 TLSv1.3 361 Client Hello
11 0.058433 10.0.102.69 104.47.23.113 TLSv1.3 1514 Server Hello, Change Cipher Spec, Encrypted Extensions, Certificate, Certificate Verify
12 0.058439 10.0.102.69 104.47.23.113 TLSv1.3 248 Finished, New Session Ticket
13 0.068171 104.47.23.113 10.0.102.69 TCP 60 55106 → 25 [ACK] Seq=370 Ack=1792 Win=525568 Len=0
14 0.070915 104.47.23.113 10.0.102.69 TCP 60 55106 → 25 [FIN, ACK] Seq=370 Ack=1792 Win=525568 Len=0
15 0.071059 10.0.102.69 104.47.23.113 SMTP 101 S: 454 TLS not available due to temporary reason
The handshake didn't finish because the client sent a FIN packet right after receiving the server's Finished and New Session Ticket messages. tls.Conn.Handshake returned EOF at frame 15.
The key difference is the timing of the New Session Ticket message:
- Successful handshake (openssl s_client): New Session Ticket is sent after the server's Finished message has been flushed.
- Failed handshake (Exchange Online): New Session Ticket is sent together with the server's Finished message, before the client's Finished message is received.
On a side note, when we tested this with Postfix + OpenSSL (openssl-3.0.8-1.amzn2023.0.16.x86_64), it seemed to work fine, but it uses a different flow: OpenSSL (Postfix) sends New Session Ticket after receiving the Finished message from the client (outlook.com).
To test the hypothesis that the New Session Ticket message triggers the problem in Microsoft's TLS implementation, we set SessionTicketsDisabled: true in Go and confirmed that the handshake went well:
10 0.017334 104.47.23.169 10.0.102.69 TLSv1.3 361 Client Hello
11 0.031968 10.0.102.69 104.47.23.169 TLSv1.3 1514 Server Hello, Change Cipher Spec, Application Data, Application Data, Application Data
12 0.031973 10.0.102.69 104.47.23.169 TLSv1.3 104 Application Data
13 0.035080 104.47.23.169 10.0.102.69 TCP 60 42783 → 25 [ACK] Seq=370 Ack=1648 Win=525568 Len=0
14 0.037309 104.47.23.169 10.0.102.69 TLSv1.3 118 Change Cipher Spec, Application Data
15 0.079227 10.0.102.69 104.47.23.169 TCP 54 25 → 42783 [ACK] Seq=1648 Ack=434 Win=62592 Len=0
16 0.082362 104.47.23.169 10.0.102.69 TLSv1.3 128 Application Data
17 0.082402 10.0.102.69 104.47.23.169 TCP 54 25 → 42783 [ACK] Seq=1648 Ack=508 Win=62592 Len=0
18 0.082849 10.0.102.69 104.47.23.169 TLSv1.3 149 Application Data
19 0.095041 104.47.23.169 10.0.102.69 TLSv1.3 82 Application Data
20 0.095141 104.47.23.169 10.0.102.69 TCP 60 42783 → 25 [RST, ACK] Seq=536 Ack=1743 Win=0 Len=0
21 22.653077 40.93.73.24 10.0.102.69 TCP 66 60619 → 25 [SYN] Seq=0 Win=64240 Len=0 MSS=1398 WS=256 SACK_PERM
22 22.653108 10.0.102.69 40.93.73.24 TCP 66 25 → 60619 [SYN, ACK] Seq=0 Ack=1 Win=62727 Len=0 MSS=8961 SACK_PERM WS=128
23 22.656931 40.93.73.24 10.0.102.69 TCP 60 60619 → 25 [ACK] Seq=1 Ack=1 Win=524288 Len=0
24 22.657119 10.0.102.69 40.93.73.24 SMTP 80 S: 220 mx.example.com ESMTP
While I'm not an expert in TLS implementation, I reviewed the spec and found the following:
https://datatracker.ietf.org/doc/html/rfc8446#section-4.6.1 says:
At any time after the server has received the client Finished message, it MAY send a NewSessionTicket message.
and
Note: Although the resumption master secret depends on the client's second flight, a server which does not request client authentication MAY compute the remainder of the transcript independently and then send a NewSessionTicket immediately upon sending its Finished rather than waiting for the client Finished.
I think Go's TLS stack follows the second case because the server doesn't request client authentication.
On the other hand, Microsoft's TLS stack might expect to receive the server's Finished first and the NewSessionTicket message in a separate flight. This seems especially relevant because Go's TLS stack flushes Finished and NewSessionTicket together in one write, rather than flushing Finished first and then sending NewSessionTicket.
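The ordering options discussed above can be summarized with a small, illustrative model. This is not crypto/tls code: serverFlights is a hypothetical function that lists which handshake messages the server writes before and after it reads the client Finished, with ChangeCipherSpec omitted for brevity. It assumes, as the `if !hs.requestClientCert()` check in the patch below reflects, that Go only puts tickets in its first flight when it has not requested a client certificate.

```go
package main

import (
	"fmt"
	"strings"
)

// serverFlights is a simplified model (not crypto/tls code) of the TLS 1.3
// messages a server writes before and after reading the client Finished,
// following RFC 8446 Section 4.6.1. earlyTickets means "send NewSessionTicket
// right after the server Finished", which the RFC only permits when no client
// certificate was requested.
func serverFlights(requestClientCert, earlyTickets bool) (beforeClientFinished, afterClientFinished []string) {
	first := []string{"ServerHello", "EncryptedExtensions"}
	if requestClientCert {
		first = append(first, "CertificateRequest")
	}
	first = append(first, "Certificate", "CertificateVerify", "Finished")

	if earlyTickets && !requestClientCert {
		// The server precomputes the client Finished to derive the resumption secret.
		return append(first, "NewSessionTicket"), nil
	}
	// Otherwise the ticket waits until the client Finished has been read.
	return first, []string{"NewSessionTicket"}
}

func main() {
	for _, tc := range []struct {
		name              string
		requestClientCert bool
		earlyTickets      bool
	}{
		{"Go default, no client cert requested", false, true},
		{"Go with ClientAuth set", true, true},
		{"ticket after client Finished (OpenSSL/Postfix)", false, false},
	} {
		before, after := serverFlights(tc.requestClientCert, tc.earlyTickets)
		fmt.Printf("%s\n  before client Finished: %s\n  after client Finished:  %s\n",
			tc.name, strings.Join(before, ", "), strings.Join(after, ", "))
	}
}
```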
To verify this hypothesis, I made a small modification to Go's handshake code so that it flushes the buffer before sending NewSessionTicket, and sends the ticket only after the flush.
Here is the patch I tested:
--- src/crypto/tls/handshake_server_tls13.go.orig 2024-11-07 04:28:50.967023405 +0000
+++ src/crypto/tls/handshake_server_tls13.go 2024-11-07 05:02:21.053073557 +0000
@@ -75,9 +75,17 @@
if _, err := c.flush(); err != nil {
return err
}
+
if err := hs.readClientCertificate(); err != nil {
return err
}
+
+ if !hs.requestClientCert() {
+ if err := hs.sendSessionTickets(); err != nil {
+ return err
+ }
+ }
+
if err := hs.readClientFinished(); err != nil {
return err
}
@@ -777,11 +785,11 @@
// If we did not request client certificates, at this point we can
// precompute the client finished and roll the transcript forward to send
// session tickets in our first flight.
- if !hs.requestClientCert() {
- if err := hs.sendSessionTickets(); err != nil {
- return err
- }
- }
+ //if !hs.requestClientCert() {
+ // if err := hs.sendSessionTickets(); err != nil {
+ // return err
+ // }
+ //}
return nil
}
It seemed to work.
10 0.044304 104.47.23.112 10.0.102.69 TLSv1.3 361 Client Hello
11 0.051822 10.0.102.69 104.47.23.112 TLSv1.3 1514 Server Hello, Change Cipher Spec, Encrypted Extensions, Certificate, Certificate Verify
12 0.051827 10.0.102.69 104.47.23.112 TLSv1.3 104 Finished
13 0.051889 10.0.102.69 104.47.23.112 TLSv1.3 198 New Session Ticket
14 0.061689 104.47.23.112 10.0.102.69 TCP 60 55773 → 25 [ACK] Seq=370 Ack=1648 Win=525568 Len=0
15 0.063710 104.47.23.112 10.0.102.69 TLSv1.3 118 Change Cipher Spec, Finished
16 0.107821 10.0.102.69 104.47.23.112 TCP 54 25 → 55773 [ACK] Seq=1792 Ack=434 Win=62592 Len=0
17 0.117695 104.47.23.112 10.0.102.69 SMTP 128 C: EHLO JPN01-OS0-obe.outbound.protection.outlook.com
@FiloSottile, as the author of this code almost 6 years ago, what do you think about this issue? Given these findings, should Go adjust its handshake behavior, or should Microsoft update their TLS 1.3 implementation for better interoperability?
I did not try changing the crypto/tls code to only send a new session ticket message after having read the client finished message. May be worth trying, to see if that will result in a successful TLS session or sees the same abrupt connection close.
@mjl-
Just to add to this for anyone looking to sidestep this issue.
We encountered very similar issues and also tried what you mentioned, by changing tls.Config.ClientAuth to the following two modes to see whether it would work.
Both seem to work without issues.
Dump of interaction with Exchange Online (outlook.com) with ClientAuth set to RequestClientCert
1 2.991447 40.93.130.3 10.0.15.74 TLSv1.3 361 Client Hello
2 2.992553 10.0.15.74 40.93.130.3 TLSv1.3 1527 Server Hello, Change Cipher Spec, Encrypted Extensions, Certificate Request, Certificate, Certificate Verify, Finished
3 3.002707 40.93.130.3 10.0.15.74 TCP 60 11164 → 25 [ACK] Seq=369 Ack=1610 Win=524288 Len=0
4 3.006255 40.93.130.3 10.0.15.74 TLSv1.3 4125 Change Cipher Spec, Certificate, Certificate Verify, Finished
5 3.006322 10.0.15.74 40.93.130.3 TCP 54 25 → 11164 [ACK] Seq=1610 Ack=4440 Win=58624 Len=0
6 3.006673 10.0.15.74 40.93.130.3 TCP 2850 25 → 11164 [PSH, ACK] Seq=1610 Ack=4440 Win=58624 Len=2796 [TCP PDU reassembled in 51]
7 3.006688 10.0.15.74 40.93.130.3 TLSv1.3 1137 New Session Ticket
Dump of interaction with Exchange Online (outlook.com) with ClientAuth set to RequireAndVerifyClientCert
1 2.157838 40.93.130.1 10.0.15.74 TLSv1.3 361 Client Hello
2 2.159897 10.0.15.74 40.93.130.1 TLSv1.3 1526 Server Hello, Change Cipher Spec, Encrypted Extensions, Certificate Request, Certificate, Certificate Verify, Finished
3 2.170862 40.93.130.1 10.0.15.74 TCP 60 15428 → 25 [ACK] Seq=369 Ack=1609 Win=524288 Len=0
4 2.174135 40.93.130.1 10.0.15.74 TLSv1.3 4125 Change Cipher Spec, Certificate, Certificate Verify, Finished
5 2.174193 10.0.15.74 40.93.130.1 TCP 54 25 → 15428 [ACK] Seq=1609 Ack=4440 Win=58624 Len=0
6 2.174901 10.0.15.74 40.93.130.1 TCP 2850 25 → 15428 [PSH, ACK] Seq=1609 Ack=4440 Win=58624 Len=2796 [TCP PDU reassembled in 53]
7 2.174923 10.0.15.74 40.93.130.1 TCP 2850 25 → 15428 [PSH, ACK] Seq=4405 Ack=4440 Win=58624 Len=2796 [TCP PDU reassembled in 53]
8 2.175234 10.0.15.74 40.93.130.1 TLSv1.3 555 New Session Ticket
Go version
go1.23.2 linux/amd64
Output of go env in your module/workspace:
What did you do?
Deploy mox, a mail server, and successfully receive incoming email deliveries from Microsoft (outlook.com, both Office 365 and personal/free accounts) to mox over SMTP with STARTTLS (crypto/tls server).
What did you see happen?
On October 24, I started receiving "TLS reporting" errors with a "validation failure" error in the "sts" (MTA-STS) section. Up to and including October 23, I received TLS reports with only successful delivery attempts. I investigated but couldn't find anything wrong. Yesterday I learned that message deliveries from Microsoft (outlook.com servers) to mox were failing. The TLS reporting error message wasn't precise or clear, but there's a good chance it was about these failing delivery attempts.
The symptoms: I would see an incoming SMTP connection, the "starttls" command, and then an abrupt close of the connection by the remote. Debugging revealed that the remote closed the connection after reading the server's response to the TLS client hello message, without writing anything back (EOF while trying to read the first bytes of the expected "client finished" message). While debugging further, I noticed that the Go TLS server code sends a session ticket message as part of its response to the client hello message. Setting tls.Config.SessionTicketsDisabled = true prevents the new session ticket from being sent, and makes the Microsoft SMTP STARTTLS command, and the delivery of messages, succeed.
At https://datatracker.ietf.org/doc/html/rfc8446#section-4.6.1 I noticed:
One theory: The Go TLS server is sending the NewSessionTicket message too soon, and Microsoft changed their implementation to be more strict about when it allows certain messages.
This isn't specific to mox. Maddy, another mail server written in Go, is also seeing TLS interoperability issues with Microsoft/outlook.com. More details:
https://github.com/mjl-/mox/issues/237 https://github.com/foxcpp/maddy/issues/730
What did you expect to see?
The Go TLS session ticket may come too early for some other TLS clients. I did not try changing the crypto/tls code to only send a new session ticket message after having read the client finished message. It may be worth trying, to see whether that results in a successful TLS session or in the same abrupt connection close.