golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.29k stars 17.7k forks source link

crypto/tls: usage of QUICResumeSession in Go 1.23rc1 breaks backwards compatibility #68124

Closed marten-seemann closed 5 months ago

marten-seemann commented 5 months ago

Go version

go version go1.23rc1 (all platforms)

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/marten/Library/Caches/go-build'
GOENV='/Users/marten/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/marten/src/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/marten/src/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/Users/marten/bin/go1.23ex'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/Users/marten/bin/go1.23ex/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.23rc1'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/Users/marten/Library/Application Support/go/telemetry'
GCCGO='gccgo'
GOARM64='v8.0'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/Users/marten/src/go/src/github.com/quic-go/quic-go/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/q0/b5ynf00142l7bl9sp8y098zr0000gn/T/go-build1369234025=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

I ran quic-go's test suite using Go 1.23: https://github.com/quic-go/quic-go/pull/4571.

What did you see happen?

Multiple tests related to QUIC session resumption are failing.

What did you expect to see?

As the QUIC API in crypto/tls is covered by Go's backwards compatibility guarantee, I expected the tests to pass.


https://github.com/golang/go/issues/63691 introduced two new QUIC events: QUICResumeSession and QUICStoreSession. The usage of QUICStoreSession is gated by the QUICConfig.EnableStoreSessionEvent config option, but the usage of QUICResumeSession isn't.

This means that crypto/tls passes QUICResumeSession to quic-go, which doesn't know how to handle this event, and therefore aborts the handshake.

I believe this is an oversight, and that both events should have been gated by the config option. We might also want to rename the config flag to EnableSessionEvents to make it clear that it covers both QUICStoreSession and QUICResumeSession.

cc @neild @FiloSottile

gabyhelp commented 5 months ago

Similar Issues

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

mauri870 commented 5 months ago

cc @neild

neild commented 5 months ago

Oops. Thanks for the report. Will fix.

Renaming EnableStoreSessionEvent seems reasonable.

neild commented 5 months ago

Hm. So, perhaps this is something we should change, but the oversight is one in documenting the original QUICConn.NextEvent API.

My intent was that implementations ignore events they don't recognize, precisely to permit us to add more events in the future. This probably should have been greased with the implementation providing a random event.

The reason we only have EnableStoreSessionEvent is that this event requires a response from the QUIC layer--when the event is enabled, the caller needs to call QUICConn.StoreSession to store the session, possibly after modifying it. In contrast, QUICResumeSessionEvent doesn't require a response.

But the QUICConn documentation doesn't actually say that implementations need to ignore unknown events, so maybe we should change the control to apply to resumption events as well. I think quic-go should switch to ignoring unknown events, though.

gopherbot commented 5 months ago

Change https://go.dev/cl/594475 mentions this issue: crypto/tls: apply QUIC session event flag to QUICResumeSession events

marten-seemann commented 5 months ago

My intent was that implementations ignore events they don't recognize, precisely to permit us to add more events in the future. This probably should have been greased with the implementation providing a random event.

I tried to find any record of this, but I can't seem to find anything. As a QUIC implementation, it seems dangerous to ignore events coming from the TLS stack.

Of course, crypto/tls could guarantee that it's safe to do ignore a particular event, but I'm not sure if that's practical either: Some events definitely can't be ignored, and it's unclear how the combinatorial explosion could be dealt with.

FiloSottile commented 5 months ago

There are three options for APIs like this

  1. all events are critical, clients error out on unknown events, any new event must be gated by some form of opt-in
  2. some existing events are critical, but future ones are not and should be ignored if unsupported, unless the application opts-in to some new implicitly critical event
  3. events have a Critical field, unsupported critical events cause an error

Sounds like the API was meant to be (2) and was interpreted as (1).

(1) hamstrings crypto/tls and will cause a bunch of new Config options to show up.

I disagree that (2) is dangerous: it is the job of crypto/tls to make sure that new events are backwards compatible, which includes being safe to ignore. Having to ensure they are safe to ignore is still much better than not being able to add them at all!

I think (3) doesn't make much sense here, because adding a new critical event without opt-in would break backwards compatibility, so we can never do it, bringing us back to (2) effectively.

I think it's early enough that we should clarify the API as (2), and add grease. Restricting ourselves to (1) would be a pain in the long term.

marten-seemann commented 5 months ago

Thanks for weighing in here @FiloSottile! I'm happy to change quic-go to (2), and I can do that in the next release. Should we document though that all the events defined in Go 1.22 are critical and must not be ignored?

With https://go-review.googlesource.com/c/go/+/594475, Go 1.23 won't break backwards compatibility with existing quic-go version. I assume that once we decide to add more events, current quic-go versions will have been phased out to a degree such that they won't cause any major problems.

neild commented 5 months ago

I don't think there's anything dangerous about ignoring unknown events here. In general, the failure mode of misusing this API is the handshake failing to complete. If we add new events that can't be ignored (such as QUICStoreSession), then we need to gate them behind an opt-in to avoid breaking backwards compatibility, so an implementation can safely assume that anything it doesn't recognize is unimportant (to it).

I also don't think we need to document that the current events are critical; a QUIC implementation which ignores any just isn't going to function.

I think that we should:

  1. Apply https://go.dev/cl/594475 to avoid breaking existing quic-go releases with Go 1.23.
  2. Document that implementations must ignore unknown events.
  3. Maybe add some grease in Go 1.24 or Go 1.25, once current quic-go versions are phased out.

I don't have any anticipation that we'll be making any more changes to this API in the foreseeable future, but that should keep the door open for the unforeseen.