golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.86k stars 17.65k forks source link

crypto/tls: support QUIC as a transport #44886

Closed neild closed 4 months ago

neild commented 3 years ago

QUIC is the transport protocol underlying HTTP/3, as detailed in https://tools.ietf.org/html/draft-ietf-quic-tls-34. A QUIC implementation requires an unusually tight coupling with TLS; quoting the draft:

Rather than a strict layering, these two protocols cooperate: QUIC uses the TLS handshake; TLS uses the reliability, ordered delivery, and record layer provided by QUIC.

The crypto/tls package does not currently provide an API suitable for use by a QUIC implementation. I propose that we add one.

Related background

BoringSSL provides a set of functions specifically for use by QUIC implementations: https://commondatastorage.googleapis.com/chromium-boringssl-docs/ssl.h.html#QUIC-integration

The quic-go QUIC implementation uses a fork of crypto/tls. The quic-go fork adds a RecordLayer interface, which is quite similar to the BoringSSL API.

A QUIC implementation needs to:

BoringSSL and the quic-go fork of crypto/tls both provide these capabilities.

The quic-go fork provide additional extensions to crypto/tls around the handling of early data and session tickets. Those changes are out of scope for this proposal, which addresses only the QUIC-specific need to replace the record layer.

Proposed API changes

package tls

// EncryptionLevel represents a QUIC encryption level used to transmit
// handshake messages.
type EncryptionLevel int

const (
  EncryptionLevelInitial = iota
  EncryptionLevelEarlyData
  EncryptionLevelHandshake
  EncryptionLevelApplication
)

// QUICTransport describes hooks used by a QUIC implementation.
//
// If any QUICTransport function returns an error, the QUIC handshake will
// be terminated.
//
// It is an error to call Read, Write, or CloseWrite on a connection with
// a non-nil QUICTransport.
type QUICTransport struct {
  // SetReadSecret configures the read secret and cipher suite for the given
  // encryption level. It will be called at most once per encryption level.
  //
  // QUIC ACKs packets at the same level they were received at, except that
  // early data (0-RTT) packets trigger application (1-RTT) acks. ACK-writing
  // keys will always be installed with SetWriteSecret before the
  // packet-reading keys with SetReadSecret, ensuring that QUIC can always
  // ACK any packet that it decrypts.
  SetReadSecret func(level EncryptionLevel, suite uint16, secret []byte) error

  // SetWriteSecret configures the write secret and cipher suite for the
  // given encryption level. It will be called at most once per encryption
  // level.
  //
  // See SetReadSecret for additional invariants between packets and their
  // ACKs.
  SetWriteSecret func(level EncryptionLevel, suite uint16, secret []byte) error

  // WriteHandshakeData adds handshake data to the current flight at the
  // given encryption level.
  //
  // A single handshake flight may include data from multiple encryption
  // levels. QUIC implementations should defer writing data to the network
  // until FlushHandshakeData to better pack QUIC packets into transport
  // datagrams.
  WriteHandshakeData func(level EncryptionLevel, data []byte) error

  // FlushHandshakeData is called when the current flight is complete and
  // should be written to the transport. Note that a flight may contain
  // data at several encryption levels.
  FlushHandshakeData func() error

  // ReadHandshakeData is called to request handshake data. It follows the
  // same contract as io.Reader's Read method, but returns the encryption
  // level of the data as well as the number of bytes read and error.
  //
  // ReadHandshakeData must not combine data from multiple encryption levels.
  //
  // ReadHandshakeData must block until at least one byte of data is
  // available, and must return as soon as least one byte of data is
  // available.
  ReadHandshakeData func(p []byte) (level EncryptionLevel, n int, err error)

  // Alert sends a fatal alert at the specified encryption level.
  //
  // If the level is not EncryptionLevelInitial, this function will not
  // be called before the write secret for the level is initialized.
  Alert func(EncryptionLevel, uint8)

  // SetTransportParameters provides the extension_data field of the
  // quic_transport_parameters extension sent by the peer. It will
  // always be called before the successful completion of a handshake.
  SetTransportParameters func([]byte)

  // GetTransportParameters returns the extension_data field of the
  // quic_transport_parameters extension to send to the peer.
  GetTransportParameters func() []byte
}

type Config struct {
  // If QUICTransport is non-nil, it replaces the TLS transport layer.
  // In this case, MinVersion and MaxVersion must be VersionTLS13.
  QUICTransport *QUICTransport
}

// ProcessQUICPostHandshake processes data that has become available
// after the handshake has completed. It must not be called until
// Handshake has returned successfully. It causes a call to the
// QUICTransport ReadHandshakeData function requesting the new data.
func (c *Conn) ProcessQUICPostHandshake() error { … }
neild commented 3 years ago

/cc @FiloSottile @lucas-clemente @marten-seemann

marten-seemann commented 3 years ago

Thank you @neild! I really like the API you're proposing.

A few thoughts:

  1. ALPN: QUIC mandates the use of ALPN (see https://datatracker.ietf.org/doc/html/draft-ietf-quic-tls-34#section-8.1). In particular, it requires that the handhake fails if client and server can't agree on an application protocol. crypto/tls currently doesn't enforce this. We would need to enforce this check if Config.QUICTransport is set (no additional API required).
  2. Transport Parameters: We should guarantee that GetTransportParameters is called before SetTransportParameters on the server side (this is trivially possible, as the client's transport parameters are sent in the ClientHello. We just need to document it). Transport parameters are used to negotiate QUIC extensions, so a server might modify its transport parameters based on what it received from the client.
  3. WriteHandshakeData: not sure if we need the encryption level here. TLS writes messages in ascending order, i.e. it first writes all data at the Initial level, then at Handshake level, then at Application level. Once it's done with one level, it never goes back. In my crypto/tls fork the (admittedly insufficiently documented) contract is that after SetWriteSecret was called for encryption level X, all data written belongs to that encryption level, until SetWriteSecret is called for encryption level X+1. I really like the idea of explicitly setting the boundaries between flights with FlushHandshakeData, that would simplify packet generation quite a bit.
  4. You defined a EncryptionLevelEarlyData, but I'm not sure if QUIC 0-RTT is in scope here (note that crypto/tls doesn't currently support 0-RTT for TLS1.3/TCP). In order to support 0-RTT, the QUIC server needs to be able to check if it wants to allow 0-RTT handshake (see https://datatracker.ietf.org/doc/html/draft-ietf-quic-tls-34#section-4.6.2 for details). That means that a QUIC implementation needs to be able to:
    1. server: save the transport parameters used on the old connection in the session ticket
    2. server: restore it from the session ticket when the client tries to resume a connection
    3. server: decide if to accept / reject 0-RTT
    4. client: learn if 0-RTT was accepted or rejected.
neild commented 3 years ago
  1. Is it necessary for crypto/tls to enforce application protocol negotiation? The QUIC layer could check the ConnectionState after the handshake completes to verify a protocol was negotiated.
  2. Guaranteeing that the client quic_transport_parameters extension is provided to the server before requesting transport parameters to send to the client sounds reasonable. (You said GetTransportParameters is called before SetTransportParameters, but I think you mean the other way around.)
  3. You're right that we technically don't need to provide the encryption level in WriteHandshakeData, but I think doing so makes it less likely the QUIC implementation passes data at the wrong encryption level. I'll defer to @FiloSottile's expertise here.
  4. I defined EncryptionLevelEarlyData, because that enum entry is the only component of the transport-layer API required to support early data. But I think you're right that there are other crypto/tls changes required to support 0-RTT. I think those changes are less QUIC-specific, and should probably be a separate proposal. I don't have a strong opinion on whether it's worth including the EncryptionLevelEarlyData enum entry before that proposal or not.
marten-seemann commented 3 years ago
  1. The QUIC spec says that you "MUST immediately" close the connection if no application protocol is negotiated, see https://datatracker.ietf.org/doc/html/draft-ietf-quic-tls-34#section-8.1. Not sure if right after handshake completion still qualifies as "immediately". I decided to throw the error as early as possible.
  2. Yes, that's what I meant.
OneOfOne commented 3 years ago

Any progress?

neild commented 1 year ago

Going to try to move this along. Updated proposal after some discussions with @rolandshoemaker and @FiloSottile:

// EncryptionLevel represents a QUIC encryption level used to transmit
// handshake messages.
type EncryptionLevel int

const (
        EncryptionLevelInitial = iota
        EncryptionLevelHandshake
        EncryptionLevelApplication
)

// An AlertError is a TLS alert.
type AlertError uint8

// A QUICConn represents a connection which uses a QUIC implementation as the underlying
// transport as described in RFC 9001.
type QUICConn

// QUICClient returns a new TLS client side connection using QUICTransport as the
// underlying transport. The config cannot be nil.
//
// The config's MinVersion must be at least TLS 1.3.
func QUICClient(t *QUICTransport, config *Config) *QUICConn

// QUICClient returns a new TLS server side connection using QUICTransport as the
// underlying transport. The config cannot be nil.
//
// The config's MinVersion must be at least TLS 1.3.
func QUICServer(t *QUICTransport, config *Config) *QUICConn

// Handshake runs the client or server handshake protocol.
//
// When the handshake ends with a TLS alert, Handshake returns an error wrapping AlertError.
func (q *QUICConn) Handshake() error

// QUICTransport describes hooks used by a QUIC implementation.
//
// If any QUICTransport function returns an error, the QUIC handshake will
// be terminated.
type QUICTransport struct {
        // SetReadSecret configures the read secret and cipher suite for the given
        // encryption level. It will be called at most once per encryption level.
        //
        // QUIC ACKs packets at the same level they were received at, except that
        // early data (0-RTT) packets trigger application (1-RTT) acks. ACK-writing
        // keys will always be installed with SetWriteSecret before the
        // packet-reading keys with SetReadSecret, ensuring that QUIC can always
        // ACK any packet that it decrypts.
        //
        // After SetReadSecret is called for an encryption level, no more data is
        // expected at previous levels.
        SetReadSecret func(level EncryptionLevel, suite uint16, secret []byte) error

        // SetWriteSecret configures the write secret and cipher suite for the
        // given encryption level. It will be called at most once per encryption
        // level.
        //
        // See SetReadSecret for additional invariants between packets and their
        // ACKs.
        //
        // After SetWriteSecret is called for an encryption level, all future writes
        // will be at that level until the next call to SetWriteSecret.
        SetWriteSecret func(level EncryptionLevel, suite uint16, secret []byte) error

        // WriteHandshakeData adds handshake data to the current flight at the
        // given encryption level.
        //
        // A single handshake flight may include data from multiple encryption
        // levels. QUIC implementations should defer writing data to the network
        // until FlushHandshakeData to better pack QUIC packets into transport
        // datagrams.
        WriteHandshakeData func(level EncryptionLevel, data []byte) error

        // FlushHandshakeData is called when the current flight is complete and
        // should be written to the transport. Note that a flight may contain
        // data at several encryption levels.
        FlushHandshakeData func() error

        // ReadHandshakeData is called to request handshake data. It follows the
        // same contract as io.Reader's Read method, but returns the encryption
        // level of the data as well as the number of bytes read and error.
        //
        // ReadHandshakeData must not combine data from multiple encryption levels.
        //
        // ReadHandshakeData must block until at least one byte of data is
        // available, and must return as soon as least one byte of data is
        // available.
        ReadHandshakeData func(p []byte) (level EncryptionLevel, n int, err error)

        // SetTransportParameters provides the extension_data field of the
        // quic_transport_parameters extension sent by the peer. It will
        // always be called before the successful completion of a handshake.
        SetTransportParameters func([]byte) error

        // GetTransportParameters returns the extension_data field of the
        // quic_transport_parameters extension to send to the peer.
        //
        // Server connections always call SetTransportParameters before
        // GetTransportParameters.
        GetTransportParameters func() ([]byte, error)
}

This moves the QUIC support into a new QUICConn type. (Internally, QUICConn probably just wraps a tls.Conn, but this keeps the API surface clearer.)

In QUIC, TLS alerts are always fatal and terminate the connection. The Alert callback is dropped in favor of just returning a distinguishable error from Handshake. We'd probably do this for all handshakes, not just QUIC ones, which would satisfy #35234.

The progression of encryption levels is more explicitly spelled out: Once a read or write secret is provided for a level, no more data is read/written at previous levels. This means the WriteHandshakeData and ReadHandshakeData functions don't really need to pass around the encryption level, but doing so makes it harder to accidentally do the wrong thing.

I've added the guarantee that servers call SetTransportParameters before GetTransportParameters as suggested by @marten-seemann (thanks!). (Perhaps these functions would be clearer as SendTransportParameters and ReceiveTransportParameters?)

@marten-seemann points out (https://github.com/golang/go/issues/44886#issuecomment-794825341) that RFC 9001 requires connections to be terminated with a TLS alert when ALPN negotiation fails. The specific language in the RFC is "unless another mechanism is used for agreeing on an application protocol, endpoints MUST use ALPN for this purpose." Since this allows for an unspecified "another mechanism", I think it's best for the crypto/tls package to stay out of the way here and leave this up to the QUIC implementation. The negative impact is that a QUIC server using ALPN won't have an opportunity to reject a connection until after the handshake completes, rather than when processing the Initial message. This seems like a minor issue.

An open question is how to extend this to support early data / 0-RTT. Since crypto/tls has no early data support at the moment, it's difficult to say exactly what the QUIC-specific interface to this should be. In general terms, we would need:

I don't see any obstacle to adding this functionality to the proposed API above, once crypto/tls has 0-RTT support. The only QUIC-specific portions of the API are, I think, a mechanism to receive and send NewSessionTicket messages. (Perhaps we call QUICTransport.WriteHandshakeData when prompted to send a ticket, and provide a QUICConn.ProcessPostHandshakeData function to handle post-handshake CRYPTO frames.)

Edit: Added an error return to SetTransportParameters and GetTransportParameters. SetTransportParameters can fail if the parameters cannot be parsed or are invalid. I don't see how GetTransportParameters can fail in any reasonable scenario, but may as well let it return an error for consistency.

awnumar commented 1 year ago

I don't see any obstacle to adding this functionality to the proposed API above, once crypto/tls has 0-RTT support.

I thought crypto/tls deliberately does not support 0-RTT to avoid replay attacks?

DeedleFake commented 1 year ago

I thought crypto/tls deliberately does not support 0-RTT to avoid replay attacks?

I'm not sure that it's so clear. I found https://github.com/golang/go/issues/9671#issuecomment-519209728 and

https://github.com/golang/go/blob/80c5bbc627a37929ea571e99f0f15cb059fdaf70/src/crypto/tls/handshake_server_tls13.go#L278-L280

It looks to me like support just hasn't been implemented yet, not that there's any particular opposition to it.

neild commented 1 year ago

Perhaps I should say if crypto/tls gains 0-RTT support.

My goal is to ensure we don't paint ourselves into a corner with the QUIC APIs such that it would be difficult to support 0-RTT with them in the future, in the event crypto/tls does add that feature.

rsc commented 1 year ago

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

tmthrgd commented 1 year ago

@neild Two questions about the proposed API, why is QUICTransport a struct containing functions rather than an interface? I can't think of anything similar in the standard library.

And secondly who provides the QUICTransport? How do you go from 'I want to connect to google.com:443 over QUIC' or 'I want to host a server on port 443 using QUIC' to a QUICConn?

DeedleFake commented 1 year ago

I think it's a struct because it's a field in tls.Config, which also has a number of fields that are funcs for similar purposes. The idea here is to modify tls.Config to add support for the necessary pieces for QUIC.

Note that this proposal is specifically for crypto/tls, not for net. It doesn't cover things like QUICConn. It's just for adding the pieces of TLS-related functionality necessary to make something like QUICConn possible, and then that can hopefully be added later.

FiloSottile commented 1 year ago

Indeed, this is about the crypto/tls API needed by HTTP/3 implementations in other packages.

It's a struct of callbacks because that can be extended in the future with other optional callbacks, while interfaces can't ever add method backwards-compatibly. I agree it's a little weird, but I don't have a better suggestion.

neild commented 1 year ago

Yes, QUICTransport is a struct so we can add additional functions in the future.

The QUICTransport is provided by a QUIC implementation. This proposal is about making it possible to write a QUIC implementation that uses crypto/tls, not about providing a QUIC implementation. For example, github.com/lucas-clemente/quic-go needs to use a forked version of crypto/tls, since the crypto/tls API doesn't currently allow for building a QUIC implementation atop it.

QUICConn in this proposal is a QUIC-specific tls.Conn, not a QUIC connection. We'll want to clearly explain that in the documentation.

marten-seemann commented 1 year ago

A few thoughts about the API suggested in https://github.com/golang/go/issues/44886#issuecomment-1385846207:

  1. Each endpoint can send post-handshake TLS messages at any point, commonly used for sending session tickets. It would be sad if crypto/tls was spawning a Go routine that continuously blocks on ReadHandshakeData. Instead, I'd prefer to re-add the ProcessQUICPostHandshake suggested in https://github.com/golang/go/issues/44886#issue-826317860 to the QUICConn.
  2. Section 4.1.3 specifies how to treat CRYPTO data sent at a previous encryption level (2nd bullet point). The responsibility for detecting this error condition would now be shared between crypto/tls and the QUIC implementation.
  3. Is there a plan to expose cipher suite constructors for the three cipher suites defined for TLS 1.3?
  4. Purely for testing purposes, it would be nice if it was possible to somehow make crypto/tls pick a specific cipher suite. tls.Config.CipherSuites doesn't have any effect in TLS 1.3, and I'm not suggesting to change that, the removal of that config option makes sense. However, a QUIC implementation will need to implement different code paths (e.g. for header encryption using AES and using ChaCha), and there needs to be some way to test those.
  5.     // SetTransportParameters provides the extension_data field of the
        // quic_transport_parameters extension sent by the peer. It will
        // always be called before the successful completion of a handshake.
        SetTransportParameters func([]byte) error

Is there any reason to not make the contract stricter here, e.g. "It will be called before the keys for the next encryption level are installed.". I imagine this is how it would be implemented anyway. This would be a requirement if one wanted to implemented the QUIC Version Negotiation extension. This would also be a requirement to send missing_extension alert if the quic_transport_parameter extension is missing (see Section 8.2. On the server side, it would also be nice if SetTransportParameters was called before GetTransportParameters, as the server would then be able to generate transport parameters (incl. extensions) in response to the values that the client sent.

@marten-seemann points out (https://github.com/golang/go/issues/44886#issuecomment-794825341) that RFC 9001 requires connections to be terminated with a TLS alert when ALPN negotiation fails. The specific language in the RFC is "unless another mechanism is used for agreeing on an application protocol, endpoints MUST use ALPN for this purpose." Since this allows for an unspecified "another mechanism", I think it's best for the crypto/tls package to stay out of the way here and leave this up to the QUIC implementation. The negative impact is that a QUIC server using ALPN won't have an opportunity to reject a connection until after the handshake completes, rather than when processing the Initial message. This seems like a minor issue.

Sending the no_application_protocol TLS alert after completion of the handshake can hardly be interpreted as "immediately". That means that using this API, it won't be possible to correctly implement the RFC. What about adding a new field to the tls.Config.EnforceALPN bool (or alternatively a HandleALPN([]string) (string, error)). This would certainly be useful for TLS / TCP as well, and make it easier to prevent protocol confusion attacks / bugs.


@neild Please let me know once you have a branch to play around with. I'd be more than happy to integrate it into a quic-go branch (even at an early stage), so we can how the new API works in practice.

james-lawrence commented 1 year ago

@neild think functional constructors will work better than the struct based approach here. will let us hide the internals of the structure layout.

gopherbot commented 1 year ago

Change https://go.dev/cl/461608 mentions this issue: crypto/tls: support QUIC as a transport

neild commented 1 year ago

@marten-seemann Thanks for the comments!

Each endpoint can send post-handshake TLS messages at any point, commonly used for sending session tickets.

I was thinking that there isn't any use for post-handshake TLS messages until crypto/tls supports early data, but I'd forgotten about session tickets. I agree, we need a way to handle post-handshake data.

// HandlePostHandshakeData processes data that has become available
// after the handshake has completed.
// It must not be called before Handshake has returned successfully.
// It returns the number of bytes of data successfully processed,
// which may be zero if data does not contain a complete message.
// When n <= len(data), the caller should retain the unprocessed bytes
// and provide them with additional data when available.
func (q *QUICConn) HandlePostHandshakeData(data []byte) (n int, err error) {}

Alternatively, HandlePostHandshakeData could buffer partial messages. This would be simpler for the caller, but given that a QUIC implementation needs to have some form of input buffering to allow for packet reordering, perhaps it's best not to introduce an additional buffering layer here. What do you think?

Section 4.1.3 specifies how to treat CRYPTO data sent at a previous encryption level (2nd bullet point). The responsibility for detecting this error condition would now be shared between crypto/tls and the QUIC implementation.

My view is that the QUIC implementation should detect CRYPTO data sent at a previous encryption level, but crypto/tls will verify that the QUIC implementation has done so correctly.

Is there a plan to expose cipher suite constructors for the three cipher suites defined for TLS 1.3?

Isn't this aes.NewCipher and chacha20poly1305.New? Or is there something else I'm not thinking of?

Purely for testing purposes, it would be nice if it was possible to somehow make crypto/tls pick a specific cipher suite.

This seems like it'd be useful to have, but I think it's out of scope for this proposal.

Is there any reason to not make the contract stricter here, e.g. "It will be called before the keys for the next encryption level are installed.".

Seems reasonable. How about:

        // SetTransportParameters provides the extension_data field of the
        // quic_transport_parameters extension sent by the peer.
        //
        // For client connections, SetTransportParameters will be called before
        // EncryptionLevelApplication keys are installed with SetWriteSecret.
        // For server connections, SetTransportParameters will be called before
        // EncryptionLevelHandshake keys are installed with SetWriteSecret.
        SetTransportParameters func([]byte) error

Sending the no_application_protocol TLS alert after completion of the handshake can hardly be interpreted as "immediately".

I'm convinced.

How about: If tls.Config.NextProtos is not set, then the handshake will complete successfully even if the peer doesn't provide an ALPN protocol. However, if tls.Config.NextProtos is set, then we generate a no_application_protocol alert if the peer doesn't support ALPN. This is in contrast to the non-QUIC behavior of succeeding with no negotiated protocol when the peer doesn't support ALPN.

This requires no API changes, and the only behavioral change is to return an error when QUIC is enabled, NextProtos is set, and the peer doesn't support ALPN.

Please let me know once you have a branch to play around with.

https://go.dev/cl/461608 is a very-draft implementation of the proposal. (No tests, doesn't enforce all the constraints defined in RFC 9001.)

neild commented 1 year ago

@neild think functional constructors will work better than the struct based approach here. will let use hide the internals of the structure layout.

What's the motivation for hiding the internals of QUICTransport? It's a configuration struct.

james-lawrence commented 1 year ago

@neild because it'll lock in the API for all time. functional constructors wont.

neild commented 1 year ago

Implementing HandlePostHandshakeData got me thinking a bit more about the API for providing handshake data to the QUICConn.

The design of crypto/tls is such that we pretty much have to have a goroutine running the handshake, since the handshake is structured as blocking functions which wait for additional data at various points. It would be possible to restructure the handshake to be non-blocking, of course, but not trivial.

The QUICConn design above places the responsibility for running the handshake goroutine on the QUIC implementation.

An alternative would be to have crypto/tls run the handshake goroutine, replacing QUICTransport.ReadHandshakeData with a synchronous call on the QUICConn:

// HandleData processes handshake data received in CRYPTO frames.
//
// HandleData may call hooks on the QUICTransport.
func (q *QUICConn) HandleData(level EncryptionLevel, data []byte) error {
}

// QUICTransport hooks are only called from QUICClient, QUICServer, or QUICConn.HandleData.
// Hooks are never called asynchronously.
type QUICTransport struct {
  // HandshakeComplete is called when the TLS handshake completes successfully.
  HandshakeComplete func()

  // ...
}

(HandleData would also supersede HandlePostHandshakeData, of course.)

Internally, crypto/tls would start a goroutine to run the handshake, but the QUIC implementation could engage with it using synchronous function calls.

This has the advantage of being (I think) a simpler interface for a QUIC implementation to use, and leaving the door open for us adding a non-blocking handshake implementation some day. A disadvantage is that HandleData will block until the provided data has been processed; it would be unfortunate if QUIC implementations felt the need to run HandleData in a background goroutine to avoid blocking packet processing.

gopherbot commented 1 year ago

Change https://go.dev/cl/472621 mentions this issue: crypto/tls: support QUIC as a transport, synchronous form

neild commented 1 year ago

I've got a slightly different approach in https://go.dev/cl/472621. @rolandshoemaker, @FiloSottile, @marten-seemann, I'd appreciate your thoughts on whether this seems like an improvement.

Unlike the original proposal, this API is fully synchronous. The QUICConn still interacts with the transport via QUICTransport callbacks, but callbacks are only invoked synchronously from QUICConn methods. The handshake still runs in a goroutine, but crypto/tls manages interactions with it.

An advantage of this approach is that the QUIC implementation has an easier time determining when the TLS layer is done processing a chunk of handshake bytes--the QUIC side calls QUICConn.HandleHandshakeData with incoming data from CRYPTO frames, and is assured that the data is processed when that function returns.

The full API:

// EncryptionLevel represents a QUIC encryption level used to transmit
// handshake messages.
type EncryptionLevel int

const (
    EncryptionLevelInitial = EncryptionLevel(iota)
    EncryptionLevelHandshake
    EncryptionLevelApplication
)

// An AlertError is a TLS alert.
type AlertError uint8

// A QUICConn represents a connection which uses a QUIC implementation as the underlying
// transport as described in RFC 9001.
//
// Methods of QUICConn are not safe for concurrent use.
//
// A QUICConn is created with a QUICTransport containing a set of functions used to
// inform the underlying transport of key changes, new data to send in CRYPTO frames,
// and so forth. These functions are called synchronously by Start and HandleHandshakeData,
// never asynchronously in the background.
//
// QUICConn errors wrap Alert.
type QUICConn struct {}

// QUICClient returns a new TLS client side connection using QUICTransport as the
// underlying transport. The config cannot be nil.
//
// The config's MinVersion must be at least TLS 1.3.
func QUICClient(t *QUICTransport, config *Config) *QUICConn

// QUICServer returns a new TLS server side connection using QUICTransport as the
// underlying transport. The config cannot be nil.
//
// The config's MinVersion must be at least TLS 1.3.
func QUICServer(t *QUICTransport, config *Config) *QUICConn

// Start starts the client or server handshake protocol.
// It must be called at most once.
func (q *QUICConn) Start() error

// Close closes the connection and stops any in-progress handshake.
func (q *QUICConn) Close() error

// HandleHandshakeData handles handshake bytes received from the peer.
func (q *QUICConn) HandleHandshakeData(level EncryptionLevel, data []byte) error

// QUICTransport describes hooks used by a QUIC implementation.
//
// If any QUICTransport function returns an error, the QUIC handshake will
// be terminated.
type QUICTransport struct {
    // SetReadSecret configures the read secret and cipher suite for the given
    // encryption level. It will be called at most once per encryption level.
    //
    // QUIC ACKs packets at the same level they were received at, except that
    // early data (0-RTT) packets trigger application (1-RTT) acks. ACK-writing
    // keys will always be installed with SetWriteSecret before the
    // packet-reading keys with SetReadSecret, ensuring that QUIC can always
    // ACK any packet that it decrypts.
    //
    // Keys are not provided for the Initial encryption level.
    SetReadSecret func(level EncryptionLevel, suite uint16, secret []byte) error

    // SetWriteSecret configures the write secret and cipher suite for the
    // given encryption level. It will be called at most once per encryption
    // level.
    //
    // See SetReadSecret for additional invariants between packets and their
    // ACKs.
    SetWriteSecret func(level EncryptionLevel, suite uint16, secret []byte) error

    // WriteHandshakeData adds handshake data to the current flight at the
    // given encryption level.
    //
    // A single handshake flight may include data from multiple encryption
    // levels. QUIC implementations should defer writing data to the network
    // until FlushHandshakeData to better pack QUIC packets into transport
    // datagrams.
    WriteHandshakeData func(level EncryptionLevel, data []byte) error

    // FlushHandshakeData is called when the current flight is complete and
    // should be written to the transport. Note that a flight may contain
    // data at several encryption levels.
    FlushHandshakeData func() error

    // SetTransportParameters provides the extension_data field of the
    // quic_transport_parameters extension sent by the peer.
    //
    // For client connections, SetTransportParameters will be called before
    // EncryptionLevelApplication keys are installed with SetWriteSecret.
    // For server connections, SetTransportParameters will be called before
    // EncryptionLevelHandshake keys are installed with SetWriteSecret.
    SetTransportParameters func([]byte) error

    // GetTransportParameters returns the extension_data field of the
    // quic_transport_parameters extension to send to the peer.
    GetTransportParameters func() ([]byte, error)

    // HandshakeDone is called when the handshake completes successfully.
    //
    // Unsuccessful completion is indicated by an error returned by
    // Start or HandleHandshakeData.
    HandshakeDone func()
}

To summarize changes from the previous iteration:

I've left the FlushHandshakeData callback as-is for now, but it may no longer be of use since each flight of data is bounded by a call to Start or ReadHandshakeData.

marten-seemann commented 1 year ago

I haven't looked at the code in detail yet. Here are my initial thoughts about this API.

  • Handshake data received in CRYPTO frames is provided to QUICConn.HandleHandshakeData, rather than requested asynchronously through a QUICTransport callback.

Am I right to assume that all calls to WriteHandshakeData and FlushHandshakeData would be triggered before HandleHandshakeData returns? This would be crucial to efficiently pack packets: The QUIC stack doesn't know whether the data it just passed to HandleHandshakeData contained a complete TLS message. If it wasn't, it still needs to send an ACK for the packet that carried the CRYPTO frame. If it was, this ACK needs to be bundled with the next flight of TLS messages.

Would this function also be used for post-handshake data (i.e. session tickets)? If so, the name is probably not ideal. Maybe HandleCryptoData would be more clear, as the function handles all data that's transmitted in QUIC CRYPTO frames.

  • QUICConn.Handshake has been replaced by QUICConn.Start. Start writes initial handshake data and returns.

I'm not sure I understand the motivation behind this change. In general, I prefer APIs where the caller starts the Go routine, as it allows for more fine-grained control of the Go routine lifecycle. Having a Handshake(context.Context) error function makes error handling very straightforward (and might allow us to remove the error return value from HandleHandshakeData). It would also allow removing of the Close method and the HandshakeDone callback.

neild commented 1 year ago

Am I right to assume that all calls to WriteHandshakeData and FlushHandshakeData would be triggered before HandleHandshakeData returns?

Yes, exactly. Call QUICConn.HandleHandshakeData with received CRYPTO frame data -> QUICConn calls back to various QUICTransport funcs -> HandleHandshakeData returns. Once HandleHandshakeData returns, the TLS stack is idle and will not produce any events until provided with more data.

As you say, this means the QUIC layer can delay sending an ACK until after HandleHandshakeData returns, at which point it knows that it has any CRYPTO data that needs to be sent in response to the processed frame.

Would this function also be used for post-handshake data (i.e. session tickets)?

Yes.

Somewhat confusingly, session tickets are part of the TLS handshake protocol (RFC 8446, section 4) even though they are sent after the handshake completes. Perhaps it's clearer to call these functions HandleCryptoData and WriteCryptoData, naming them after the QUIC CRYPTO frames that carry the data rather than the TLS handshake protocol.

I'm not sure I understand the motivation behind this change.

As with HandleHandshakeData, Start will call WriteHandshakeData to write the ClientHello message (clients only, obviously) and then return. The QUIC layer is assured that the TLS layer is idle and waiting for input once Start returns.

Every QUICTransport callback is invoked synchronously from a QUICConn method with a bounded lifetime. Internally to crypto/tls, there's a goroutine running the handshake, but that's an implementation detail.

neild commented 1 year ago

Updated https://go.dev/cl/472621 with a few changes:

neild commented 1 year ago

One other change I forgot to document:

marten-seemann commented 1 year ago

This is exciting! I just made the corresponding changes to quic-go (it's quite a big change!) and it simplified the code considerably. Here's the branch: https://github.com/quic-go/quic-go/compare/cryptotls It's not an entirely fair comparison at this stage, since my existing crypto/tls fork (qtls) supports 0-RTT. Those code paths are currently disabled. The normal 1 RTT handshakes work, as well as TLS session resumption. quic-go has a pretty comprehensive test suite, which is a great way to make sure everything works as expected.

Here's a few things I noticed:

  1. RFC 9001 (section 5.7) says that on the server side, the 1-RTT read key MUST NOT be used before completion of the handshake (see also the diagram). The current code exports the 1-RTT read and write keys at the same time (right after receiving the ClientHello), making it easy to accidentally use that key before the spec allows. Of course, it's possible to hold back the key at the QUIC layer, but I'd argue that given the language in the RFC, a safer API would not hand the key to the QUIC layer prematurely.
  2. The error returned by WriteCryptoData needs to convey the TLS alert code. This is needed for the QUIC stack to send the correct error code in the CONNECTION_CLOSE frame. Ideally, the return value is not just a tls.alert (which is a uint8), but would also contain the error string, so one can log the error. Otherwise debugging handshake failures will be really hard. I commented on the CL.
  3. The ALPN might depend on the QUIC version in use. quic-go currently support draft-29, QUIC v1 (RFC 9000) and QUIC v2. For draft-29, HTTP/3 uses a different ALPN value than for v1. quic-go currently injects a net.Conn into qtls with an additional method, that can then be interface-asserted from tls.ClientHelloInfo.Conn. This API was implemented before the ClientHelloInfo gained a Context method and I never bother to change it. Putting this on the context would probably be the more idiomatic API. I suggest we change the API to make it possible to set the context on the tls.ClientHelloInfo, e.g. by changing the signature of the Start method: func (q *QUICConn) Start(context.Context) error.
  4. In TLS 1.3, servers can send so-called 0.5-RTT data. This is application data sent encrypted with application level keys right after receiving the ClientHello. This is possible because the server derives keys for all encryption levels right afer processing the ClientHello. While this already works with the current version of the API, it's only really useful if the server can check which ALPN was negotiated on the connection, i.e. by calling ConnectionState. Unfortunately, this function blocks until the completion of the handshake. I've commented on the CL with a suggestion how to fix this.
  5. I had to copy a fair amount of code for the construction of the AEAD from crypto/tls (see cipher_suite.go in my branch, as the API only gives me a uint16. Maybe that's not too bad, it's just 100 LOC. If we wanted to change this, we could either expose a constructor for the cipher suite, or pass the cipherSuiteTLS13 (as an exported type) to SetReadSecret and SetWriteSecret.
  6. I have a few tests that test the different cipher suites. These are useful since QUIC header protection depends on the cipher suite in use (code). There's currently no way to test this, since there's no way to force a specific cipher suite.
neild commented 1 year ago

Thanks for the detailed comments!

RFC 9001 (section 5.7) says that on the server side, the 1-RTT read key MUST NOT be used before completion of the handshake (see also the diagram).

Good point. I've changed the handshake to hold off on providing the 1-RTT write secret until after the handshake is completed.

A related minor change: I renamed the QUICTransport.HandshakeDone hook to QUICTransport.HandshakeComplete, which is more consistent with ConnectionState.HandshakeComplete.

The error returned by WriteCryptoData needs to convey the TLS alert code.

It does so; when the error returned is an alert condition, WriteCryptoData returns an error wrapping tls.AlertError, which contains the code. The AlertError.Error contains a textual description of the error.

I've also updated WriteCryptoData to always return an AlertError (internal_error in the case of an uncategorized failure, usually resulting from a configuration error).

The ALPN might depend on the QUIC version in use.

The QUIC version should be known when QUICClient or QUICServer is called, no? It's set in the first packet of the connection, and the caller can set Config.NextProtos as appropriate at that time.

Unfortunately, this function blocks until the completion of the handshake.

I missed that Conn.ConnectionState acquires the handshake mutex (obvious in hindsight). Fixed by dropping the mutex while the QUICConn is blocked on a read. (You still can't call ConnectionState from a QUICTransport hook, but I don't think that should be necessary. We could drop the mutex during these calls as well if it seems important.)

I had to copy a fair amount of code for the construction of the AEAD from crypto/tls

As you say, it's not a lot of copying. I'd rather keep the QUIC-related crypto/tls changes limited to things that can't be implemented in terms of the existing APIs.

I have a few tests that test the different cipher suites. These are useful since QUIC header protection depends on the cipher suite in use (code). There's currently no way to test this, since there's no way to force a specific cipher suite.

Forcing a specific cipher suite for TLS 1.3 does seem like it would be useful in testing, but I think it's out of scope for this proposal. @FiloSottile or @rolandshoemaker might have thoughts here; I know the current non-configurability of TLS 1.3 cipher suites is a deliberate choice.

marten-seemann commented 1 year ago

Thank you for updating the PR @neild! I made the corresponding changes to my quic-go branch.

RFC 9001 (section 5.7) says that on the server side, the 1-RTT read key MUST NOT be used before completion of the handshake (see also the diagram).

Good point. I've changed the handshake to hold off on providing the 1-RTT write secret until after the handshake is completed.

Great! Confirmed that it works now.

Unfortunately, this function blocks until the completion of the handshake.

I missed that Conn.ConnectionState acquires the handshake mutex (obvious in hindsight). Fixed by dropping the mutex while the QUICConn is blocked on a read. (You still can't call ConnectionState from a QUICTransport hook, but I don't think that should be necessary. We could drop the mutex during these calls as well if it seems important.)

It works!

Unfortunately, this function blocks until the completion of the handshake.

I missed that Conn.ConnectionState acquires the handshake mutex (obvious in hindsight). Fixed by dropping the mutex while the QUICConn is blocked on a read. (You still can't call ConnectionState from a QUICTransport hook, but I don't think that should be necessary. We could drop the mutex during these calls as well if it seems important.)

It works! My previously failing test case now passes.

The ALPN might depend on the QUIC version in use.

The QUIC version should be known when QUICClient or QUICServer is called, no? It's set in the first packet of the connection, and the caller can set Config.NextProtos as appropriate at that time.

The QUIC server know it, but the application doesn't know it. Consider the following application (pseudo) code:

tlsConf := &tls.Config{
     GetConfigForClient: func(info *tls.ClientHelloInfo) *tls.Config {
          if <QUIC v1> {
                 return tlsConfigWithNextProtosSetForQUICv1
          } else if <QUIC v2> {
                 return tlsConfigWithNextProtosSetForQUICv2
          }
          return nil
      }
}
quic.ListenAddr("0.0.0.0:443", tlsConf) // listen using a QUIC server that supports multiple QUIC versions

Any objection to add a context.Context to the Start method? I'd imagine there are more use cases where people would want to be able to associate a GetConfigForClient call with a specific connection attempt.

The error returned by WriteCryptoData needs to convey the TLS alert code.

It does so; when the error returned is an alert condition, WriteCryptoData returns an error wrapping tls.AlertError, which contains the code. The AlertError.Error contains a textual description of the error.

I've also updated WriteCryptoData to always return an AlertError (internal_error in the case of an uncategorized failure, usually resulting from a configuration error).

I can now obtain the TLS alert code from the error. However the error message is just the string representation of that alert code. For example, for an alertBadCertificate the message is always "tls: bad certificate". When using crypto/tls on a TCP connection, I'd get the actual error, e.g. "x509: certificate is valid for localhost, not foo.bar" on the host that's rejecting the certificate. While you probably don't want to send this error to the peer in production (although you could, using the Reason Phrase of the CONNECTION_CLOSE frame), it's really hard to debug handshake failures if you don't have access to the error message on the peer that's rejecting the certificate.

neild commented 1 year ago

Any objection to add a context.Context to the Start method?

I'm reluctant to put a context.Context on Start, because a context usually carries a deadline. Using one without a deadline here seems confusing. And you can pass in whatever information is necessary to GetConfigForClient by closing over it.

I think the case you're describing with a varying ALPN protocol would be a feature of the QUIC implementation, not the TLS layer. For example:

quic.Listen("udp", ":443", &Config{
  GetConfigForClient: func(info *quic.ClientInfo) *quic.Config {
    if info.QUICv1 {
      // ...
    } else {
      // ...
    }
  },
}

Or possibly a more specialized feature which appends the -29 suffix when negotiating an older QUIC version:

quic.Listen("udp", ":443", &Config{
  TLSConfig: &TLSConfig{
    NextProtos: []string{"h3"},
  }
  // Indicates support for draft-ietf-quic-http-25.
  // When negotiating draft-ietf-quic-transport-25 QUIC, the ALPN protocol will be "h3" + "-25".
  DraftProtocolVersions: []int{25},
})

I wonder how necessary this is, however. A QUICv1 implementation can always advertise both "h3" and "h3-29" as its ALPN protocols, regardless of the QUIC protocol in use. HTTP/3 over QUICv2 uses the same APLN identifiers as over QUICv1. And ongoing support for pre-v1 QUIC seems of limited use.

marten-seemann commented 1 year ago

Any objection to add a context.Context to the Start method?

I'm reluctant to put a context.Context on Start, because a context usually carries a deadline. Using one without a deadline here seems confusing. And you can pass in whatever information is necessary to GetConfigForClient by closing over it.

I understand the objection, it's not the most intuitive use of a context.Context. I think it makes the most sense to think about this end to end. What would a user accessing the ClientHelloInfo.Context on the GetConfigForClient callback expect after calling quic.Dial(ctx, <server>)? I'd argue that they'd want to get back the ctx they used for dialing. This is the same behavior that they'd get from tls.Dialer.DialContext, and not having access to that context might make it hard to transition an application built on top of TLS/TCP to QUIC.

Since ClientHelloInfo is a struct (and not an interface), and Context is a method and not a field on this struct, I can't work around this in my QUIC stack by wrapping GetConfigForClient. I might be able to do some unsafe tricks to access the unexported ctx field on the struct, but I'd really like to avoid that...

rhysh commented 1 year ago

Is crypto/tls.EncryptionLevel specific to QUIC? The name makes it sound like it's for TLS in general, but the doc comment ties it to QUIC. Are there ways that TLS-for-TCP and TLS-for-QUIC have or would diverge in this area?

There's occasional mention of which encryption levels are lower or higher than others (in relation to when particular keys should be discarded). There are no current plans to support Early Data. The core type of EncryptionLevel is an int, which implies an ordering. Given that, does it make sense to reserve the number that EncryptionLevelEarlyData would use?

const (
    EncryptionLevelInitial = EncryptionLevel(iota)
    _                      // reserved for possible future support of EncryptionLevelEarlyData
    EncryptionLevelHandshake
    EncryptionLevelApplication
)

Which of the QUICTransport callbacks are required? For example, would HandshakeComplete: nil result in a panic, an error, or (in some cases) success? It may be helpful to clarify expectations before a future release adds another field, or for fields where there's asymmetry between client and server (like HandshakeComplete?).

What does QUICConn.Close do, other than cause an in-flight QUICConn.Start call to return? With CL 472621 PS 7, I don't see how to safely cancel an active handshake because Start both creates the closec and waits for the handshake to complete, and Close would need to race against it to use the closec. It seems like Start(context.Context) error would resolve any confusion and keep the caller in control of connection lifecycle.

Building on that, what implications does that part of the API have on how Go servers can respond to high "connection" rates or DoS attacks? If, in the most trusting mode of operation, a server needs to keep an entire goroutine around while it waits to see if a client's initial packet is going to result in a negotiated connection, then that sets a lower bound on the cost and would in turn decrease the server's threshold for transitioning into a less-trusting mode of operation.

marten-seemann commented 1 year ago

Is crypto/tls.EncryptionLevel specific to QUIC? The name makes it sound like it's for TLS in general, but the doc comment ties it to QUIC. Are there ways that TLS-for-TCP and TLS-for-QUIC have or would diverge in this area?

Yes and no. TLS 1.3 goes through the same encryption levels when performing the handshake, it just happens internally, so there was no need to expose these before.

Which of the QUICTransport callbacks are required? For example, would HandshakeComplete: nil result in a panic, an error, or (in some cases) success? It may be helpful to clarify expectations before a future release adds another field, or for fields where there's asymmetry between client and server (like HandshakeComplete?).

I can't see a way to build a QUIC stack that wouldn't use all of these, . By the way, HandshakeComplete is symmetric, both client and server complete the handshake at some point.

What does QUICConn.Close do, other than cause an in-flight QUICConn.Start call to return? With CL 472621 PS 7, I don't see how to safely cancel an active handshake because Start both creates the closec and waits for the handshake to complete, and Close would need to race against it to use the closec. It seems like Start(context.Context) error would resolve any confusion and keep the caller in control of connection lifecycle.

It's very subtle, and the code would probably benefit from a few more comments documenting how the three different channels in the QUICConn interact. Start returns after the ClientHello has been written (i.e. it does not block until handshake completion).

Building on that, what implications does that part of the API have on how Go servers can respond to high "connection" rates or DoS attacks? If, in the most trusting mode of operation, a server needs to keep an entire goroutine around while it waits to see if a client's initial packet is going to result in a negotiated connection, then that sets a lower bound on the cost and would in turn decrease the server's threshold for transitioning into a less-trusting mode of operation.

QUIC uses the Retry mechanism to protect against DoS attacks during the handshake. This happens entirely inside the QUIC stack, and before you even decrypt the client's Initial packet. All you need to do is parse the QUIC packet header. I'd argue that sending invalid ClientHellos is not a very interesting attack to begin with: For the attacker, it's pretty much the same cost to send a valid ClientHello, which then makes the server 1. generate and send a ServerHello, as well as the entire certificate chain and 2. leaves the server waiting for the client's second flight. Much more bang for the buck than sending an invalid ClientHello, which allows the server to abort the handshake early.

neild commented 1 year ago

I'm still not sold on this specific use for ClientHelloInfo.Context, but I think that you're right that there needs to be a way for the caller to specify what context ClientHelloInfo.Context derives from. In the usual non-QUIC case, this is the context passed to Conn.HandshakeContext, so it makes sense for QUICConn.Start to also take a context. I've made that change.

I can now obtain the TLS alert code from the error. However the error message is just the string representation of that alert code. For example, for an alertBadCertificate the message is always "tls: bad certificate".

The AlertError.Error string is just the string representation of the error code, but the error returned by HandleCryptoData (which wraps AlertError) should contain the full text. This seems to be working for me, but perhaps I'm missing something?

Is crypto/tls.EncryptionLevel specific to QUIC? The name makes it sound like it's for TLS in general, but the doc comment ties it to QUIC.

As @marten-seemann says, the concept of encryption level is general to TLS, but the QUIC API is the only place that it needs to be exposed to the user.

Perhaps we should call this QUICEncryptionLevel to make it clearer that it's specific to the QUIC API.

Given that, does it make sense to reserve the number that EncryptionLevelEarlyData would use?

These values are just an enum; I don't think it's important to sort the early data level into any particular location. We could make the type something other than an int, but this is consistent with the use of integer enums elsewhere in crypto/tls.

Which of the QUICTransport callbacks are required?

Practically speaking, all of them; I don't think a functioning QUIC implementation can do without any. However, given that the reason for making QUICTransport a struct is to allow us to add optional additional callbacks in the future if needed, it's probably simpler to say that all the callbacks are optional. I've updated the CL to skip nil callbacks in all cases and document this behavior.

What does QUICConn.Close do, other than cause an in-flight QUICConn.Start call to return?

Start returns immediately after providing the initial flight of outgoing data (if any). It does not wait for the handshake to complete. I've updated the documentation to hopefully make this clearer.

Why this distinction between Start and Conn.Handshake? Because this makes interactions with QUICConn consistently non-blocking: Every QUICConn method does some work, possibly calling QUICTransport callbacks to report on the progress of the connection, and returns. The QUIC layer doesn't need to worry about asynchronously handling QUICTransport calls.

Close cleans up resources used by the connection, and should be called whenever the user is done with the QUICConn. In practical terms of the current implementation, it ensures the handshake goroutine is shut down if it was running. (But the presence of a handshake goroutine is an implementation detail that shouldn't be visible to the user outside of goroutine dumps.)

Building on that, what implications does that part of the API have on how Go servers can respond to high "connection" rates or DoS attacks? If, in the most trusting mode of operation, a server needs to keep an entire goroutine around while it waits to see if a client's initial packet is going to result in a negotiated connection, then that sets a lower bound on the cost and would in turn decrease the server's threshold for transitioning into a less-trusting mode of operation.

Questions of DoS defense for QUIC connections are going to be largely similar to those for regular TLS-over-TCP ones. In the TCP case, you have a goroutine in Conn.Handshake reading or writing from the underlying net.Conn. In the QUIC case, the goroutine is waiting for data provided by the QUIC layer. (Although as I mentioned above, the use of a goroutine to maintain handshake state in the QUIC case is an implementation detail.)

marten-seemann commented 1 year ago

I'm still not sold on this specific use for ClientHelloInfo.Context, but I think that you're right that there needs to be a way for the caller to specify what context ClientHelloInfo.Context derives from. In the usual non-QUIC case, this is the context passed to Conn.HandshakeContext, so it makes sense for QUICConn.Start to also take a context. I've made that change.

Thank you! It works :)

I can now obtain the TLS alert code from the error. However the error message is just the string representation of that alert code. For example, for an alertBadCertificate the message is always "tls: bad certificate".

The AlertError.Error string is just the string representation of the error code, but the error returned by HandleCryptoData (which wraps AlertError) should contain the full text. This seems to be working for me, but perhaps I'm missing something?

It's a bit hard to explain, so I'll try with an example and a lot of pointers to the code. Assume that the server presents a certificate for the wrong name. On the client side, certificate verification will fail in Conn.verifyServerCertificate. In the error handling block, we

  1. call sendAlert with a alertBadCertificate, and
  2. return a CertificateVerificationError containing the certificate chain that was rejected.

sendAlert then sets the error on Conn.out.err. This is the error which is then returned from HandleCryptoData (after wrapping).

The CertificateVerificationError returned from Conn.verifyServerCertficate chain is bubbled through a number of functions before being saved on Conn.handshakeErr.

The error returned from HandleCryptoData would need to be a combination of the Conn.out error (so the QUIC stack can get the alert code to send in the CONNECTION_CLOSE frame) and the Conn.handshakeErr (so the QUIC stack can return a useful error to the caller of quic.Dial).

Does that make sense?

rhysh commented 1 year ago

Because this makes interactions with QUICConn consistently non-blocking: Every QUICConn method does some work, possibly calling QUICTransport callbacks to report on the progress of the connection, and returns.

Non-blocking sounds great! I saw the implementation of QUICConn.Start created a goroutine and then waited on a channel, and I didn't pull that thread long enough to see this for myself. Thank you for making it clear.

Questions of DoS defense for QUIC connections are going to be largely similar to those for regular TLS-over-TCP ones.

I think they're different in a key way: With TLS-over-TCP, the Go app doesn't even know about the connection (not yet returned by Accept) until the OS has finished the three-way handshake. This means that a Go server won't have any goroutines for the connection until the remote source of the packets has demonstrated at least some ability to receive packets the server sends to the claimed source address.

As a server operator, I'm interested in deploying QUIC in part due to its fast handshakes. I'd like if my servers didn't have to enable QUIC's Retry mechanism, since that slows down legitimate connections by an additional round trip. Making it cheap to hold on to in-progress handshakes means the choice of when to enable the Retry mechanism doesn't need to be based on costs in the crypto/tls stack.

That might depend on whether the role of Retry is to protect servers themselves (like SYN cookies for SYN floods), or if it's to reduce the damage in a reflect/amplify attack. As long as the API of QUICConn is non-blocking and its goroutine is an implementation detail, we probably don't need a full threat model here :)

neild commented 1 year ago

It's a bit hard to explain, so I'll try with an example and a lot of pointers to the code.

Ah, I see the problem. This will probably require that I look through the error handling paths through the entire handshake, so it'll take me a bit to fix.

Are there any remaining issues with the API design that I'm overlooking? (We're returning the wrong error in places, but that's a bug in the implementation; I think the principle of returning a wrapped AlertError works.)

neild commented 1 year ago

think they're different in a key way: With TLS-over-TCP, the Go app doesn't even know about the connection (not yet returned by Accept) until the OS has finished the three-way handshake. This means that a Go server won't have any goroutines for the connection until the remote source of the packets has demonstrated at least some ability to receive packets the server sends to the claimed source address.

This is a general issue for QUIC, in that the first packet received from a client can cause the server to do a non-trivial amount of work starting the TLS handshake.

The Retry mechanism permits a server to require that a client prove that it can receive packets on its source address before the server commits to starting the handshake. This essentially restores the behavior of the TCP three-way-handshake.

QUIC allows a server to provide clients with an address validation token for use in future connections. On reconnecting from the same address, a client provides a token previously received from the server which acts as proof of ownership of its address. (See RFC 9000, section 8.1.3). This can provide fast handshakes for repeat clients. This can further combine with 0-RTT/early data to enable single-round-trip request/responses.

marten-seemann commented 1 year ago

It's a bit hard to explain, so I'll try with an example and a lot of pointers to the code.

Ah, I see the problem. This will probably require that I look through the error handling paths through the entire handshake, so it'll take me a bit to fix.

I'm not sure if it's necessary to go through all the error handling paths. It seems like returning some way of accessing both the Conn.out.err and the Conn.handshakeErr would work. Returning a wrapped AlertError is where things get difficult. While I can do an errors.As to obtain the AlertError, I can't do the same for the other error (the one that's saved on Conn.handshakeErr), as it doesn't have a specific type. I could manually call Unwrap, and then find out which of the errors is not the AlertError, but that doesn't feel very clean. Maybe a more explicit approach of returning a type QUICError struct { Alert uint8; Err error } would be easier? Alternatively, the HandleCryptoData signature could be changed to return both the alert and the error: HandleCryptoData(level EncryptionLevel, data []byte) (AlertError, error), but I'm not sure if that's nicer.

Are there any remaining issues with the API design that I'm overlooking?

Other than the error handling, I can't see any problems with this API, and quic-go's (pretty comprehensive) test suite seems happy as well.

rsc commented 1 year ago

My understanding is that this is waiting on @FiloSottile to post a 0-RTT API that works for quic-go. Do I have that right?

neild commented 1 year ago

I believe that's right.

Minor unrelated API thought: There are some cases where RFC 9001 mandates an error condition that occurs within the TLS layer be reported as a connection error of type PROTOCOL_VIOLATION. For example, section 8.4 says a server SHOULD treat receiving a TLS ClientHello with a non-zero legacy_session_id field as a PROTOCOL_VIOLATION, but crypto/tls doesn't make this field visible to the user.

We should define a way for crypto/tls to report these conditions. It's a minor point (QUICHE reports this condition as a TLS illegal_parameter alert, not as PROTOCOL_VIOLATION, and really that's fine), but we may as well get it right. Perhaps we define a ErrQUICProtocolViolation and wrap it rather than (or in addition to?) an AlertError in these cases?

For other all other errors, I believe https://go.dev/cl/472621 now has QUICConn methods that consistently return an error which wrap an AlertError containing the alert number.

marten-seemann commented 1 year ago

Adding a bit of context here, this was added quite late during the standardization process, and the applicable error code was discussed on the PR. I'm not sure that ErrQUICProtocolViolation would be the right layering, it would be nice if the TLS layer didn't have to deal with QUIC error codes at all. Could we just return an error that's not an AlertError? The QUIC layer could then convert that into a PROTOCOL_VIOLATION.

FiloSottile commented 1 year ago

Sorry for the delay, @marten-seemann helped getting me up to speed and we've been working out how to get 0-RTT support in without infecting the rest of crypto/tls. We'll post that soon. (The delay is my fault, due to travel and conferences.)

In general I like the API. I'd like feedback on a slightly different shape for the interface, that follows the same semantics.

If you squint, you notice that all the QUICTransport callbacks (except GetTransportParameters) are actually Start and HandleCryptoData return values in disguise. That took me a while to figure out, but it's effectively confirmed by the fact that they "are called synchronously by Start and HandleCryptoData, never asynchronously in the background" and by the fact that they could all be called just before Start and HandleCryptoData return. Their order matters, but nothing needs to happen on the other side between calls.

That's a bit weird and confusing. What if we actually made them return values?

Here's a concrete example. I actually don't necessarily like the color of the bikeshed, and if anyone can think of a nicer way to encode the various ops in the type system, that would be great. (Maybe each Op is a struct that implements an interface? What's the method?) Anyway, I wanted to get feedback on the general direction before figuring out the details.

func (q *QUICConn) Start() ([]QUICOp, error)

func (q *QUICConn) HandleCryptoData(level EncryptionLevel, data []byte) ([]QUICOp, error)

type QUICOpKind uint16

const (
    QUICSetReadSecret QUICOpKind = iota
    QUICSetWriteSecret
    QUICWriteCryptoData
    QUICSetTransportParameters
    QUICHandshakeDone
)

type QUICOp struct {
    Kind QUICOpKind

    // The following values are set based on the Kind.
    Level EncryptionLevel
    Suite uint16
    Data []byte
}
neild commented 1 year ago

We'd need to start providing the outbound transport parameters up-front, although that's not difficult. Perhaps something like:

q := tls.QUICClient(&tls.QUICConfig{
  TransportParameters: p, // []byte containing the outbound transport parameters
  TLSConfig: tlsConfig,
})

Other than that, the current approach of a collection of callback functions vs. returning a data structure describing the events which have occurred are equivalent. I don't really see much ergonomic difference between the two. There's a fair bit of precedent in crypto/tls for using callbacks to interact with the user: TLSConfig has GetCertificate, GetClientCertificate, GetConfigForClient, VerifyPeerCertificate, and VerifyConnection. These do all return information, while the QUICTransport funcs mostly do not.

A possible approach might be to have more QUICConn methods that return information:

// WriteSecret returns the current write encryption level and secret.
// It returns an empty secret for the initial encryption level.
//
func (q *QUICConn) WriteSecret() (_ EncryptionLevel, secret []byte)

It might be a little bit clunky to poll the QUICConn for the current secrets and levels after every call to HandleCryptoData, but not that bad.

I don't feel particularly strongly about any of this, though. I expect there will be never be more than a low single-digit number of users of this API, so while it's important to make it difficult to misuse I don't think discoverability is especially important.

FiloSottile commented 1 year ago

These do all return information, while the QUICTransport funcs mostly do not.

Yeah, that's the source of my dislike of the QUICTransport callbacks, I think. The tls.Config ones return values that are used afterwards by the function that called them. The QUICTransport ones are just re-entrant and indirect ways to deliver return values.

I like tls.QUICConfig for TransportParameters, if nothing else because we might realize we want to add more stuff to it later, without adding it to tls.Config. Does it need its own GetConfigForClient though?

neild commented 1 year ago

QUICConfig probably needs its own GetConfigForClient, yeah. I can picture wanting to vary the transport parameters based on the ALPN protocol.

Alternatively, we could use the QUICConn as the config:

// SetTransportParameters sets the encoded QUIC transport parameters.
// For client connections, it must be called before Start.
// For server connections, it must be called before the GetConfigForClient callback (if any) returns.
// SetTransportParameters may be called multiple times. The last provided parameters take precedence.
func (q *QUICConn) SetTransportParameters(p []byte)
marten-seemann commented 1 year ago

I agree that it should be possible to set transport parameters based on the ALPN. It would also be really nice if a QUIC server was able to set transport parameters based on the transport parameters from the client.

I like tls.QUICConfig for TransportParameters, if nothing else because we might realize we want to add more stuff to it later, without adding it to tls.Config. Does it need its own GetConfigForClient though?

It sounds like we've renamed QUICTransport to QUICConfig now. This makes sense to, I didn't really like calling it a transport since the concept is very different from other transports (e.g. the HTTP one). I imagine that a single QUICConfig would only be used for a single QUIC connection though. Does that means we don't need a GetConfigForClient?

What about leaving the GetTransportParameters function on the QUICConfig as is? This callback would fit @FiloSottile's description of what a "proper" callback should be:

Yeah, that's the source of my dislike of the QUICTransport callbacks, I think. The tls.Config ones return values that are used afterwards by the function that called them.

neild commented 1 year ago

My mild preference is to stick with QUICTransport as currently defined; I'm not convinced that reshuffling things to provide CRYPTO bytes and secrets via return parameters rather than callbacks really makes things any simpler here.

I don't feel strongly about this, though.

FiloSottile commented 1 year ago

As someone that didn't follow the development of the API too closely, it took me a bit to get the semantics of the callbacks. The Transport name also doesn't help: in net/http parlance a Transport can handle multiple connections, while here it needs to be a set of closures over the QUIC-side state of the connection, right?