aws / s2n-quic

An implementation of the IETF QUIC protocol
https://crates.io/crates/s2n-quic
Apache License 2.0

s2n-quic server fails resumption interop test with neqo client #2192

Open goatgoose opened 4 months ago

goatgoose commented 4 months ago

Problem:

The resumption interop test has recently started failing with the neqo client and s2n-quic server. neqo has released a fix related to this issue (https://github.com/mozilla/neqo/pull/1837), but the interop test is still failing with neqo and s2n-quic.

Solution:

We should investigate the cause of this test failure. If the issue can be addressed in s2n-quic, we should resolve it and revert https://github.com/aws/s2n-quic/pull/2191 to enforce the resumption test with neqo and s2n-quic in CI.

mxinden commented 4 months ago

Sorry for the trouble here @goatgoose.

https://github.com/mozilla/neqo/pull/1837 fixed the issue.

The resumption testcase using neqo client s2n-quic server is no longer failing. See e.g. recent CI run:

https://github.com/mozilla/neqo/pull/1857#issuecomment-2077536804

The neqo-qns Docker image is published nightly, thus reverting https://github.com/aws/s2n-quic/pull/2191 should succeed now.

goatgoose commented 4 months ago

Hi @mxinden, it appears that even after the https://github.com/mozilla/neqo/pull/1837 fix, the neqo client and s2n-quic server still fail the resumption test. From https://github.com/mozilla/neqo/pull/1857#issuecomment-2077536804:

Failed Interop Tests
neqo-latest vs. s2n-quic: R A

However, looking at the interop runner, it seems like https://github.com/mozilla/neqo/pull/1837 fixed the issue for all implementations except s2n-quic (screenshot: resumption_interop).

So we plan to investigate this to see if s2n-quic is causing this issue.

larseggert commented 1 month ago

I'm looking at the download of the second URL in https://interop.seemann.io/logs/2024-07-25T16:32/s2n-quic_neqo/resumption/output.txt.

One thing that looks odd from neqo's perspective is that we're receiving HandshakeDone from the server many times after ACK'ing it in our packet 3.

Otherwise, I can't tell from the log why we wouldn't send a ConnectionClose. Is there any chance you can make a linux/arm64 docker image available? I unfortunately can't run amd64 locally.

WesleyRosenblum commented 1 month ago

> One thing that looks odd from neqo's perspective is that we're receiving HandshakeDone from the server many times after ACK'ing it in our packet 3.

This is expected. s2n-quic sends HandshakeDone very aggressively (with every outgoing packet) until it receives an acknowledgement for it, to ensure the client is not blocked on the handshake.
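To illustrate why the client can see the frame several times even after ACK'ing it, here is a minimal sketch of the "send with every packet until acknowledged" behaviour described above. It is not taken from s2n-quic; the type `HandshakeDoneSender` and its methods are hypothetical names used only for illustration. The point is that any packets the server builds before the client's ACK arrives will still carry the frame.

```rust
/// Hypothetical sender-side state for the HANDSHAKE_DONE frame.
/// This is an illustrative sketch, not s2n-quic's actual implementation.
#[derive(Debug, Default)]
pub struct HandshakeDoneSender {
    /// Packet numbers that carried the frame and are not yet acknowledged.
    in_flight: Vec<u64>,
    /// Set once any packet that carried the frame has been acknowledged.
    acked: bool,
}

impl HandshakeDoneSender {
    /// Called while building each outgoing packet.
    /// Returns true if the frame should be written into this packet.
    pub fn on_transmit(&mut self, packet_number: u64) -> bool {
        if self.acked {
            return false;
        }
        // Keep sending the frame in every packet until an ACK is seen.
        self.in_flight.push(packet_number);
        true
    }

    /// Called for each packet number the peer acknowledges.
    pub fn on_ack(&mut self, packet_number: u64) {
        if self.in_flight.contains(&packet_number) {
            // One acknowledged copy is enough; stop retransmitting.
            self.acked = true;
            self.in_flight.clear();
        }
    }

    /// Whether the peer has acknowledged the frame.
    pub fn is_acked(&self) -> bool {
        self.acked
    }
}
```

In this sketch, `on_transmit` keeps returning `true` for every packet until `on_ack` sees the number of a packet that carried the frame, so duplicates on the wire are expected while the client's ACK is still in transit.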

> Is there any chance you can make a linux/arm64 docker image available? I unfortunately can't run amd64 locally.

I'm currently having some trouble building on arm64 so this might take some time.

larseggert commented 2 weeks ago

This was a neqo bug: https://github.com/mozilla/neqo/pull/2067