aws / s2n-quic

An implementation of the IETF QUIC protocol
https://crates.io/crates/s2n-quic
Apache License 2.0

Invalid socket options on older kernels #2261

Open jkalez opened 2 months ago

jkalez commented 2 months ago

Problem:

The sendmmsg call in s2n-quic-platform/src/syscall/mmsg.rs uses msghdr.msg_control to pass socket options on every call. At least one of these options (level SOL_UDP, option UDP_SEGMENT) is not present on older kernels. Specifically, I have found that this option does not exist on Linux 4.14 and earlier.

The result is that s2n-quic believes GSO is supported when the kernel does not support it. Running the io::tokio::tests::ipv4_test in s2n-quic-platform on 4.15 results in the kernel combining multiple UDP messages, so the test runs until the timeout and then fails. This can be reproduced by running cargo test on an Ubuntu 18.04 machine with a 4.15 kernel.

In all fairness, support for this option started appearing in 4.19, so I do think it's very fair to say that 4.15 is too old and isn't supported. The more general problem though is that there is no runtime detection for the socket options being used, which could cause pretty bad, silent bugs like this down the road. For example, it appears that GRO support didn't land until ~5.0. However, the UDP_GRO socket option may be used regardless of kernel support.

Solution:

Unfortunately, the sendmmsg and sendmsg syscalls (and their recv counterparts) do not appear to return any error when passed an invalid socket option. An alternative may be to identify early on which socket options we'd like to use and attempt to set them with setsockopt. If that fails, we make sure not to set those options later, and turn off any features that rely on them (e.g. GSO and/or GRO).
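A rough sketch of what such a probe could look like, not taken from the s2n-quic code base. The helper names (`udp_option_supported`, `probe_features`, `Features`) are hypothetical. The constant values are the ones from the Linux UAPI headers, and `setsockopt` is declared by hand only to keep the example dependency-free (the `libc` crate would normally provide it). It assumes Linux and a toolchain that accepts `unsafe extern` blocks (Rust 1.82 or newer).

```rust
use std::ffi::c_void;
use std::net::UdpSocket;
use std::os::raw::c_int;
use std::os::unix::io::AsRawFd;

// Linux-only values: SOL_UDP equals IPPROTO_UDP; the options come from <linux/udp.h>.
const SOL_UDP: c_int = 17;
const UDP_SEGMENT: c_int = 103; // GSO
const UDP_GRO: c_int = 104; // GRO

// Hand-written declaration of the libc function; socklen_t is u32 on Linux.
unsafe extern "C" {
    fn setsockopt(
        fd: c_int,
        level: c_int,
        name: c_int,
        value: *const c_void,
        len: u32,
    ) -> c_int;
}

/// Hypothetical helper: true if the kernel accepts `option` at level SOL_UDP.
/// The probe writes 0, which leaves the feature disabled on the socket,
/// so a successful probe does not change how the socket behaves.
fn udp_option_supported(socket: &UdpSocket, option: c_int) -> bool {
    let value: c_int = 0;
    let rc = unsafe {
        setsockopt(
            socket.as_raw_fd(),
            SOL_UDP,
            option,
            &value as *const c_int as *const c_void,
            std::mem::size_of::<c_int>() as u32,
        )
    };
    // Any failure (ENOPROTOOPT on kernels without the option) counts as unsupported.
    rc == 0
}

/// Hypothetical summary of which offload features may be turned on.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Features {
    gso: bool,
    gro: bool,
}

/// Probe once per socket, before any sendmmsg/recvmmsg call uses the options.
fn probe_features(socket: &UdpSocket) -> Features {
    Features {
        gso: udp_option_supported(socket, UDP_SEGMENT),
        gro: udp_option_supported(socket, UDP_GRO),
    }
}
```

The idea would be to call `probe_features` right after binding the socket and only attach the UDP_SEGMENT control message (or expect UDP_GRO on receive) when the corresponding flag is true.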

Requirements / Acceptance Criteria:

A handful of tests currently time out on 4.15 kernels:

When these tests pass on older kernels, it's likely that this issue is resolved.

As mentioned above, I think it's pretty fair to say you only support specific kernel versions and up. If that's the case, though, it should probably be documented. It would be even better to detect an unsupported kernel at startup and report an error or panic.

Out of scope:

mxinden commented 2 months ago

Potentially related: https://github.com/quic-go/quic-go/issues/4446

WesleyRosenblum commented 2 months ago

Thanks for the issue. We'll either document that we don't support older kernel versions or disable GSO for kernel versions older than 5.
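For illustration, a minimal sketch of the version-based option, assuming the release string is read from /proc/sys/kernel/osrelease and that "older than 5" means a major version below 5. The function names are hypothetical and this is not the actual s2n-quic change.

```rust
use std::fs;
use std::io;

/// Reads the running kernel's release string, e.g. "4.15.0-213-generic".
/// Assumes a Linux system with procfs mounted.
fn running_kernel_release() -> io::Result<String> {
    fs::read_to_string("/proc/sys/kernel/osrelease")
}

/// Extracts (major, minor) from a release string; None if it does not
/// start with two numeric components.
fn parse_kernel_version(release: &str) -> Option<(u32, u32)> {
    let mut parts = release.trim().split(|c: char| !c.is_ascii_digit());
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// Hypothetical gate: allow GSO only on kernel 5.x and newer.
/// Unparseable release strings are treated as "too old" to stay on the safe side.
fn gso_allowed(release: &str) -> bool {
    matches!(parse_kernel_version(release), Some((major, _)) if major >= 5)
}
```

A caller could combine the two as `running_kernel_release().map(|r| gso_allowed(&r)).unwrap_or(false)`, so that a missing procfs also disables GSO.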