abbradar opened 5 years ago
Hi Nikolay,
Thanks so much for getting to the bottom of the issue and submitting the PR. There are indeed places in our TCP stack where the extension API breaks the assumption that before_eq(send_ack, send_seq) holds.
We've looked at your PR, like it very much and are in the process of implementing a fix for the problem in our internal software repo.
Again, thanks for choosing our product and we greatly appreciate your ongoing contributions.
Best Regards, Dave
We have discovered a possible race condition between exasock_tcp_send_advance and the ACKs sent in response to the remote host's packets. Consider this case:
exasock_tcp_send_advance is called.

In this case the remote host receives an "impossible ACK": under normal circumstances the SEQ in packet B can never be less than the SEQ in packet A, yet because the kernel module and the userspace application sending packets run in different threads, this can theoretically happen. We have observed this in a real setting, because random delays can occur between sending packets via libexanic and calling exasock_tcp_send_advance.

Another vendor, Solarflare, handles this by deliberately setting the SEQ value in empty ACKs to a value from the future, namely

send_seq + min(rwnd_len, cwnd_len, mss)

(it is a bit more complicated than that, but you get the picture). This way those ACKs are always technically correct and merely appear severely out of order. An immediate downside of this solution is that traffic sent this way appears severely broken to analysis tools such as Wireshark, and for good reason.

Is this race condition dangerous in the wild? Do you have any data on how various TCP stacks handle "impossible ACKs"? Do you see any other solutions to this problem besides the one proposed? We have a patch that implements it in case you wish to experiment, but because of the downsides above it is obviously not fit for mainline as is.