apple / foundationdb

FoundationDB - the open source, distributed, transactional key-value store
https://apple.github.io/foundationdb/
Apache License 2.0

Evaluate TCP_NODELAY vs TCP_QUICKACK #2008

Open alexmiller-apple opened 5 years ago

alexmiller-apple commented 5 years ago

I have finally learned of the no delay vs quickack argument, and you can see this explanation to catch up to where I am now.

Our codebase currently sets TCP_NODELAY. It would be worth running a few tests to see whether setting TCP_QUICKACK instead (or both together) makes any observable difference. I've found some example code for how to set this through Boost.Asio.
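For readers who want to see what setting these options looks like at the socket level, here is a minimal sketch using plain POSIX calls rather than Boost.Asio. The helper names (set_tcp_flag, get_tcp_flag, configure_socket) are hypothetical and are not taken from the FoundationDB codebase or from either PR. TCP_QUICKACK is Linux-specific, so it is guarded with an #ifdef.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper (not from the FoundationDB codebase): set a boolean
// TCP-level option on a socket. Returns true on success.
bool set_tcp_flag(int fd, int option, bool enabled) {
    int value = enabled ? 1 : 0;
    return setsockopt(fd, IPPROTO_TCP, option, &value, sizeof(value)) == 0;
}

// Read a TCP-level integer option back. Returns -1 if getsockopt fails.
int get_tcp_flag(int fd, int option) {
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, IPPROTO_TCP, option, &value, &len) != 0) {
        return -1;
    }
    return value;
}

// Apply one of the combinations the issue proposes testing:
// TCP_NODELAY only, TCP_QUICKACK only, or both together.
bool configure_socket(int fd, bool nodelay, bool quickack) {
    bool ok = set_tcp_flag(fd, TCP_NODELAY, nodelay);
#ifdef TCP_QUICKACK  // Linux-only option; absent on other platforms
    ok = set_tcp_flag(fd, TCP_QUICKACK, quickack) && ok;
#else
    (void)quickack;  // nothing to set on platforms without TCP_QUICKACK
#endif
    return ok;
}
```

A caller would run something like configure_socket(fd, false, true) on each connection to try quickack without nodelay.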

kaomakino commented 5 years ago

I did a quick test (Draft PR #2030) using fdbserver -r networktestserver with Alex's PR #2026. In short, neither TCP_NODELAY nor TCP_QUICKACK makes a significant difference, because message coalescing is already happening.

fdbserver -r networktestserver -p xx.xx.xx.xx:xxxx --knob_flow_tcp_nodelay 0 --knob_flow_tcp_quickack 1 --knob_network_test_reply_size 1

responses per second: 237114.516347 (4.217371 us)
responses per second: 237746.542325 (4.206160 us)
responses per second: 237146.607591 (4.216801 us)

The fdbserver processes are sending and receiving about 230K messages / sec, but the OS sees only about 7500 sendmsg (or recvmsg) calls / sec.
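The amount of coalescing can be estimated from the figures above. The sketch below assumes the value in parentheses on each result line is the reciprocal of the response rate, expressed in microseconds, which matches the reported numbers. The function names are hypothetical.

```cpp
#include <cstdio>

// Mean time between responses in microseconds, given a rate in responses/sec.
double micros_per_response(double responses_per_sec) {
    return 1e6 / responses_per_sec;
}

// Average number of application messages carried per sendmsg/recvmsg call.
double messages_per_syscall(double messages_per_sec, double syscalls_per_sec) {
    return messages_per_sec / syscalls_per_sec;
}

// Print both derived figures for one measurement.
void report(double responses_per_sec, double syscalls_per_sec) {
    std::printf("%.6f us per response, about %.1f messages per syscall\n",
                micros_per_response(responses_per_sec),
                messages_per_syscall(responses_per_sec, syscalls_per_sec));
}
```

With the first reported rate and 7500 syscalls / sec, report(237114.516347, 7500.0) gives a ratio a little above 31 messages per syscall, so a per-packet socket option has little room to change the result.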

I haven't investigated where the coalescing happens, as I don't think TCP_QUICKACK will buy us much anyway.

sfc-gh-almiller commented 3 years ago

It was pointed out to me by @sfc-gh-fangius that TCP_QUICKACK needs to be re-applied after every read from the socket, which is also confirmed in the manpage:

       TCP_QUICKACK (since Linux 2.4.4)
              Enable quickack mode if set or disable quickack mode if cleared. In quickack
              mode, acks are sent immediately, rather than delayed if needed in accordance
              to normal TCP operation. This flag is not permanent, it only enables a
              switch to or from quickack mode. Subsequent operation of the TCP protocol
              will once again enter/leave quickack mode depending on internal protocol
              processing and factors such as delayed ack timeouts occurring and data transfer.
              This option should not be used in code intended to be portable.

So, #2026 is effectively a no-op, as was the code that I pointed you to copy from 😭. The extra syscall after every read might hurt performance in the normal case enough that we wouldn't want to pursue this anyway.
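To illustrate what re-applying the option after every read would involve, here is a minimal sketch assuming a blocking socket and plain POSIX calls. The wrapper name recv_with_quickack is hypothetical and is not how #2026 or FoundationDB's network layer is written.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical wrapper: read from the socket, then request quickack mode
// again, because the flag is not permanent. Re-arming is best effort: the
// return value is always that of recv(), and *rearmed (if non-null) reports
// whether the setsockopt call succeeded.
ssize_t recv_with_quickack(int fd, void* buf, size_t len, bool* rearmed = nullptr) {
    ssize_t n = recv(fd, buf, len, 0);
    bool ok = false;
#ifdef TCP_QUICKACK  // Linux-only option
    if (n > 0) {
        int one = 1;
        // This is the extra syscall paid on every successful read.
        ok = setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &one, sizeof(one)) == 0;
    }
#endif
    if (rearmed != nullptr) {
        *rearmed = ok;
    }
    return n;
}
```

Every successful read then costs two syscalls instead of one, which is the overhead in question.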

However, TCP_QUICKACK can also be set at the kernel level, which would likely be the better way to go. The easiest way to do this is with ip route change ROUTE quickack 1.