cloudius-systems / osv

OSv, a new operating system for the cloud.
osv.io
Other
4.07k stars 603 forks source link

delayed ack should not delay the ack to the first data packet? #887

Open nyh opened 7 years ago

nyh commented 7 years ago

While tst-tcp-cork works as I expected on OSv: The TCP_CORK and TCP_NODELAY options avoid the Nagle algorithm and delay associated with its interaction with delayed ack, without these options we do see those delays. The strange thing is that when running this test on Linux, we don't seem them. Something in Nagle, delayed-ack, or their interaction, is different on Linux...

I suspect that Linux modified delayed-ack somehow - for example Looking at Linux's net/ipv4/tcp_input.c, tcp_event_data_recv(), I see that the first data packet is always acknowledged immediately, not delayed. If this is the case, Nagle would not come into action, and send the next packet and we'll see a nice ping-pong of packets and acks.

We see exactly the same problem in real tests involving nginx (which apparently does not use TCP_CORK or TCP_NODELAY explicitly?) which randomly hang for 100ms on OSv after the first packet of the connection, but never anywhere else, so it is likely the same problem.

Also since tst-tcp-cork without TCP_CORK now "fails" on Linux (it fails to reproduce the 100ms delay) we need to modify this test, because if we fix OSv to behave like Linux, it also won't show anything different with TCP_CORK or TCP_NODELAY. Testing TCP_CORK is fairly easy - instead of measuring time between accept() and receiving 200 bytes, we should measure the time between receiving 1 byte, and the full 200 bytes. Without TCP_CORK it should be 5ms (the delay we have between the writes), with TCP_CORK it should be 0ms (everything sent together). Plus leave the existing time measurement which should remain 5ms (everything else would be a bug).

nyh commented 7 years ago

Doing some detective work on ancient trees of Linux source code, I see that in 1998, "DaveM" added a special case for delayed ack that the first data packet get a special timeout of "2" (I assume that means 20ms)? Later that year Andrea Arcangeli sent a patch to "Fix TCP delayed ACK stall" which among other things set the delayed ack timeout on the first data packet in a new function tcp_enter_quickack_mode() setting the timeout to HZ/100, i.e., 10ms. In later years this function was renamed and instead an ack was sent immediately.

I think we need to have to modify our delayed ack behavior on the first packet too, if only to have buggy software like nginx behave like on Linux. Non-buggy software should use TCP_CORK or TCP_NODELAY explicitly, and not rely on magic algorithms from the kernel to guess the right behavior for them :-(

nyh commented 7 years ago

By the way, DaveM's comment on why he added that special timeout of "2" for the first ack was /* Help sender leave slow start quickly */ A related comment from DaveM is left in today's Linux source code, tcp_input.cc:

/* There is something which you must keep in mind when you analyze the
 * behavior of the tp->ato delayed ack timeout interval.  When a
 * connection starts up, we want to ack as quickly as possible.  The
 * problem is that "good" TCP's do slow start at the beginning of data
 * transmission.  The means that until we send the first few ACK's the
 * sender will sit on his end and only queue most of his data, because
 * he can only send snd_cwnd unacked packets at any given time.  For
 * each ACK we send, he increments snd_cwnd and transmits more of his
 * queue.  -DaveM
 */

The current code actually seems to have elaborate logic (I gave up my detective work understand when exactly it was added) to have a number of "quickacks", not just one - it seems the connection starts with two of these allotted, and in special events like a sender appearing to be blocked by the window size, it gives more "quickacks". I would really not like to reproduce this complex logic in OSv if it's not already there :-( I hope that something simple could at least fix the NGINX case...

wkozaczuk commented 6 years ago

I just saw the related unit test tst-tcp-cork.so fail on Ubuntu 16 without KVM acceleration:

  TEST tst-tcp-cork.so                    warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
OSv v0.24-540-g603faba
eth0: 192.168.122.15
test 0 0 time (ms): 108
PASS: Without TCP_CORK, delayed ack comes into play
test 1 0 time (ms): 6
PASS: With TCP_CORK, no nagle so no delayed ack
test 0 1 time (ms): 275
FAIL: With TCP_NODELAY, no nagle so no delayed ack
SUMMARY: 3 tests, 1 failures
Test tst-tcp-cork.so FAILED
Traceback (most recent call last):
  File "/home/wkozaczuk/projects/osv/scripts/test.py", line 187, in <module>
    main()
  File "/home/wkozaczuk/projects/osv/scripts/test.py", line 174, in main
    run_tests()
  File "/home/wkozaczuk/projects/osv/scripts/test.py", line 165, in run_tests
    run(tests_to_run)
  File "/home/wkozaczuk/projects/osv/scripts/test.py", line 85, in run
    run_test(test)
  File "/home/wkozaczuk/projects/osv/scripts/test.py", line 60, in run_test
    test.run()
  File "/home/wkozaczuk/projects/osv/scripts/tests/testing.py", line 29, in run
    run_command_in_guest(self.command).join()
  File "/home/wkozaczuk/projects/osv/scripts/tests/testing.py", line 163, in join
    raise Exception('Guest failed')
Exception: Guest failed