better TCP_NODELAY handling: only use it when it is useful

totaam commented 10 years ago

Follow up from #514: at present we enable TCP_NODELAY globally which is a bit wasteful.

It ensures that packets go out as soon as we queue them, but when the packets contain large-ish binary data this means that the binary data and the actual xpra packet structure are likely to travel in separate TCP-level packets.

It would be better to only enable TCP_NODELAY when aggregating packets is not helping: when we have no more data to send or when the output buffer is full. As per: Is there a way to flush a POSIX socket? and this answer: What I do is enable Nagle, write as many bytes (using non-blocking I/O) as I can to the socket (i.e. until I run out of bytes to send, or the send() call returns EWOULDBLOCK, whichever comes first), and then disable Nagle again. This seems to work well (i.e. I get low latency AND full-size packets where possible)

Good read: The Caveats of TCP_NODELAY

totaam commented 8 years ago

See also: #1211, #639, #999, #401, #540, #417

totaam commented 6 years ago

I'm seeing some weird behaviour with win32 clients trying to improve #999 and detecting late acks.

The network layer's source_has_more function may not be the right place to set and unset NODELAY because lots of small screen updates can take under a millisecond to compress, which is still slower than it takes the network layer to send them... Maybe we need to pass the screen update flush attribute down to the network layer too.

totaam commented 6 years ago

Done in r18149 + r18150. Implementation notes:

we add an attribute to the packet source functions so the sender can specify if there will be more packets coming (also used if we know the packet is not urgent and can be delayed by the network layer)
we no longer set NODELAY globally on startup, we change (if needed) for each packet
in the case of screen updates, we re-use the "flush" attribute (set to a non-zero value when we know that there are more screen update packets coming immediately after)
some other packets set the flush flag because they need to be sent in a timely manner (ie: ping) or because they affect the user experience (ie: lost-window)
TCP_SOCKET_NODELAY is now XPRA_SOCKET_NODELAY as it applies to more than just TCP sockets
we expose the "nodelay" attribute, it should always match the TCP sockopt
important bandwidth management updates in #999#comment:44

As of r18151, we can use XPRA_SOCKET_NODELAY to overrule the automatic settings:

XPRA_SOCKET_NODELAY=1 should be more or less equivalent to what we had before
XPRA_SOCKET_NODELAY=0 never sets NODELAY

TODO:

deal with other socket types - websockify does its own thing with a global flag, can we override it?
verify RFB and UDP still work after protocol layer changes (and proxy?)
maybe keep NODELAY off when there is congestion and we are batching? (not sure this is needed or helpful)
maybe audio packets need to be flushed quicker? (does this change cause audio latency to go up?)

@maxmylyn: this ticket is tailor made for the automated tests - we want to compare before and after to see if this helps, especially under bandwidth constrained conditions. (only slight problem is, there is a bug I'm working on which causes congestion detection to kick in too early, and the counter measures are too aggressive, causing the framerate to drop..)

totaam commented 6 years ago