Initially, we set tcp_nodelay explicitly in this patch. But the
issue is sufficiently complex that I am punting to the nginx default
(on, as it happens).
"The real problem is ACK delays. The 200ms "ACK delay" timer is a
bad idea that someone at Berkeley stuck into BSD around 1985 because
they didn't really understand the problem. A delayed ACK is a bet
that there will be a reply from the application level within
200ms. TCP continues to use delayed ACKs even if it's losing that
bet every time."

Set tcp_nopush to on. This lets nginx send the response header and
the beginning of a file in one packet (fewer packets).
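
As a rough sketch, the two TCP settings discussed so far would sit in the http block like this. Per the nginx documentation, tcp_nopush only takes effect when sendfile is used, so the sendfile line is an assumption about the rest of our config, not something this patch states.

```nginx
http {
    # Assumption: sendfile is enabled; tcp_nopush has no effect without it.
    sendfile   on;
    # Send the response header and the start of the file in one packet.
    tcp_nopush on;
    # tcp_nodelay is deliberately not set, so the nginx default (on) applies.
}
```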

Enable gzip. At the cost of some CPU load, this compresses JavaScript,
HTML, etc. before sending it out.
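
A minimal sketch of what enabling gzip could look like. The MIME type list is illustrative and not copied from the patch. nginx always compresses text/html once gzip is on, so JavaScript and other types have to be listed explicitly.

```nginx
http {
    gzip on;
    # text/html is compressed implicitly; every other type must be listed.
    # Illustrative list, adjust to what the application actually serves.
    gzip_types text/css application/javascript application/json;
}
```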

Set proxy_next_upstream to error. If there was an error, try the
next backend. But if the backend simply times out, then stop right
there. Otherwise, overly expensive queries will hop around, killing
one backend instance after the next.
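
The retry behavior described above might be configured roughly as follows. The upstream name and the backend addresses are hypothetical placeholders for our Tornado instances.

```nginx
upstream backends {
    # Hypothetical Tornado instances; real addresses and ports may differ.
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
}

server {
    location / {
        proxy_pass http://backends;
        # Move to the next backend only when connecting to it, sending the
        # request, or reading the response header fails. The nginx default
        # is "error timeout", which would also retry slow queries elsewhere.
        proxy_next_upstream error;
    }
}
```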

Serve static files, such as thumbnails, directly. There is no need
for Tornado to serve our JavaScript bundle. We use a query string to
expire the cache, and that still works correctly.
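
For illustration, serving static files straight from nginx could look like the block below. The /static/ prefix, the directory, and the example file name are assumptions, not taken from the patch.

```nginx
server {
    # Hypothetical URL prefix and directory; adjust to where the files live.
    location /static/ {
        root /var/www/app;
        # /static/bundle.js?v=abc123 maps to /var/www/app/static/bundle.js.
        # The query string is ignored when nginx resolves the file path.
    }
}
```

Because the query string plays no part in file lookup, changing it still gives browsers a new URL to fetch, which is why the cache expiry keeps working.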

Set worker_processes to auto (scale to number of CPUs). This is
recommended by the Tornado docs, and should allow more concurrent
connections to be served. Hopefully this is particularly useful now
that static files are served directly.

Set worker_connections to 1024. The default is 512. I changed this
because a) we ran out of worker connections and b) this is
recommended in the Tornado docs.

Use epoll. This is a more efficient connection processing method
available on Linux.
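
Taken together, the three worker-related settings above would look roughly like this at the top of nginx.conf (a sketch, not the exact diff):

```nginx
# Main (top-level) context: one worker process per CPU core.
worker_processes auto;

events {
    # Maximum simultaneous connections per worker (nginx default: 512).
    worker_connections 1024;
    # Linux-only connection processing method.
    use epoll;
}
```

The worker_connections limit counts every connection a worker holds, including those to the proxied backends, so a proxied request uses two of them.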

The tcp_nodelay quote above is from John Nagle, the author of the
Nagle algorithm: https://news.ycombinator.com/item?id=9048947
A detailed explanation of the issue is given at
https://eklitzke.org/the-caveats-of-tcp-nodelay

Closes #51