Closed rfkm closed 4 years ago
Hi @mattklein123, we've been investigating some issues one of our high throughput services was having, and found the cause to be this low fixed backlog size. A particular problem is that when the queue is full and new connections are rejected, envoy does not seem to be aware of it and emits no error metrics. So we were flying blind and had to rely on OS stats to find the cause.
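For anyone else hunting for this via OS stats: on Linux the kernel counts listen queue overflows in the TcpExt section of /proc/net/netstat (ListenOverflows and ListenDrops). Below is a minimal sketch of reading them, not something envoy does; the helper names are made up for illustration.

```python
def parse_tcpext(netstat_text):
    """Return the TcpExt counters from /proc/net/netstat-style text as a dict."""
    lines = netstat_text.splitlines()
    # The file comes in pairs: a header line of names, then a line of values.
    for header, values in zip(lines[::2], lines[1::2]):
        if header.startswith("TcpExt:") and values.startswith("TcpExt:"):
            names = header.split()[1:]
            numbers = [int(v) for v in values.split()[1:]]
            return dict(zip(names, numbers))
    return {}


def listen_queue_counters(path="/proc/net/netstat"):
    """Read the two counters that grow when a listen queue overflows (Linux only)."""
    with open(path) as f:
        counters = parse_tcpext(f.read())
    return {name: counters.get(name) for name in ("ListenOverflows", "ListenDrops")}
```

Sampling listen_queue_counters() twice and comparing the values shows whether connections are being dropped at the listen queue during that interval.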
Unfortunately this could be a blocker for continuing the rollout of envoy to our higher tier services. My team is not very familiar with the envoy codebase yet, but we could try to fix this and raise a PR so that it hopefully makes it in time for the 1.14.0 release.
It seems the only fix needed here is for a user setting to make its way to evconnlistener_new's backlog argument, keeping -1 as the default for backwards compatibility.
Also, to avoid back and forth with the PR: listener.proto? Do you think tcp_backlog_size would be a good name for it?
cc @paulnivin who has a partially done patch for this that maybe you can finish. Paul, can you post what you have?
https://github.com/paulnivin/envoy/tree/listen_somaxconn is the branch I had been working on, but I haven't had cycles of late to finish it up and add tests. Diff @ https://github.com/paulnivin/envoy/compare/master...paulnivin:listen_somaxconn?expand=1
My initial approach was to just have envoy use the OS tunable backlog (easier to implement and solves the high throughput case) and a later version, if needed, could plumb through an application specific setting/override for the backlog.
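To illustrate the "use the OS tunable" idea outside of envoy: a rough sketch that reads net.core.somaxconn and passes it to listen(). This is not the code from the branch above; the function names are invented, and falling back to socket.SOMAXCONN when the file is unreadable is my assumption.

```python
import socket


def os_backlog(path="/proc/sys/net/core/somaxconn"):
    """Return the OS-wide listen backlog limit, or socket.SOMAXCONN as a fallback."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        # Not Linux, or the file is unreadable: fall back to the compile-time constant.
        return socket.SOMAXCONN


def listen_with_os_backlog(host="127.0.0.1", port=0):
    """Open a TCP listener whose backlog follows the OS tunable instead of a fixed value."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(os_backlog())
    return sock
```

The kernel caps whatever value is passed to listen() at somaxconn anyway, so raising the sysctl only helps once the application stops passing a smaller fixed number.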
In a pinch, for testing or urgent situations, it's possible to use LD_PRELOAD to override the backlog without any source changes or recompiling (e.g. https://access.redhat.com/solutions/3314151).
@paulnivin wondering if you are planning to roll this out? We are currently using LD_PRELOAD and are trying to get off it.
@surki sorry this has been on my list to finish for a while. I will try to finish it soon.
cc @florincoras who I think volunteered (been volunteered?) to fix this. @florincoras I would add a config option on the Listener proto object, which I think should be pretty easy to plumb through to where you need it.
If unset, I would probably do what @paulnivin did here: https://github.com/paulnivin/envoy/compare/master...paulnivin:listen_somaxconn?expand=1 which is to make it -1 on linux to use the kernel default, otherwise allow it to be explicitly configurable.
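The unset/explicit behaviour described here could be sketched roughly as follows. This is only an illustration of the decision logic, not envoy code: resolve_backlog is a made-up helper, tcp_backlog_size is just the name floated earlier in this thread, and the 128 fallback for non-Linux platforms is an assumption based on the fixed value mentioned in the original report.

```python
import sys
from typing import Optional


def resolve_backlog(tcp_backlog_size: Optional[int], platform: str = sys.platform) -> int:
    """Pick the backlog value to hand to the listener.

    tcp_backlog_size is the (hypothetical) listener setting; None means unset.
    """
    if tcp_backlog_size is not None:
        if tcp_backlog_size < 0:
            raise ValueError("an explicit backlog must be non-negative")
        # An explicit setting always wins.
        return tcp_backlog_size
    if platform.startswith("linux"):
        # Unset on Linux: -1, meaning "defer to the default", as described above.
        return -1
    # Assumption: keep the previous fixed value on other platforms.
    return 128
```

With this shape, existing configs that never set the field keep working, and only users who need a larger queue have to touch it.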
@mattklein123 :-) Will do once we're done with #12547!
There seems to be no way to customize backlog size of a socket and it is fixed to 128. https://github.com/envoyproxy/envoy/blob/7ff7cb4c6a1dd62e43ad9aeaee98bb103971ba6a/source/common/network/listener_impl.cc#L52 https://github.com/libevent/libevent/blob/master/listener.c#L180
It would be nice if you can provide an option to change it.