envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

Customizing backlog size #9501

Closed rfkm closed 4 years ago

rfkm commented 4 years ago

There seems to be no way to customize the backlog size of a listening socket; it is fixed at 128. See https://github.com/envoyproxy/envoy/blob/7ff7cb4c6a1dd62e43ad9aeaee98bb103971ba6a/source/common/network/listener_impl.cc#L52 and https://github.com/libevent/libevent/blob/master/listener.c#L180

It would be nice if you could provide an option to change it.

argos83 commented 4 years ago

Hi @mattklein123, we've been investigating issues in one of our high-throughput services and found the cause to be this low, fixed backlog size. A particular problem is that when the queue is full and new connections are rejected, Envoy does not seem to be aware of it and emits no error metrics. So we were flying blind and had to rely on OS stats to find the cause.
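For readers who hit the same blind spot: the comment does not say which OS stats were used, so the following is only a sketch under the assumption of a Linux host, where `/proc/net/netstat` exposes `TcpExt` counters such as `ListenOverflows` and `ListenDrops`. The helper name `readTcpExtCounter` is hypothetical and is not part of Envoy.

```cpp
#include <cstddef>
#include <istream>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Splits a line into whitespace-separated words.
static std::vector<std::string> splitWords(const std::string& line) {
  std::istringstream in(line);
  std::vector<std::string> words;
  for (std::string w; in >> w;) words.push_back(w);
  return words;
}

// Hypothetical helper (not Envoy code): looks up one TcpExt counter, e.g.
// "ListenOverflows", in text laid out like Linux's /proc/net/netstat, where a
// header line of counter names is followed by a line of values that carries
// the same "TcpExt:" prefix.
std::optional<long long> readTcpExtCounter(std::istream& netstat,
                                           const std::string& name) {
  std::string header;
  while (std::getline(netstat, header)) {
    if (header.rfind("TcpExt:", 0) != 0) continue;  // not the TcpExt header
    std::string values;
    if (!std::getline(netstat, values)) return std::nullopt;
    const std::vector<std::string> names = splitWords(header);
    const std::vector<std::string> nums = splitWords(values);
    // Index 0 is the "TcpExt:" prefix on both lines, so start at 1.
    for (std::size_t i = 1; i < names.size() && i < nums.size(); ++i) {
      if (names[i] == name) return std::stoll(nums[i]);
    }
    return std::nullopt;  // TcpExt found, but not the requested counter
  }
  return std::nullopt;  // no TcpExt section at all
}
```

Passing a `std::ifstream` opened on `/proc/net/netstat` and sampling `ListenOverflows` periodically would show whether the accept queue is overflowing; a counter that keeps rising under load is consistent with the rejected connections described above.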

Unfortunately, this could be a blocker for rolling Envoy out to our higher-tier services. My team is not very familiar with the Envoy codebase yet, but we could try to fix this and raise a PR, so that it hopefully lands in time for the 1.14.0 release.

It seems the only fix needed here is to plumb a user setting through to the backlog argument of evconnlistener_new, keeping -1 as the default for backwards compatibility.

Also, to avoid back and forth with the PR:

mattklein123 commented 4 years ago

cc @paulnivin who has a partially done patch for this that maybe you can finish. Paul can you post what you have?

paulnivin commented 4 years ago

https://github.com/paulnivin/envoy/tree/listen_somaxconn is the branch I had been working on, but I haven't had cycles of late to finish it up and add tests. Diff @ https://github.com/paulnivin/envoy/compare/master...paulnivin:listen_somaxconn?expand=1

My initial approach was to just have Envoy use the OS tunable backlog (easier to implement, and it solves the high-throughput case); a later version could, if needed, plumb through an application-specific setting/override for the backlog.

In a pinch, for testing or urgent situations, it's possible to use LD_PRELOAD to override the backlog without any source changes or recompiling (e.g. https://access.redhat.com/solutions/3314151).

surki commented 4 years ago

@paulnivin wondering if you are planning to roll this out? We're currently using LD_PRELOAD and are trying to move away from it.

mattklein123 commented 4 years ago

@surki sorry this has been on my list to finish for a while. I will try to finish it soon.

mattklein123 commented 4 years ago

cc @florincoras who I think volunteered (been volunteered?) to fix this. @florincoras I would add a config option on the Listener proto object, which I think should be pretty easy to plumb through to where you need it.

If unset, I would probably do what @paulnivin did here: https://github.com/paulnivin/envoy/compare/master...paulnivin:listen_somaxconn?expand=1 which is to make it -1 on Linux to use the kernel default; otherwise, allow it to be explicitly configurable.
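A minimal sketch of that selection logic, under stated assumptions: `ListenerOptions`, `defaultBacklog`, and `resolveBacklog` are hypothetical names for illustration and do not reflect the Listener proto or the eventual PR. The non-Linux default of 128 is also an assumption, chosen only because it is the fixed value this issue started from.

```cpp
#include <optional>

// Hypothetical stand-in for a per-listener setting; the real option would
// live on the Listener proto object.
struct ListenerOptions {
  std::optional<int> backlog;  // unset means "use the platform default"
};

// Default used when the option is unset.
int defaultBacklog() {
#if defined(__linux__)
  return -1;   // per the comment above: -1 on Linux to use the kernel default
#else
  return 128;  // assumption: keep the previously fixed value elsewhere
#endif
}

// Returns the backlog to use when creating the listening socket. In this
// sketch, non-positive configured values are treated the same as unset.
int resolveBacklog(const ListenerOptions& options) {
  if (options.backlog.has_value() && *options.backlog > 0) {
    return *options.backlog;
  }
  return defaultBacklog();
}
```

Keeping the default in one function means existing configurations behave as before, and only listeners that set the option explicitly get a different queue length.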

florincoras commented 4 years ago

@mattklein123 :-) Will do once we're done with #12547!

mattklein123 commented 4 years ago

Fixed by https://github.com/envoyproxy/envoy/pull/12625