docker-library / haproxy

Docker Official Image packaging for HAProxy
http://www.haproxy.org/
GNU General Public License v2.0

Memory exhaustion using haproxy image #194

Closed: killianmuldoon closed this issue 7 months ago

killianmuldoon commented 1 year ago

When starting the haproxy image, it is OOM-killed almost immediately after using up all the memory on my system (32 GB RAM + 8 GB swap).

I'm running with the command below, where the config file is this one:

docker run -v $(pwd)/images/haproxy:/usr/local/etc/conf:ro -m 1000000000 haproxy:2.6 -f /usr/local/etc/conf/haproxy.cfg

(Note: I've set a 1 GB memory limit on the command above to demonstrate the problem, so anyone trying to replicate it doesn't have their memory exhausted completely.)

I've tested this behaviour on all versions back to the haproxy 2.2 image.

This issue can be resolved by setting ulimits as below:

docker run --ulimit nofile=8053:8053 -v $(pwd)/images/haproxy:/usr/local/etc/conf:ro -m 1000000000 haproxy:2.6 -f /usr/local/etc/conf/haproxy.cfg

Or by setting the connection limit, e.g.:

docker run -v $(pwd)/images/haproxy:/usr/local/etc/conf:ro -m 1000000000 haproxy:2.6 -f /usr/local/etc/conf/haproxy.cfg -n 1000000

The issue looks similar to the one investigated and closed in https://github.com/haproxy/haproxy/issues/1751

I'm wondering why the haproxy Docker image might use up so much memory on startup when those limits aren't set, and whether this is just a Docker issue or related to the binary itself. I wasn't able to replicate this using the 2.4 version of haproxy on the same system.


SYSTEM INFORMATION

Docker version

docker version
Client: Docker Engine - Community
 Version:           20.10.18
 API version:       1.41
 Go version:        go1.18.6
 Git commit:        b40c2f6
 Built:             Thu Sep  8 23:12:30 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.18
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.18.6
  Git commit:       e42327a
  Built:            Thu Sep  8 23:10:10 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.8
  GitCommit:        9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6
 runc:
  Version:          1.1.4
  GitCommit:        v1.1.4-0-g5fd4c4d
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Fedora version

cat /etc/os-release
NAME="Fedora Linux"
VERSION="36 (Workstation Edition)"
ID=fedora
VERSION_ID=36
VERSION_CODENAME=""
PLATFORM_ID="platform:f36"
PRETTY_NAME="Fedora Linux 36 (Workstation Edition)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:36"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f36/system-administrators-guide/"
SUPPORT_URL="https://ask.fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=36
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=36
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="Workstation Edition"
VARIANT_ID=workstation

Kernel version

5.19.12-200.fc36.x86_64
yosifkit commented 1 year ago

This sounds identical to https://github.com/docker-library/rabbitmq/issues/545. The cause is that Fedora and other RPM-based distros set an astronomically large default open-files limit (1073741816 vs. the typical 65536). So, if you are running on an RPM-based OS that sets an extremely high open-files limit, you need to set --ulimit nofile= to a more reasonable value.
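To see whether your host is affected, you can compare the limits your shell reports with what a container inherits. A quick sketch (the haproxy:2.6 tag and the docker invocation are illustrative, matching the commands above):

```shell
# Soft open-files limit for processes started from this shell
ulimit -n

# Soft and hard limits via /proc (docker's own limit may differ,
# since it inherits from the docker/containerd systemd unit instead)
grep 'Max open files' /proc/self/limits

# What a container actually sees (uncomment if docker is available):
# docker run --rm --entrypoint sh haproxy:2.6 -c 'ulimit -n'
```

If the container reports a number in the billions (e.g. 1073741816), HAProxy's automatic maxconn calculation will try to size its structures accordingly, which matches the OOM behaviour reported here.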

dlipovetsky commented 1 year ago

I think the root cause is HAProxy allocating resources for each connection up to the maximum, and deriving that maximum (maxconn) from the (very high) kernel default file-descriptor limit, which becomes the effective limit when the container runtime's file limit is infinity.

If your platform only supports select and reports "select FAILED" on startup, you need to reduce maxconn until it works (slightly below 500 in general). If this value is not set, it will automatically be calculated based on the current file descriptors limit reported by the "ulimit -n" command, possibly reduced to a lower value if a memory limit is enforced, based on the buffer size, memory allocated to compression, SSL cache size, and use or not of SSL and the associated maxsslconn (which can also be automatic). -- https://cbonte.github.io/haproxy-dconv/2.2/configuration.html#maxconn
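Given the auto-calculation described in the quoted docs, pinning maxconn explicitly in the config sidesteps the file-descriptor-derived default entirely. A minimal sketch of the relevant global section (the value 4096 is an illustrative choice; tune it to your memory budget):

```
global
    # Cap concurrent connections so startup allocations stay bounded,
    # instead of letting HAProxy derive maxconn from the ulimit -n value
    maxconn 4096
```

This is equivalent in effect to passing -n on the command line as shown earlier in the thread, but keeps the limit with the rest of the configuration.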

lknite commented 1 year ago

Making a note for folks who end up here via kubernetes:

It looks like Kubernetes relies on this being fixed at the container-runtime level; in my case that's containerd, fixed like this:

# sed -i 's/LimitNOFILE=infinity/LimitNOFILE=65535/' /usr/lib/systemd/system/containerd.service
# systemctl daemon-reload
# systemctl restart containerd
# k delete deployment <asdf>
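The sed substitution above can be rehearsed safely on a scratch file before touching the real unit (a sketch; /tmp/containerd.service.test is a throwaway stand-in for the actual unit file):

```shell
# Simulate the relevant line of the stock containerd unit in a scratch file
printf '[Service]\nLimitNOFILE=infinity\n' > /tmp/containerd.service.test

# Same substitution as above, applied to the scratch copy
sed -i 's/LimitNOFILE=infinity/LimitNOFILE=65535/' /tmp/containerd.service.test

grep LimitNOFILE /tmp/containerd.service.test
# → LimitNOFILE=65535
```

Note that edits to files under /usr/lib/systemd/system can be overwritten by package upgrades; a drop-in override created with `systemctl edit containerd` containing the same LimitNOFILE line survives upgrades.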