docker-library / haproxy

Docker Official Image packaging for HAProxy
http://www.haproxy.org/
GNU General Public License v2.0

HAProxy crashing on start, thread 2 is about to kill the process. #224

Open ahuston-0 opened 7 months ago

ahuston-0 commented 7 months ago

Not sure if I should file this here or with the upstream, but my haproxy setup was working fine for months until a few days ago. I'm not sure what changed, but now it crashes with "Thread 2 is about to kill the process" and then no traffic goes through.

Environment Details

OS: NixOS 23.11
Docker: 24.0.5
Storage: OverlayFS2 on ZFS 2.2.2

HAProxy config

global
 # log stdout format raw local0 info
  log stdout format raw local0
  crt-base /etc/ssl/certs/

defaults
  log global
  mode http
  timeout client 2000m
  timeout connect 200s
  timeout server 2000m
  timeout http-request 2000m

#Application Setup
frontend ContentSwitching
  bind *:80
 # bind *:443 ssl crt /etc/ssl/certs/cloudflare.pem
  bind *:443 ssl crt /etc/ssl/certs/origin_ca_ecc_root_new.pem
  mode  http
  option httplog

  # Front-end access control list
  acl host_glances hdr(host) -i monit.mydomain.xyz
  acl host_glances hdr(host) -i glances.mydomain.xyz
  # Backend-forwarding
  use_backend glances_nodes if host_glances

backend glances_nodes
  mode http
  server server glances:61208

Docker Compose file

services:
  haproxy:
    privileged: false
    restart: always
    image: haproxy:latest
    volumes:
      - ./haproxy:/usr/local/etc/haproxy:ro
      - ./certs:/etc/ssl/certs:ro
    ports:
      - 9080:80
      - 9443:443
      - 8404:8404
      - 25565:25565
    environment:
      - PUID=997
      - PGID=100
    networks:
      - web
      - default
networks:
  web:
    name: haproxy-net
Crash logs

haproxy-1 | [NOTICE] (1) : New worker (8) forked
haproxy-1 | [NOTICE] (1) : Loading success.
haproxy-1 | Thread 2 is about to kill the process.
haproxy-1 | Thread 1 : id=0x7fc626c57100 act=1 glob=0 wq=0 rq=1 tl=0 tlsz=0 rqsz=1
haproxy-1 | 1/1 stuck=0 prof=0 harmless=0 isolated=0
haproxy-1 | cpu_ns: poll=0 now=6035627479 diff=6035627479
haproxy-1 | curr_task=0
haproxy-1 | *>Thread 2 : id=0x7fc626c4b700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
haproxy-1 | 1/2 stuck=1 prof=0 harmless=0 isolated=0
haproxy-1 | cpu_ns: poll=97660 now=2001614429 diff=2001516769
haproxy-1 | curr_task=0
haproxy-1 | call trace(10):
haproxy-1 | | 0x564a2f23d661 [eb ba 66 66 2e 0f 1f 84]: ha_thread_dump+0x91/0x93
haproxy-1 | | 0x564a2f23d8a1 [64 48 8b 53 10 64 48 8b]: ha_panic+0x111/0x3f4
haproxy-1 | | 0x7fc626f86140 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x13140
haproxy-1 | | 0x564a2f284282 [48 83 7a 28 00 74 36 48]: fd_reregister_all+0x32/0xb1
haproxy-1 | | 0x564a2f0b5709 [b8 01 00 00 00 5b 5d 41]: main+0x4ec9
haproxy-1 | | 0x564a2f21345e [85 c0 0f 84 5a 03 00 00]: main+0x162c1e
haproxy-1 | | 0x7fc626f7aea7 [64 48 89 04 25 30 06 00]: libpthread:+0x7ea7
haproxy-1 | | 0x7fc626e9aa2f [48 89 c7 b8 3c 00 00 00]: libc:clone+0x3f/0x5a

Darlelet commented 6 months ago

If you still encounter the issue please file an issue on HAProxy directly (https://github.com/haproxy/haproxy/issues)

"Thread X is about to kill the process" means the haproxy watchdog noticed that one thread had become unresponsive and, to prevent further issues, decided to abort the process.

Since you're deploying using the :latest tag, it's very likely that you are hitting a bug or limitation which only happens on haproxy 2.9 (which was just released, see https://github.com/docker-library/haproxy/commit/81e9df259751f1ec391f5c45905c198145e1cc0f) and didn't show up with the previous version (2.8)

ahuston-0 commented 6 months ago

For reference, I've managed to get a copy of my haproxy config running with the nixos haproxy service, so the issue is isolated to docker.

ahuston-0 commented 6 months ago

I think I've seen the same on haproxy lts but let me file a bug with upstream. Thanks

Darlelet commented 5 months ago

Any news about this issue? Do you still encounter the crash with the :latest or :2.9.2 tag, in which some bugs related to high CPU usage were addressed?

Thanks

ahuston-0 commented 5 months ago

Hi, I've upgraded to the latest version and will get back in a few hours. I did discover, though, that the original issue causing these logs is some memory/FD bug. If I don't add a nofile limit to the container, it consumes around 200GB of memory and then crashes (this is on a server with ~256GB of memory available).

ahuston-0 commented 5 months ago

This is what I added

    ulimits:
      nofile:
        soft: 1024
        hard: 4096

Darlelet commented 5 months ago

Does it instantly ramp up to 200GB of used memory, or is it slowly getting there, which would suggest a leak somewhere?

In the first case, maybe this is not related to haproxy itself but to a Docker engine update (which removed or increased an existing limit), see https://github.com/haproxy/haproxy/issues/2043. If that's the case, you could also mitigate it using the maxconn or fd-hard-limit global parameters in the haproxy config file.
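
A minimal sketch of the fd-hard-limit option, assuming a HAProxy version that supports the directive; the value 50000 is an illustrative assumption, not a number taken from this thread:

```haproxy
global
  # Cap the file-descriptor limit haproxy takes into account when it
  # sizes itself automatically, so a very large container nofile limit
  # does not inflate its allocations.
  # 50000 is a hypothetical value; pick one that fits the expected load.
  fd-hard-limit 50000
```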

ahuston-0 commented 5 months ago

It was happening over the span of like a minute or two. I can probably time it and get back. I'll check out those two settings and see if they help.

Regarding the CPU utilization issue, I did the upgrade last night and 12 hours in we're at 0.6% CPU so I think that one might have worked.

ahuston-0 commented 5 months ago

It looks like this ended up working. I removed the ulimits settings in docker-compose.yml and replaced them with maxconn 60000 in haproxy.cfg, and now there's no more high memory utilization or crashing. As for the other issue, upgrading to 2.9.2 seems to have fixed the high CPU utilization.
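
For reference, a sketch of what the resulting global section might look like, assuming maxconn was added to the global section of the config posted at the top of this issue (the exact placement was not shown):

```haproxy
global
  log stdout format raw local0
  crt-base /etc/ssl/certs/
  # Explicit connection cap, so haproxy no longer derives maxconn
  # from the container's file-descriptor limit.
  maxconn 60000
```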

Darlelet commented 5 months ago

Great news, thanks for sharing your positive results with us.

LaurentGoderre commented 2 months ago

Can we close this as solved?