driskell / log-courier

The Log Courier Suite is a set of lightweight tools created to ship and process log files speedily and securely, with low resource usage, to Elasticsearch or Logstash instances.

log-courier admin socket stuck #394

Closed sysmonk closed 2 years ago

sysmonk commented 2 years ago

Hi,

We're running lc-admin and log-courier version 2.9.0, and we're seeing that the admin interface sometimes gets stuck. For example:

# lc-admin -connect tcp:127.0.0.1:12345
Admin version 2.9.0

Setting up client for tcp:127.0.0.1:12345...

There is no response after that, and the process just hangs.

We can, though, connect to the socket:

# nc -v 127.0.0.1 12345
localhost.localdomain [127.0.0.1] 12345 (?) open
GET / HTTP/1.1

HTTP/1.1 400 Bad Request: missing required Host header
Content-Type: text/plain; charset=utf-8
Connection: close

But once the Host header is sent, the request hangs as well:

# nc -v 127.0.0.1 12345
localhost.localdomain [127.0.0.1] 12345 (?) open
GET / HTTP/1.1
Host: 127.0.0.1:12345

log-courier itself does seem to be working (reading and delivering logs). We tried changing the log level to debug, but nothing interesting showed up there.

What information can we provide to help with debugging this?

sysmonk commented 2 years ago

I sent a SIGQUIT signal to log-courier to generate some debug information. Its STDERR/STDOUT is redirected to /dev/null, so I worked around that by grabbing the dump from strace output and parsing it a bit. The output might be slightly corrupted as a result, but I hope it still gives some useful information: stuck-2022-10-14.txt

driskell commented 2 years ago

Thanks, that dump should help isolate what's happening. I can see the admin server stuck waiting for something; it's probably some old metric collection code still using channels.