NagiosEnterprises / nsca

NSCA Passive Check Daemon
GNU General Public License v2.0
41 stars 25 forks source link

nsca server: sockets in state CLOSE_WAIT accumulate #56

Open sfr-financial-com opened 1 month ago

sfr-financial-com commented 1 month ago

Hi everyone,

we're running nsca 2.10.2 on Debian 12 and notice a slow, but steady buildup of sockets in state CLOSE_WAIT:

# netstat -tnp | grep nsca
tcp6       1      0 10.1.0.25:5667          10.1.1.78:35280         CLOSE_WAIT  727999/nsca         
tcp6     721      0 10.1.0.25:5667          10.1.0.31:34798         CLOSE_WAIT  727999/nsca         
tcp6       1      0 10.1.0.25:5667          10.1.1.68:58930         CLOSE_WAIT  727999/nsca         
tcp6       1      0 10.1.0.25:5667          10.1.1.41:38798         CLOSE_WAIT  727999/nsca         
tcp6       1      0 10.1.0.25:5667          10.1.1.41:60050         CLOSE_WAIT  727999/nsca         
tcp6       1      0 10.1.0.25:5667          10.1.0.27:52174         CLOSE_WAIT  727999/nsca         
tcp6       1      0 10.1.0.25:5667          10.1.0.11:57338         CLOSE_WAIT  727999/nsca         
tcp6       1      0 10.1.0.25:5667          10.1.0.23:60520         CLOSE_WAIT  727999/nsca         
tcp6       1      0 10.1.0.25:5667          10.1.0.8:53028          CLOSE_WAIT  727999/nsca         
tcp6       1      0 10.1.0.25:5667          10.1.1.78:33568         CLOSE_WAIT  727999/nsca         
tcp6       1      0 10.1.0.25:5667          10.1.0.24:52466         CLOSE_WAIT  727999/nsca         
tcp6     720      0 10.1.0.25:5667          10.1.1.78:40508         CLOSE_WAIT  727999/nsca         
tcp6       1      0 10.1.0.25:5667          10.1.0.23:50380         CLOSE_WAIT  727999/nsca         
tcp6       1      0 10.1.0.25:5667          10.1.1.41:54152         CLOSE_WAIT  727999/nsca         
tcp6     721      0 10.1.0.25:5667          10.1.1.68:37230         CLOSE_WAIT  727999/nsca    

These sockets persist and are never closed. I have debug logs for the connection that involved 10.1.0.11:57338:

Aug 12 08:50:03 muc1sta-infra-1 nsca[727999]: Connection from 10.1.0.11 port 57338
Aug 12 08:50:03 muc1sta-infra-1 nsca[727999]: Handling the connection...
Aug 12 08:50:03 muc1sta-infra-1 nsca[727999]: End of connection...
Aug 12 08:50:03 muc1sta-infra-1 nsca[727999]: Connection from 10.1.0.11 port 57350
Aug 12 08:50:03 muc1sta-infra-1 nsca[727999]: Connection from 10.1.1.39 port 41782
Aug 12 08:50:03 muc1sta-infra-1 nsca[727999]: Handling the connection...
Aug 12 08:50:03 muc1sta-infra-1 nsca[727999]: Connection from 10.1.1.39 port 41798
Aug 12 08:50:03 muc1sta-infra-1 nsca[727999]: Handling the connection...
Aug 12 08:50:03 muc1sta-infra-1 nsca[727999]: Connection from 10.1.1.39 port 41814
Aug 12 08:50:03 muc1sta-infra-1 nsca[727999]: Time difference in packet: 0 seconds for host muc1sta-be-9
Aug 12 08:50:03 muc1sta-infra-1 nsca[727999]: SERVICE CHECK -> Host Name: 'muc1sta-be-9', Service Description: 'feeds_r_check_logs', Return Code: '0', Output: 'OK - no errors or warnings|'excessiveLogging_lines'=>
Aug 12 08:50:03 muc1sta-infra-1 nsca[727999]: Attempting to write to nagios command pipe
Aug 12 08:50:03 muc1sta-infra-1 nsca[727999]: Handling the connection...
Aug 12 08:50:03 muc1sta-infra-1 nsca[727999]: Connection from 10.1.1.41 port 40848
Aug 12 08:50:03 muc1sta-infra-1 nsca[727999]: End of connection...
Aug 12 08:50:03 muc1sta-infra-1 nsca[727999]: Handling the connection...
Aug 12 08:50:03 muc1sta-infra-1 nsca[727999]: Connection from 10.1.1.41 port 40862
Aug 12 08:50:03 muc1sta-infra-1 nsca[727999]: Handling the connection...
Aug 12 08:50:03 muc1sta-infra-1 nsca[727999]: Connection from 10.1.1.41 port 40874
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: Time difference in packet: 0 seconds for host muc1sta-be-9
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: SERVICE CHECK -> Host Name: 'muc1sta-be-9', Service Description: 'tsdb_check_logs', Return Code: '0', Output: 'OK - no errors or warnings|'excessiveLogging_lines'=1 '>
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: Attempting to write to nagios command pipe
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: Handling the connection...
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: Connection from 10.1.1.41 port 40880
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: Time difference in packet: 0 seconds for host muc1sta-be-9
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: SERVICE CHECK -> Host Name: 'muc1sta-be-9', Service Description: 'eta_snapper_check_logs', Return Code: '3', Output: 'UNKNOWN - (4 unknown) - could not find logfile />
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: Attempting to write to nagios command pipe
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: Time difference in packet: 0 seconds for host muc1sta-be-fb-2
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: SERVICE CHECK -> Host Name: 'muc1sta-be-fb-2', Service Description: 'delayservice_es_check_logs', Return Code: '0', Output: 'OK - no errors or warnings|'excessiveLogg>
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: Attempting to write to nagios command pipe
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: End of connection...
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: Time difference in packet: 0 seconds for host muc1sta-be-9
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: SERVICE CHECK -> Host Name: 'muc1sta-be-9', Service Description: 'derivativeserver_check_logs', Return Code: '0', Output: 'OK - no errors or warnings|'excessiveLoggin>
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: Attempting to write to nagios command pipe
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: Time difference in packet: 0 seconds for host muc1sta-be-fb-1
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: SERVICE CHECK -> Host Name: 'muc1sta-be-fb-1', Service Description: 'feedproxy_selectf_check_logs', Return Code: '0', Output: 'OK - no errors or warnings|'excessiveLo>
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: Attempting to write to nagios command pipe
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: Time difference in packet: 0 seconds for host muc1sta-be-fb-1
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: SERVICE CHECK -> Host Name: 'muc1sta-be-fb-1', Service Description: 'delayservice_es_check_logs', Return Code: '0', Output: 'OK - no errors or warnings|'excessiveLogg>
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: Attempting to write to nagios command pipe
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: Time difference in packet: 0 seconds for host muc1sta-be-fb-1
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: SERVICE CHECK -> Host Name: 'muc1sta-be-fb-1', Service Description: 'feedproxy_wiesn_check_logs', Return Code: '0', Output: 'OK - no errors or warnings|'excessiveLogg>
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: Attempting to write to nagios command pipe
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: Handling the connection...
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: Connection from 10.1.1.39 port 41830
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: End of connection...
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: End of connection...
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: Handling the connection...
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: End of connection...
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: End of connection...
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: End of connection...
Aug 12 08:50:04 muc1sta-infra-1 nsca[727999]: End of connection...

Unfortunately, since nsca doesn't give us an identifier (e.g. remote port) in log entries, we can't really tell you which connection is what. However, we sent 5 results to the server, and only four ever made it to the CMD socket, one leaving behind the above socket in state CLOSE_WAIT.

sfr-financial-com commented 1 month ago

Previously, running 2.9.2, this issue didn't happen.

EDIT: OK, my fault, 2.9.2 was running in --daemon mode, while 2.10.2 on Debian defaults to single process. When switching the new version to multi process mode, the problems don't occur anymore.