NagiosEnterprises / nrpe

NRPE Agent
GNU General Public License v2.0

nrpe[xxxx]: ERROR: my_system() write(fd, buffer)-2 failed... #271

Open yolanjoy opened 1 year ago

yolanjoy commented 1 year ago

Hello,

I have recently found this error repeating. Looking at the "messages" log, it seems to have occurred on various days in the recent past. It does not happen continuously, but at different times.

Environment: RHEL 7.6, with the NRPE client and check_nrpe plugin (v4.0.3) running as a systemd service on the remote systems.

I enabled debug and isolated the error to one specific command whose output is over 6K DB records. Perhaps the output is larger than the 64K buffer size?
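
One way to test the buffer-size guess is to measure how many bytes the suspect command actually prints. The following is a minimal Python sketch, not part of NRPE: `output_size` and `exceeds_limit` are hypothetical helper names, and `LIMIT_BYTES` is simply the 64K figure from the question above. Which limit NRPE really applies is an assumption that would need checking against its source.

```python
import subprocess
import sys

# Assumption: 64K is the figure guessed at in the report, not a confirmed NRPE limit.
LIMIT_BYTES = 64 * 1024


def output_size(command):
    """Run a command (given as an argument list) and return the byte count of its stdout."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        check=False,  # a non-zero exit status is normal for WARNING/CRITICAL plugin results
    )
    return len(result.stdout)


def exceeds_limit(size, limit=LIMIT_BYTES):
    """Return True when the measured output is larger than the assumed limit."""
    return size > limit
```

Calling `output_size` with the same command line that NRPE runs for the failing check, as the same user NRPE runs as, shows whether the 6K records really push the output past 64K.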

If you need any other information, let me know.

Thanks

ericloyd commented 1 year ago

We're getting this error as well, but on nothing so huge. We're simply checking the following:

check_cpu_stats
check_disk
check_init_service
check_load
check_mem
check_open_files
check_procs
check_services
check_swap
check_users

I haven't bothered to see if we can trace it to a specific command, though. Here's the fun part: we see the errors in a weird pattern that doesn't make much sense to me, because these services are all on the same check_interval (via template). That suggests to me that there's a memory leak somewhere that gets worse over time and then suddenly resets (possibly when an OOM killer comes along?). Again, I haven't done any real troubleshooting.

But this NLS graph of incidents is pretty cool. This is the past 7 days. Odder still, of the two cloned machines, prod and QA (prod gets 5-10 times as much use as QA), only the prod box seems to be affected.

[image: NLS graph of incidents over the past 7 days]

ericloyd commented 1 year ago

Quick look at /var/log/messages (ancient CentOS 6.4 machine) shows this as a repeated pattern:

Feb  3 08:56:23 hostname nrpe[17424]: ERROR: my_system() write(fd, buffer)-2 failed...
Feb  3 08:56:23 hostname nrpe[17424]: ERROR: my_system() write(fd, buffer)-2 failed...
Feb  3 08:56:23 hostname nrpe[17424]: ERROR: my_system() write(fd, buffer)-2 failed...
Feb  3 08:56:23 hostname nrpe[17424]: ERROR: my_system() write(fd, buffer)-2 failed...
Feb  3 08:56:38 hostname nrpe[17458]: Error: (use_ssl == true): Request packet version was invalid!
Feb  3 08:56:38 hostname nrpe[17458]: Could not read request from client , bailing out...
Feb  3 08:56:38 hostname nrpe[17458]: INFO: SSL Socket Shutdown.
Feb  3 08:56:47 hostname nrpe[17480]: ERROR: my_system() write(fd, buffer)-2 failed...
Feb  3 08:56:47 hostname nrpe[17480]: ERROR: my_system() write(fd, buffer)-2 failed...
Feb  3 08:56:47 hostname nrpe[17480]: ERROR: my_system() write(fd, buffer)-2 failed...
Feb  3 08:56:47 hostname nrpe[17480]: ERROR: my_system() write(fd, buffer)-2 failed...
Feb  3 08:56:47 hostname nrpe[17480]: ERROR: my_system() write(fd, buffer)-2 failed...
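
To see whether the bursts line up with a particular check or time, the log lines above can be grouped by timestamp and NRPE child PID. This is a rough Python sketch that assumes the syslog layout shown in the excerpt; `LINE_RE` and `count_write_failures` are hypothetical names used only for illustration.

```python
import re
from collections import Counter

# Assumes the classic syslog layout from the excerpt:
# "Feb  3 08:56:23 hostname nrpe[17424]: message"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+"
    r"nrpe\[(?P<pid>\d+)\]:\s+(?P<msg>.*)$"
)


def count_write_failures(lines):
    """Count my_system() write errors per (timestamp, pid) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and "my_system() write(fd, buffer)" in match.group("msg"):
            counts[(match.group("stamp"), int(match.group("pid")))] += 1
    return counts
```

Fed the excerpt above, this gives four failures for PID 17424 at 08:56:23 and five for PID 17480 at 08:56:47. Each PID is a separate forked NRPE child, so cross-referencing those times with debug output should show which command each burst belongs to.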