centreon / centreon-nsclient-build

Source used to build the Centreon NSClient agent

centreon_plugins.exe can hang forever #28

Open UrBnW opened 3 years ago

UrBnW commented 3 years ago

Hi,

When called by NSClient++, centreon_plugins.exe can hang forever, so processes pile up and memory consumption keeps growing on the monitored Windows machine.

We first hit this faulty behaviour with an unpatched Net::NTP (see #25 and https://github.com/centreon/centreon-plugins/issues/2129), but it seems to be a more general issue, as I try to demonstrate below.

With the following nsclient.ini configuration:

[/settings/external scripts]
timeout=10

and using centreon_plugins.pl --plugin=apps::protocols::nrpe::plugin --custommode=nsclient --new-api ... as the client (other clients should behave the same way):

When the centreon_plugins.pl command is launched, two new centreon_plugins.exe processes appear:

[Screenshot: the two new centreon_plugins.exe processes]

10 seconds later, centreon_plugins.pl returns the following:

Command check_centreon_plugins didn't terminate within the timeout period 10s

Adding --debug gives:

{"command":"check_centreon_plugins","lines":[{"message":"Command check_centreon_plugins didn't terminate within the timeout period 10s","perf":{}}],"result":3}

On the Windows side, one of the two processes disappears, clearly killed by NSClient++:

[Screenshot: only the bigger centreon_plugins.exe process still running]

Unfortunately, as you can see, the bigger one remains and hangs forever. So NSClient++ seems to do its job of killing the external command; perhaps there is an issue with signal handling/forwarding in centreon_plugins.exe itself.

Even with a really simple test case, a dummy sleep in the plugin's code, the second process is not killed.

Thx 👍

garnier-quentin commented 3 years ago

Maybe it's actually more of a PAR::Packer issue.

UrBnW commented 3 years ago

After some investigation, I finally see one way to solve this issue; here it is, for reference.

It should be fixed at the NSClient++ level, here: https://github.com/mickem/nscp/blob/0.5.2.41/include/process/execute_process_w32.cpp#L246
Instead of calling TerminateProcess(pi.hProcess, ...), which only acts on the first (parent) process, NSClient++ should terminate the whole process tree. Some guidance here: https://stackoverflow.com/questions/1173342/terminate-a-process-tree-c-for-windows
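
A minimal C++ sketch of the tree-walking part of that fix, assuming the process snapshot has already been read into memory. On Windows the (PID, parent PID) pairs would typically come from CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0) plus Process32First/Process32Next, and each returned PID would then be opened with OpenProcess(PROCESS_TERMINATE, ...) and passed to TerminateProcess(). Those Win32 calls are deliberately left out so the logic stands on its own; ProcEntry and process_tree_kill_order are hypothetical names, not NSClient++ code.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// One row of a process snapshot (hypothetical type). On Windows these would
// be PROCESSENTRY32::th32ProcessID and PROCESSENTRY32::th32ParentProcessID.
struct ProcEntry {
    std::uint32_t pid;
    std::uint32_t parent_pid;
};

// parent PID -> child PIDs, in snapshot order.
using ChildMap = std::multimap<std::uint32_t, std::uint32_t>;

// Post-order walk: appends every descendant of `pid`, then `pid` itself.
// `seen` breaks cycles, which stale parent PIDs in a real snapshot can create.
static void collect_post_order(std::uint32_t pid, const ChildMap& children,
                               std::set<std::uint32_t>& seen,
                               std::vector<std::uint32_t>& out) {
    if (!seen.insert(pid).second) {
        return;  // already visited
    }
    auto range = children.equal_range(pid);
    for (auto it = range.first; it != range.second; ++it) {
        collect_post_order(it->second, children, seen, out);
    }
    out.push_back(pid);
}

// Returns `root` and all of its descendants, children before parents:
// the order in which a caller would terminate them.
std::vector<std::uint32_t> process_tree_kill_order(
        std::uint32_t root, const std::vector<ProcEntry>& snapshot) {
    ChildMap children;
    for (const ProcEntry& e : snapshot) {
        if (e.pid != e.parent_pid) {  // skip self-parented entries (e.g. PID 0)
            children.emplace(e.parent_pid, e.pid);
        }
    }
    std::set<std::uint32_t> seen;
    std::vector<std::uint32_t> order;
    collect_post_order(root, children, seen, order);
    return order;
}
```

For example, with hypothetical PIDs where NSClient++ (100) starts the small centreon_plugins.exe (200), which spawns the bigger one (300), process_tree_kill_order(200, snapshot) returns {300, 200}, so the process left hanging in the screenshots would be terminated too. Killing children first is a design choice; parent first (so it cannot spawn new children meanwhile) is also defensible. Because Windows reuses PIDs, a parent PID may point at an unrelated process, so a robust version would also compare process creation times. Another option worth evaluating is assigning the child to a Windows job object and calling TerminateJobObject(), which ends every process associated with the job.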

I also considered a solution at the PAR::Packer level: catching signals with the signal() function and forwarding them to the spawned process, here: https://github.com/rschupp/PAR-Packer/blob/1.051/myldr/boot.c#L275
But given how TerminateProcess() is defined and documented (it ends the target process unconditionally, without delivering any signal to it), I'm pretty sure handling signals like this won't work here.
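
To illustrate why a handler cannot help: Microsoft documents that a process terminated by TerminateProcess() has all its threads terminated immediately with no chance to run additional code, so a signal() handler in boot.c would never run. The closest POSIX analogue is SIGKILL, which POSIX defines as impossible to catch. The sketch below is a POSIX-only illustration of that analogy, not Windows code and not a PAR::Packer patch; forward_handler, can_install_handler and g_forward_requested are hypothetical names.

```cpp
#include <cerrno>
#include <csignal>

// Set by the handler; a volatile sig_atomic_t is the object type a signal
// handler can portably write to.
static volatile std::sig_atomic_t g_forward_requested = 0;

// Hypothetical handler: a real one would forward the signal to the spawned
// child process; here it only records that it ran.
void forward_handler(int) { g_forward_requested = 1; }

// Tries to install forward_handler for `sig` and reports whether it worked.
// POSIX requires signal() to fail with errno == EINVAL for SIGKILL,
// because that signal cannot be caught.
bool can_install_handler(int sig) {
    errno = 0;
    return std::signal(sig, forward_handler) != SIG_ERR;
}
```

A handler for SIGTERM is accepted and runs when SIGTERM is raised, while installing one for SIGKILL fails with EINVAL. TerminateProcess() is closer to the second case: there is nothing for boot.c to intercept and forward.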

UrBnW commented 3 years ago

Issue opened at the NSClient++ level: https://github.com/mickem/nscp/issues/712
Please upvote it there so that it has a chance of being considered.