NagiosEnterprises / ncpa

Nagios Cross-Platform Agent

NCPA-3-0-0 | Windows Passive logging failure #1030

Closed SNapier closed 6 months ago

SNapier commented 10 months ago

NCPA is not sending passive checks to XI via NRDP.

From the Passive log post fresh install on Win10

2023-11-17 09:20:55,678 passive ERROR Stdout or returncode was None, cannot return meaningfully.
Traceback (most recent call last):
  File "ncpa.py", line 319, in run_all_handlers
  File "passive\nrdp.py", line 113, in run
  File "passive\nrdp.py", line 95, in get_xml_of_checkresults
  File "passive\nrdp.py", line 54, in make_xml
  File "passive\ncpacheck.py", line 84, in run
ValueError: Stdout or returncode was None, cannot return meaningfully.

SNapier commented 10 months ago

After a reboot of the target machine, checks have come into XI. [image]

The logs on the agent do not reflect the executed checks; the passive log still has only the one entry.

rob2791 commented 10 months ago

I have the same problem. Here is an example of the config and what ncpa_passive.log says. [image] [image]

ne-bbahn commented 9 months ago

This error occurs when your plugin/endpoint isn't responding properly. It typically occurs when you are either trying to access an endpoint that doesn't exist or using a plugin that isn't properly returning anything.

For example, in @rob2791's case, there is an issue with the logs endpoint. It may be that the log doesn't exist. Did you verify that you can access that endpoint through the API and that it gives valid output?

@SNapier can you post the check you have that's giving an error?

SNapier commented 9 months ago

The only checks configured on this host are the defaults in the example.cfg. [image]

ne-bbahn commented 9 months ago

Can I get the passive log before the error so I can identify which endpoint is breaking?

rob2791 commented 9 months ago

> This error occurs when your plugin/endpoint isn't responding properly. It typically occurs when you are either trying to access an endpoint that doesn't exist or using a plugin that isn't properly returning anything.
>
> For example, in @rob2791's case, there is an issue with the logs endpoint. It may be that the log doesn't exist. Did you verify that you can access that endpoint through the API and that it gives valid output?
>
> @SNapier can you post the check you have that's giving an error?

@ne-bbahn Everything was working fine before the latest update. I was pulling log events from about 10 servers on various things. Unfortunately, our enterprise update software automatically updated the NCPA package, and I walked in one morning to all the errors. Nothing else changed except NCPA. Unless there is something different with this version that needs to be in the CFG?

SNapier commented 9 months ago

> Can I get the passive log before the error so I can identify which endpoint is breaking?

Unfortunately there are no prior logs; this is a fresh install.

SNapier commented 9 months ago

> @ne-bbahn Everything was working fine before the latest update. I was pulling log events from about 10 servers on various things. Unfortunately, our enterprise update software automatically updated the NCPA package, and I walked in one morning to all the errors. Nothing else changed except NCPA. Unless there is something different with this version that needs to be in the CFG?

I ran into this yesterday as well. If the repos that are created when installing XI are not marked to be ignored, the latest and greatest gets installed when running the patching with Yum.

ne-bbahn commented 9 months ago

So this is the entirety of your passive log?

> 2023-11-17 09:20:55,678 passive ERROR Stdout or returncode was None, cannot return meaningfully.
> Traceback (most recent call last):
>   File "ncpa.py", line 319, in run_all_handlers
>   File "passive\nrdp.py", line 113, in run
>   File "passive\nrdp.py", line 95, in get_xml_of_checkresults
>   File "passive\nrdp.py", line 54, in make_xml
>   File "passive\ncpacheck.py", line 84, in run
> ValueError: Stdout or returncode was None, cannot return meaningfully.

Can you check those API endpoints manually to verify that they're working? Do any passive checks show up in the interface under Checks?

thr03j0n4s commented 7 months ago

I've got the same error.

2024-02-26 11:57:15,809 passive ERROR Stdout or returncode was None, cannot return meaningfully.
Traceback (most recent call last):
  File "ncpa.py", line 339, in run_all_handlers
  File "passive\nrdp.py", line 113, in run
  File "passive\nrdp.py", line 95, in get_xml_of_checkresults
  File "passive\nrdp.py", line 54, in make_xml
  File "passive\ncpacheck.py", line 84, in run
ValueError: Stdout or returncode was None, cannot return meaningfully.

NCPA Version is 3.0.1, but I've had it in 3.0.0 too.

The weird thing is that we run NCPA on almost 200 Windows servers, but only our Microsoft Terminal Servers have this issue, and it only happens sporadically. Most of the time it works well, but every now and then (the time between incidents varies), all checks go red on one of the Terminal Servers (which one is random). I have to restart the ncpa service on that server and everything works again ... for some days.

Could you explain what you mean by "API endpoints"? In my case, I tend to believe it's a passive check on a Windows service using the built-in services feature. In nrdp.cfg it is:

%HOSTNAME%|Check-Name|420 = services?service=stunnel&status=running
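For reference, the passive check line above can be read as host, service name, and interval to the left of the equals sign, with the API endpoint to the right. The following is a minimal Python sketch of that split, assuming this `host|service|interval = endpoint` layout. The `PassiveCheck` type and `parse_passive_check` helper are hypothetical names for illustration and are not part of NCPA.

```python
from typing import NamedTuple, Optional


class PassiveCheck(NamedTuple):
    host: str
    service: str
    interval: Optional[int]  # seconds; None if the line omits it
    endpoint: str


def parse_passive_check(line: str) -> PassiveCheck:
    """Split one passive check line into its parts (illustrative only)."""
    # Split on the first '=' only: the endpoint's query string contains '=' too.
    key, _, endpoint = line.partition("=")
    parts = [p.strip() for p in key.split("|")]
    if len(parts) < 2 or not endpoint.strip():
        raise ValueError(f"not a passive check line: {line!r}")
    interval = int(parts[2]) if len(parts) > 2 else None
    return PassiveCheck(parts[0], parts[1], interval, endpoint.strip())


check = parse_passive_check(
    "%HOSTNAME%|Check-Name|420 = services?service=stunnel&status=running"
)
print(check.endpoint)  # services?service=stunnel&status=running
```

The endpoint part is what gets appended to the agent's `/api/` path when testing the check by hand.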

I've checked the last check that ran in the NCPA web interface under services for the system. The check on the stunnel service is the one that should run next, but instead nothing happens until the ncpa service restarts.

If that service didn't exist, it should actually return an UNKNOWN state with the info that the service couldn't be found on the system, am I right? I mean, I don't know why it wouldn't exist there, because after a restart all checks, including the one for the stunnel service, are "OK" again for some hours or days.

ne-bbahn commented 7 months ago

> Could you explain what you mean by "API endpoints"? In my case, I tend to believe it's a passive check on a Windows service using the built-in services feature. In nrdp.cfg it is: %HOSTNAME%|Check-Name|420 = services?service=stunnel&status=running

You can inspect your checks in the web interface via the API section at https://ip_address:5693, or by querying https://ip_address:5693/api/endpoint/path directly, for example https://localhost:5693/api/plugins/testplugin. So, to test the check command above via the API, you could request https://localhost:5693/api/services?service=stunnel&status=running to see whether NCPA is handling the check properly.
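The same manual test can be scripted. The sketch below builds the API URL from a passive check endpoint and fetches it with Python's standard library. It rests on some assumptions: the agent expects its configured token as a `token` query parameter, it listens on port 5693 as in the URLs above, and it may present a self-signed certificate. `build_api_url`, `query_api`, and the token value `mytoken` are hypothetical names for illustration.

```python
import json
import ssl
import urllib.request
from urllib.parse import parse_qsl, urlencode, urlsplit


def build_api_url(host: str, endpoint: str, token: str, port: int = 5693) -> str:
    """Turn a passive check endpoint into a full API URL (illustrative)."""
    parts = urlsplit(endpoint)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("token", token))  # assumption: token is passed in the query
    path = parts.path.lstrip("/")
    return f"https://{host}:{port}/api/{path}?{urlencode(query)}"


def query_api(url: str) -> dict:
    """Fetch the URL and decode the JSON body."""
    # Assumption: the agent uses a self-signed certificate, so verification
    # is disabled here. Do not do this against untrusted hosts.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(url, context=context, timeout=10) as response:
        return json.load(response)


url = build_api_url(
    "localhost", "services?service=stunnel&status=running", "mytoken"
)
print(url)
# https://localhost:5693/api/services?service=stunnel&status=running&token=mytoken
```

Calling `query_api(url)` against a live agent then shows the raw JSON that the passive handler would have to interpret.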

> If that service didn't exist, it should actually return an UNKNOWN state with the info that the service couldn't be found on the system, am I right? I mean, I don't know why it wouldn't exist there, because after a restart all checks, including the one for the stunnel service, are "OK" again for some hours or days.

You are definitely correct here. I've taken a look at some of the code relating to checks and it's definitely not handled correctly. I'll try to get it working properly for 3.0.2
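As an illustration of the behavior being asked for, here is a minimal Python sketch that reports UNKNOWN instead of raising when a result carries no `stdout` or `returncode`. It is not NCPA's code and not the fix planned for 3.0.2. The `normalize_result` helper is hypothetical, and the assumption that a check result is a dict with `stdout` and `returncode` keys is inferred only from the error message in this thread.

```python
from typing import Tuple

UNKNOWN = 3  # conventional Nagios plugin return code for UNKNOWN


def normalize_result(result: dict) -> Tuple[int, str]:
    """Return (returncode, stdout), falling back to UNKNOWN if either is missing."""
    stdout = result.get("stdout")
    returncode = result.get("returncode")
    if stdout is None or returncode is None:
        # Degrade to UNKNOWN so the remaining checks can still be sent.
        return UNKNOWN, "UNKNOWN: endpoint returned no stdout or returncode"
    return int(returncode), str(stdout)


# A response without check output, like a plain services listing
listing = {"services": {"stunnel": "running"}}
print(normalize_result(listing)[0])  # 3

# A response shaped like a check result (assumed shape)
ok = {"returncode": 0, "stdout": "OK: stunnel is running"}
print(normalize_result(ok))  # (0, 'OK: stunnel is running')
```

With a fallback like this, one misbehaving endpoint would show up as a single UNKNOWN service rather than stopping the whole passive run.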

thr03j0n4s commented 7 months ago

@ne-bbahn

> You can inspect your checks in the web interface via the API section at https://ip_address:5693, or by querying https://ip_address:5693/api/endpoint/path directly, for example https://localhost:5693/api/plugins/testplugin. So, to test the check command above via the API, you could request https://localhost:5693/api/services?service=stunnel&status=running to see whether NCPA is handling the check properly.

Thanks for the input. I actually had this with 3 of our Terminal Servers again and checked on one of them. The server has still not sent any results to the Nagios server, and if I query this via the API as you mentioned, I get a JSON result with every running service.

{"services": {"stunnel": "running", "AdobeARMservice": "running", "AGSService": "running", [...], }}

I don't think this is the intended behavior, am I right?

Some more info:

Response-Headers:
  Access-Control-Allow-Origin: *
  Content-Length: 4891
  Content-Security-Policy: frame-ancestors 'self'
  Content-Type: application/json
  Date: Tue, 05 Mar 2024 13:51:40 GMT
  Strict-Transport-Security: max-age=31536000; includeSubDomains
  Vary: Cookie
  X-Content-Type-Options: nosniff
  X-Frame-Options: SAMEORIGIN

Request-Headers:
  Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
  Accept-Encoding: gzip, deflate, br
  Accept-Language: de,en-US;q=0.7,en;q=0.3
  Connection: keep-alive
  Host: hostname.subdomain.tld:5693
  Sec-Fetch-Dest: document
  Sec-Fetch-Mode: navigate
  Sec-Fetch-Site: none
  Sec-Fetch-User: ?1
  Upgrade-Insecure-Requests: 1
  User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0

ne-bbahn commented 7 months ago

> Thanks for the input. I actually had this with 3 of our Terminal Servers again and checked on one of them. The server has still not sent any results to the Nagios server, and if I query this via the API as you mentioned, I get a JSON result with every running service.
>
> {"services": {"stunnel": "running", "AdobeARMservice": "running", "AGSService": "running", [...], }}
>
> I don't think this is the intended behavior, am I right?

It would seem that NCPA is using an inclusive search, where specifying both service=stunnel and status=running returns both the service named stunnel and all the services that are running. If you want it to check whether the specified service or services are running, you need to add check=true to make it run as a Nagios check. This does seem to be the intended behavior.
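To try this by hand, the endpoint from the earlier comment needs `check=true` appended to its query string. The sketch below does that in Python. The `as_check` helper is a hypothetical name for illustration, and the code only builds the string; it makes no claim about what the agent returns.

```python
from urllib.parse import parse_qsl, urlencode


def as_check(endpoint: str) -> str:
    """Return the endpoint with check=true set exactly once (illustrative)."""
    path, _, query = endpoint.partition("?")
    # Drop any existing 'check' parameter so it is not duplicated.
    params = [
        (key, value)
        for key, value in parse_qsl(query, keep_blank_values=True)
        if key != "check"
    ]
    params.append(("check", "true"))
    return f"{path}?{urlencode(params)}"


print(as_check("services?service=stunnel&status=running"))
# services?service=stunnel&status=running&check=true
```

Requesting the resulting endpoint under `/api/` should then show whether the agent answers with a check result rather than a listing of services.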

ne-bbahn commented 6 months ago

This is solved in NCPA 3.0.2. If you continue to have issues, we can reopen this and discuss.