NagiosEnterprises / ncpa

Nagios Cross-Platform Agent

Inconsistently-reproducible 500 error on NCPA 3 beta #1004

Closed sawolf closed 11 months ago

sawolf commented 11 months ago

I have a relatively long-running check (it runs a ping and traceroute together) that I'm using with NCPA 3 beta. Around 30-40% of the time, I get a message from NCPA claiming there was a 500 INTERNAL SERVER ERROR.

I get this in the logs:

2023-10-31 16:53:41,120 geventwebsocket.handler INFO ::ffff:192.168.0.75 - - [2023-10-31 16:53:41] "GET /api/plugins/check_traceroute.py/192.168.0.75%3A5693%3Amytoken%2Cwww.nagios.com?token=mytoken&check=1 HTTP/1.1" 200 1906 55.472920
2023-10-31 16:53:47,632 geventwebsocket.handler INFO ::ffff:192.168.0.75 - - [2023-10-31 16:53:47] "GET /api/plugins/check_traceroute.py/www.nagios.com?token=mytoken&check=1 HTTP/1.1" 200 1880 55.320394
2023-10-31 16:55:31,087 listener INFO before_request() - request.url: https://192.168.0.75:5693/api/plugins/check_traceroute.py/www.nagios.com?token=mytoken&check=1
2023-10-31 16:56:30,137 root ERROR Error: Plugin command (python3 /usr/local/ncpa/plugins/check_traceroute.py www.nagios.com) timed out. (59 sec)
2023-10-31 16:56:30,138 listener.server ERROR Exception on /api/plugins/check_traceroute.py/www.nagios.com [GET]
Traceback (most recent call last):
  File "listener/nodes.py", line 343, in run_check
  File "listener/nodes.py", line 307, in get_values
AttributeError: 'PluginNode' object has no attribute 'method'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "flask/app.py", line 2190, in wsgi_app
  File "flask/app.py", line 1486, in full_dispatch_request
  File "flask/app.py", line 1484, in full_dispatch_request
  File "flask/app.py", line 1469, in dispatch_request
  File "listener/server.py", line 289, in token_auth_decoration
  File "listener/server.py", line 1091, in api
  File "listener/nodes.py", line 345, in run_check
  File "listener/pluginnodes.py", line 145, in execute_plugin
AttributeError: 'str' object has no attribute 'decode'
2023-10-31 16:56:30,139 geventwebsocket.handler INFO ::ffff:192.168.0.75 - - [2023-10-31 16:56:30] "GET /api/plugins/check_traceroute.py/www.nagios.com?token=mytoken&check=1 HTTP/1.1" 500 2454 59.052948

I think the second error is just a Python 2 vs. Python 3 incompatibility; I'm not sure about the first.
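For illustration, the second traceback is the error Python 3 raises when .decode() is called on a value that is already a str. Below is a minimal sketch of a tolerant conversion, assuming the plugin output can arrive as either bytes or str. The helper name to_text is hypothetical and is not taken from NCPA's code.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding only when it is bytes (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding, errors="replace")
    return value


# In Python 3, str has no .decode() method, which matches the second traceback.
try:
    "timed out".decode("utf-8")
except AttributeError as exc:
    message = str(exc)

print(message)
print(to_text(b"OK"), to_text("OK"))
```

Checking the type before decoding lets the same code path handle both normal plugin output and an error message that was built as a str.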

ne-bbahn commented 11 months ago

I've narrowed the issue down: it occurs whenever a plugin call times out.

I should be able to solve this by adjusting how it handles timeout conditions.
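As a rough sketch of what handling a timeout condition can look like (this is not NCPA's actual fix), the example below runs a command with a time limit and maps a timeout to the Nagios UNKNOWN status instead of letting an exception reach the web layer. The function name run_plugin and the demo commands are assumptions made for illustration.

```python
import subprocess
import sys

UNKNOWN = 3  # Nagios plugin convention: exit code 3 means UNKNOWN


def run_plugin(cmd, timeout):
    """Run cmd; return (stdout_text, returncode), mapping a timeout to UNKNOWN.

    Hypothetical helper for illustration; not part of NCPA.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before the exception is raised.
        return ("Error: plugin command timed out. (%d sec)" % timeout, UNKNOWN)
    return (proc.stdout, proc.returncode)


# Demo commands: one that outlives the limit and one that finishes quickly.
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
fast = [sys.executable, "-c", "print('OK')"]
slow_result = run_plugin(slow, timeout=1)
fast_result = run_plugin(fast, timeout=10)
print(slow_result)
print(fast_result)
```

Returning an ordinary (output, status) pair on timeout means the caller can build a normal check result, so the request would not need to end in a 500 response.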