NagiosEnterprises / ncpa

Nagios Cross-Platform Agent

HTTP 500 Error When Requesting Windows Logs #1194

Open matthewqbatten opened 3 months ago

matthewqbatten commented 3 months ago

I have a backup process which logs an event in the Windows Event Viewer when it completes successfully. I'm trying to use Nagios with NCPA to verify that this process is being completed regularly. Certain backups happen daily, others happen weekly or monthly. Unfortunately, when I request older logs, I get a server error:

./check_ncpa -H <REDACTED> -t 'REDACTED' -M logs -q 'name=Veeam Backup,event_id=190,logged_after=96h,message=REDACTED' -c1:
UNKNOWN: An error occurred connecting to API. (HTTP error: '500 INTERNAL SERVER ERROR')

The longer the Windows server being polled has been up, the shorter the logged_after window it can handle. It has never been able to handle more than a few days, and the window shrinks over time until even a request for the previous night's logs returns that error. Rebooting the server and waiting until a new log entry is written seems to reset it.

The ncpa_listener log on the server contains the following error:

2024-07-22 16:54:50,137 listener ERROR time data '2024-07-21 02:00:17' does not match format '%Y-%m-%d %H:%M:%S.%f'
Traceback (most recent call last):
  File "listener\windowslogs.py", line 88, in get_logs
  File "listener\windowslogs.py", line 596, in get_event_logs
  File "_strptime.py", line 567, in _strptime_datetime
  File "_strptime.py", line 349, in _strptime
ValueError: time data '2024-07-21 02:00:17' does not match format '%Y-%m-%d %H:%M:%S.%f'
2024-07-22 16:54:50,137 listener.server ERROR Exception on /api/logs/ [GET]
Traceback (most recent call last):
  File "listener\windowslogs.py", line 88, in get_logs
  File "listener\windowslogs.py", line 596, in get_event_logs
  File "_strptime.py", line 567, in _strptime_datetime
  File "_strptime.py", line 349, in _strptime
ValueError: time data '2024-07-21 02:00:17' does not match format '%Y-%m-%d %H:%M:%S.%f'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "listener\windowslogs.py", line 104, in run_check
  File "listener\windowslogs.py", line 77, in walk
  File "listener\windowslogs.py", line 74, in log_method
  File "listener\windowslogs.py", line 93, in get_logs
Exception: General error occurred while getting log Veeam Backup: ValueError("time data '2024-07-21 02:00:17' does not match format '%Y-%m-%d %H:%M:%S.%f'")

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "flask\app.py", line 1473, in wsgi_app
  File "flask\app.py", line 882, in full_dispatch_request
  File "flask\app.py", line 880, in full_dispatch_request
  File "flask\app.py", line 865, in dispatch_request
  File "listener\server.py", line 317, in token_auth_decoration
  File "listener\server.py", line 1507, in api
  File "listener\windowslogs.py", line 107, in run_check
AttributeError: 'Exception' object has no attribute 'message'
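
A side note on that last AttributeError: Python 3 exceptions do not have a `.message` attribute, so an error handler that reads `e.message` raises a second exception and hides the original one behind the HTTP 500. I have not looked at what `run_check` actually does at line 107, so the snippet below is only a sketch of the general pattern, with a made-up error string, not a patch to NCPA:

```python
# Sketch only: shows why reading .message fails on Python 3 and what
# works instead. The error text here is made up for illustration.
try:
    raise Exception("General error occurred while getting log")
except Exception as exc:
    has_message_attr = hasattr(exc, "message")  # False on Python 3
    detail = str(exc)                           # portable way to get the text
```

Using `str(exc)` would at least let the real parsing error reach the API response instead of a generic server error.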

It seems that the timestamp in the log sometimes drops the fractional seconds, which breaks the parsing.
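
One plausible cause, which I have not confirmed against the NCPA source: Python's `str()` of a `datetime` omits the fractional part when the microsecond field is exactly zero, so a format string that requires `.%f` rejects those values. Below is a minimal sketch of a more tolerant parser. The helper name `parse_event_time` is hypothetical and is not part of NCPA:

```python
from datetime import datetime

# Formats to try, most specific first. The first is the one named in the
# error message; the second is the same format without fractional seconds.
FORMATS = ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S")


def parse_event_time(text):
    """Parse a timestamp with or without fractional seconds.

    Hypothetical helper for illustration, not NCPA's actual code.
    """
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {text!r}")


# The value from the listener log parses once the fallback format exists.
parsed = parse_event_time("2024-07-21 02:00:17")
```

With something like this, events logged on an exact second would no longer abort the whole query.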

I'm running the latest version, 3.1.0. I also tried 3.0.1 and that gave a different error. Is there anything I can do to help fix this? I'd like to be able to monitor my weekly backups as well as the daily ones, but this bug makes that impossible.