exasol / nagios-monitoring

Docker container with installed and configured Nagios software for EXASOL DB monitoring.
MIT License

Wrong status of check "Backup Status"? #7

Closed: mgusek closed this issue 7 years ago

mgusek commented 7 years ago
Hi, I'm trying out your monitoring solution. One thing looks unclear: the check 'Backup Status dell_live_3' shows the status 'OK - There is a valid backup', but our logs show that the last backup was not successful:

```
2017-08-31 03:29:39.980822 Error dell_live_3 pddserver(1.0): Backup error. Backup could not be written successfully: dell_live_3/id_3/level_1/node_0/backup_201708310000 [No space left on device. Backup has been deleted]:
2017-08-31 03:29:39.964281 Error dell_live_3 pddserver(1.7): Backup error. Backup could not be written successfully: dell_live_3/id_3/level_1/node_7/backup_201708310000 [No space left on device. Backup has been deleted]:
2017-08-31 03:29:39.963474 Error dell_live_3 pddserver(1.8): Backup error. Backup could not be written successfully: dell_live_3/id_3/level_1/node_8/backup_201708310000 [No space left on device. Backup has been deleted]:
2017-08-31 03:29:39.963324 Error dell_live_3 pddserver(1.6): Backup error. Backup could not be written successfully: dell_live_3/id_3/level_1/node_6/backup_201708310000 [No space left on device. Backup has been deleted]:
2017-08-31 03:29:39.961366 Error dell_live_3 pddserver(1.3): Backup error. Backup could not be written successfully: dell_live_3/id_3/level_1/node_3/backup_201708310000 [No space left on device. Backup has been deleted]:
2017-08-31 03:29:39.960139 Error dell_live_3 pddserver(1.1): Backup error. Backup could not be written successfully: dell_live_3/id_3/level_1/node_1/backup_201708310000 [No space left on device. Backup has been deleted]:
2017-08-31 03:29:39.959804 Error dell_live_3 pddserver(1.2): Backup error. Backup could not be written successfully: dell_live_3/id_3/level_1/node_2/backup_201708310000 [No space left on device. Backup has been deleted]:
2017-08-31 03:29:39.959787 Error dell_live_3 pddserver(1.9): Backup error. Backup could not be written successfully: dell_live_3/id_3/level_1/node_9/backup_201708310000 [No space left on device. Backup has been deleted]:
2017-08-31 03:29:39.958966 Error dell_live_3 pddserver(1.4): Backup error. Backup could not be written successfully: dell_live_3/id_3/level_1/node_4/backup_201708310000 [No space left on device. Backup has been deleted]:
2017-08-31 03:29:39.956929 Error dell_live_3 pddserver(1.5): Backup error. Backup could not be written successfully: dell_live_3/id_3/level_1/node_5/backup_201708310000 [No space left on device. Backup has been deleted]:
```

The 'Backup Status' service check should take that into account.

florian-reck commented 7 years ago

Hi,

The check_backup.py script only checks whether a restorable backup exists from within the last 7 days (the preconfigured window). It does not check whether an individual backup run failed; its purpose is to make sure that a backup is available on your storage for a worst-case scenario.
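To illustrate why the check can report OK after a failed run, here is a minimal sketch of the kind of freshness test described above. It is not the actual check_backup.py code: the function name `check_backup_freshness`, the `(timestamp, restorable)` input format, and the sample data are hypothetical. It only assumes the standard Nagios exit codes (0 = OK, 2 = CRITICAL) and the 7-day window mentioned above.

```python
from datetime import datetime, timedelta

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def check_backup_freshness(backups, now, max_age_days=7):
    """Hypothetical freshness check (not the real check_backup.py).

    backups: list of (timestamp, restorable) tuples.
    Returns (exit_code, message). Only asks whether at least one restorable
    backup is newer than max_age_days; failed runs are simply ignored.
    """
    cutoff = now - timedelta(days=max_age_days)
    valid = [ts for ts, restorable in backups if restorable and ts >= cutoff]
    if valid:
        latest = max(valid)
        return OK, f"OK - There is a valid backup (latest: {latest:%Y-%m-%d %H:%M})"
    return CRITICAL, f"CRITICAL - No restorable backup in the last {max_age_days} days"


if __name__ == "__main__":
    # Hypothetical data: an older successful backup plus a failed run like the one in this issue.
    backups = [
        (datetime(2017, 8, 30, 0, 0), True),
        (datetime(2017, 8, 31, 0, 0), False),  # "No space left on device"
    ]
    code, message = check_backup_freshness(backups, now=datetime(2017, 8, 31, 12, 0))
    # A real Nagios plugin would print the message and exit with `code`.
    print(code, message)  # 0 OK - There is a valid backup (latest: 2017-08-30 00:00)
```

As long as one older successful backup is still inside the window, a check of this kind reports OK even though the most recent run failed, which matches the behaviour reported in this issue.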

However, you should definitely have received an alert from the log service script when the backup failed.
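Detecting the failed run itself is a different test: scanning the database log for backup errors. The sketch below is hypothetical and is not the repository's log service script; the function `check_backup_errors` and the regular expression are assumptions based only on the log format quoted in this issue.

```python
import re

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2

# Assumed pattern, derived from the log lines quoted in this issue, e.g.
# "2017-08-31 03:29:39.980822 Error dell_live_3 pddserver(1.0): Backup error. ..."
BACKUP_ERROR = re.compile(r"^(?P<ts>\S+ \S+) Error (?P<db>\S+) \S+: Backup error\.")


def check_backup_errors(log_lines, database):
    """Hypothetical check: CRITICAL if any backup error is logged for `database`."""
    timestamps = []
    for line in log_lines:
        match = BACKUP_ERROR.match(line.strip())
        if match and match.group("db") == database:
            timestamps.append(match.group("ts"))
    if not timestamps:
        return OK, f"OK - No backup errors logged for {database}"
    # Timestamps are zero-padded "YYYY-MM-DD HH:MM:SS.ffffff" strings,
    # so the lexicographic max() is also the most recent one.
    return CRITICAL, (
        f"CRITICAL - {len(timestamps)} backup error(s) for {database}, "
        f"latest at {max(timestamps)}"
    )


if __name__ == "__main__":
    log = [
        "2017-08-31 03:29:39.980822 Error dell_live_3 pddserver(1.0): Backup error. "
        "Backup could not be written successfully: dell_live_3/id_3/level_1/node_0/"
        "backup_201708310000 [No space left on device. Backup has been deleted]:",
        "2017-08-31 03:29:39.964281 Error dell_live_3 pddserver(1.7): Backup error. "
        "Backup could not be written successfully: dell_live_3/id_3/level_1/node_7/"
        "backup_201708310000 [No space left on device. Backup has been deleted]:",
    ]
    print(check_backup_errors(log, "dell_live_3"))
```

Running both kinds of check side by side covers the situation in this issue: the freshness check confirms that a restorable backup exists, while a log check flags each individual failed run.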

Kind regards

florian-reck commented 7 years ago

I've created a description for this plugin here: https://github.com/EXASOL/nagios-monitoring/wiki/Plugin-Descriptions