exasol / nagios-monitoring

Docker container with installed and configured Nagios software for EXASOL DB monitoring.
MIT License

Incompatibility with EXAoperation 6.0? #1

Closed: ChristianGfK closed this issue 7 years ago

ChristianGfK commented 7 years ago

I just installed and configured this for all of our clusters. Among them is a single cluster that's already on version 6.0. It's also the only cluster that hosts more than one database.

The "Free DB space" service check fails for both databases on this cluster, with the ominous error:

WARNING - internal error
<Fault -1: 'Volume could not be found'>

Not sure whether that's because of version 6.0 or because there are two databases. The database performance data is gathered correctly, so I'm leaning towards version 6.0.
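The second line of that output is the standard `repr` of Python's `xmlrpc.client.Fault`, which suggests the check received an XML-RPC fault from EXAoperation and passed it through. The sketch below shows how a Nagios-style check could turn such a fault into that output. It is a hypothetical wrapper, not the code from check_db_diskspace.py: `check_free_space` and `query_volume` are illustrative names, threshold handling is omitted, and the exit codes follow the usual Nagios plugin convention (0 = OK, 1 = WARNING).

```python
import xmlrpc.client

# Usual Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_free_space(query_volume):
    """Hypothetical wrapper: query_volume stands in for the XML-RPC call
    that returns the free space (in GiB) of a database's data volume."""
    try:
        free_gib = query_volume()
    except xmlrpc.client.Fault as fault:
        # A fault raised on the server side is reported as an internal error,
        # followed by the fault's repr on the next line.
        return WARNING, f"WARNING - internal error\n{fault!r}"
    # Real checks would compare free_gib against thresholds here.
    return OK, f"OK - {free_gib} GiB free"


def failing_query():
    # Simulates the server rejecting the volume lookup.
    raise xmlrpc.client.Fault(-1, "Volume could not be found")


status, message = check_free_space(failing_query)
print(status)
print(message)
```

Running it prints status 1 followed by the same two lines quoted above, which is why the fault text points at the server-side volume lookup rather than at the Nagios side.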

The volumes shown in EXAoperation under "EXASolution" for each database match what is shown under "EXAStorage", and the monitoring user has read-only access to those volumes as well as the supervisor role.

(Is this the right venue to report issues with this?)

florian-reck commented 7 years ago

Did you change the name of the system tag of your data volume (for example, by adding a prefix)? We tried to reproduce the problem, and it only occurs if the system-given tag has been changed.
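To illustrate why a renamed tag breaks the lookup, here is a minimal sketch that finds a data volume by an exact tag match. It is a hypothetical model, not EXAoperation's actual XML-RPC implementation: the naming scheme `<database>_persistent` is assumed from the labels mentioned in this thread (test_persistent, ci_persistent), and `find_data_volume` and the volume IDs are invented for illustration.

```python
import xmlrpc.client


def find_data_volume(database, volume_tags):
    """Hypothetical lookup: return the ID of the volume whose tag is exactly
    '<database>_persistent' (an assumed naming scheme)."""
    expected = f"{database}_persistent"
    for volume_id, tag in volume_tags.items():
        if tag == expected:
            return volume_id
    # No tag matched: raise the same fault reported in this issue.
    raise xmlrpc.client.Fault(-1, "Volume could not be found")


# Hypothetical volume IDs mapped to their tags.
original = {"v0000": "test_persistent", "v0001": "ci_persistent"}
renamed = {"v0002": "test_persistent_v6", "v0003": "ci_persistent_v6"}

print(find_data_volume("test", original))
try:
    find_data_volume("test", renamed)
except xmlrpc.client.Fault as fault:
    print(repr(fault))
```

With the original tags the lookup returns `v0000`; after renaming, no tag matches exactly and the fault is raised, which matches the behaviour described here.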

When you set up the permissions on your data and archive volumes in EXAoperation, you don't need to set them on the temporary volumes too. Those volumes inherit their permissions from their data volumes. If you try to change the permissions on temporary volumes, you will encounter strange error messages.

ChristianGfK commented 7 years ago

Ahh, that did the trick. I had indeed renamed the EXAStorage labels from test_persistent to test_persistent_v6 during the upgrade to version 6.0, because EXAoperation currently has no way of showing which volumes use the new HBD on-disk storage format and which don't. I gave them distinct names to avoid accidentally deleting the new volumes instead of the old ones once the upgrade process was finished.

I renamed them back to test_persistent and ci_persistent, respectively, and Nagios recognized them correctly on the next service check.

I guess parsing text that the end user can modify to determine the storage volume type isn't perfectly robust. ;-) (This must happen on the XML-RPC side of things within EXAoperation; the Nagios check only calls https://github.com/EXASOL/nagios-monitoring/blob/master/opt/exasol/monitoring/check_db_diskspace.py#L116 and goes from there.)