SnejPro / check_synology

Icinga2 script for Synology Diskstation

Return correct Nagios return codes when unreachable or on internal error #11

Closed: geotekberlin closed this issue 1 year ago

geotekberlin commented 2 years ago

When a host device does not respond or is unreachable, the check command exits with status 1, which Icinga interprets as "Warning". The same happens when the check crashes due to an internal error.

This is bad because it makes Icinga show the device as "reachable", which is not true. When there is no response, or when an internal crash occurs, the plugin should exit with return code 3, which means "Unknown" according to the Nagios guidelines.
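An uncaught Python exception makes the interpreter exit with status 1, which is why a crash shows up as "Warning". A minimal sketch of the requested behaviour, assuming a hypothetical `run_check()` that performs the SNMP queries and returns a `(status, message)` tuple: the wrapper maps any exception (timeout, unreachable host, parsing bug) to exit code 3 (UNKNOWN). This is an illustration, not the plugin's actual code.

```python
# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def run_check():
    """Hypothetical stand-in for the plugin's real SNMP logic.

    Here it simulates an unreachable host by raising TimeoutError.
    """
    raise TimeoutError("no SNMP response received before timeout")


def main(check=run_check):
    """Run the check and return a Nagios exit code instead of crashing."""
    try:
        status, message = check()
    except Exception as exc:  # unreachable host, SNMP error, parsing bug, ...
        print(f"UNKNOWN - check failed: {exc}")
        return UNKNOWN
    print(message)
    return status
```

In the script's entry point this would be called as `sys.exit(main())`, so the returned code becomes the process exit status that Icinga reads.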

SnejPro commented 2 years ago

@geotekberlin Thank you for the suggestion. I am currently working on a complete rewrite of the plugin and have added your suggestion to the dev branch. I'm going to test this branch on my NAS for a week before merging it into master, but feel free to use the dev branch in the meantime.

geotekberlin commented 2 years ago

This sounds great! I have evaluated virtually every Synology plugin out there and chose yours because it has the most flexible options. It only needs some more robustness, hence the heads-up!

geotekberlin commented 2 years ago

@SnejPro, just drop me a note when the rewrite reaches the beta stage; I will help test it then. Currently, the memory check in the dev branch throws an error:

root@geotek-icinga:~# '/usr/lib/nagios/plugins/check_synology_dev.py' '--temp_crit' '80' '-x' 'ups' '-v' '2c' '-m' 'memory' '-H' 'hidden' '-C' 'hidden' '--ups_load_warn' '80' '--ups_load_crit' '90' '--ups_level_warn' '25' '--ups_level_crit' '50' '--temp_warn' '70' '--auth_prot' 'SHA' '--storage_used_warn' '80' '--storage_used_crit' '90' '--priv_prot' 'AES' '--port' '161' '--net_warn' '90' '--net_crit' '95' '--memory_warn' '90' '--memory_crit' '98' '--disk_temp_warn' '60' '--disk_temp_crit' '70'
Traceback (most recent call last):
  File "/usr/lib/nagios/plugins/check_synology_dev.py", line 496, in
    render("Memory - Total", "memory-total", True, int(queue_result[0]["data"]['1.3.6.1.4.1.2021.4.5.0'])*1000, unit="B")
ValueError: invalid literal for int() with base 10: '513892 kB'

SnejPro commented 2 years ago

That's interesting. It looks like your NAS reports the total memory together with its unit. The plugin stores the fetched values in /tmp/check_synology_<HOSTNAME>_<MODE>.json. Can you please post the values whose keys start with "1.3.6.1.4.1.2021.4"? What version is running on the system that throws this error? My NAS, running the newest DSM, does not have this issue.
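To pull those values out of the cache file, a small helper can walk the JSON and collect every key under the requested OID prefix. The exact layout of the file is an assumption here (the traceback only suggests OID keys under a "data" field), so this sketch searches the whole tree rather than a fixed path. `find_oid_values` and `dump_memory_oids` are hypothetical names, not part of the plugin.

```python
import json


def find_oid_values(node, prefix, found=None):
    """Recursively collect {oid: value} pairs whose key is prefix or lies below it.

    Matching on prefix + "." avoids false hits such as "...2021.40" for "...2021.4".
    """
    if found is None:
        found = {}
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(key, str) and (key == prefix or key.startswith(prefix + ".")):
                found[key] = value
            else:
                find_oid_values(value, prefix, found)
    elif isinstance(node, list):
        for item in node:
            find_oid_values(item, prefix, found)
    return found


def dump_memory_oids(path, prefix="1.3.6.1.4.1.2021.4"):
    """Load a cache file and print the memory-related OIDs it contains."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    for key, value in sorted(find_oid_values(data, prefix).items()):
        print(f"{key} = {value!r}")
```

Printing with `!r` keeps the quotes, so a value such as `'513892 kB'` is visibly a string with a unit rather than a bare number.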

geotekberlin commented 2 years ago

We currently check a DS214 and a DS220+ with okrouhly's fork of your plugin, and it works. Your dev branch throws errors on both. Here are the corresponding JSON files: json.zip.

SnejPro commented 2 years ago

Same question: is the error still there?

geotekberlin commented 2 years ago

Sorry for the late response; I wasn't aware that you had released a new version.

Now the 1.0 plugin handles non-responding hosts correctly, but the storage and memory checks still crash. Storage checks return this error:

  File "/usr/lib/nagios/plugins/check_synology.py", line 523, in
    size = int(queue_result[0]["data"]['1.3.6.1.2.1.25.2.3.1.4.'+str(num)])*float(queue_result[0]["data"]['1.3.6.1.2.1.25.2.3.1.5.'+str(num)])
ValueError: invalid literal for int() with base 10: '4096 Bytes'

and memory checks throw this error:

  File "/usr/lib/nagios/plugins/check_synology.py", line 496, in
    render("Memory - Total", "memory-total", True, int(queue_result[0]["data"]['1.3.6.1.4.1.2021.4.5.0'])*1000, unit="B")
ValueError: invalid literal for int() with base 10: '7979360 kB'

SnejPro commented 1 year ago

It looks like some DiskStations answer only with a number and others answer with a number and a unit. Mine only answers with the integer.

Is your DSM up to date?
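Since both answer styles occur, one way to make the parsing tolerant is to accept a leading integer with an optional unit word and discard the unit, leaving the unit handling to the caller exactly as the original `int()` calls did. This is a hypothetical sketch (`snmp_int`, `storage_size` and `memory_total_bytes` are illustrative names), not necessarily how PR #17 fixed it. The two helpers mirror the failing expressions from lines 523 and 496 quoted above, including the plugin's factor of 1000.

```python
import re

# Leading integer, optionally followed by whitespace and a unit word ("kB", "Bytes").
_NUMBER_WITH_UNIT = re.compile(r"^\s*(-?\d+)(?:\s*([A-Za-z]+))?\s*$")


def snmp_int(raw):
    """Return the integer part of values like '4096', '4096 Bytes' or '513892 kB'."""
    match = _NUMBER_WITH_UNIT.match(str(raw))
    if match is None:
        raise ValueError(f"unexpected SNMP value: {raw!r}")
    return int(match.group(1))


def storage_size(data, num):
    """hrStorageAllocationUnits (...25.2.3.1.4.N) times hrStorageSize (...25.2.3.1.5.N)."""
    units = snmp_int(data["1.3.6.1.2.1.25.2.3.1.4." + str(num)])
    blocks = snmp_int(data["1.3.6.1.2.1.25.2.3.1.5." + str(num)])
    return units * float(blocks)


def memory_total_bytes(data):
    """memTotalReal (...2021.4.5.0); the factor 1000 is copied from the plugin's render() call."""
    return snmp_int(data["1.3.6.1.4.1.2021.4.5.0"]) * 1000
```

Discarding the unit keeps the behaviour identical for DiskStations that send bare integers; a stricter variant could check the captured unit (group 2) against the one expected for each OID and raise if it differs.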

SnejPro commented 1 year ago

Solved by @fantasyreader97 in PR https://github.com/SnejPro/check_synology/pull/17