SteScho / manubulon-snmp

Set of Icinga/Nagios plugins to check hosts and hardware with the SNMP protocol.
GNU General Public License v2.0

check_snmp_process - Slightly confusing error message due to rounding #85

Closed peternewman closed 10 months ago

peternewman commented 10 months ago

As far as I could tell, this is the most popular/official migration of http://nagios.manubulon.com/ ?

The check output is sometimes slightly confusing when a value is only just over the threshold:

2 process matching foo.exe (> 1) (<= 2):OK, Mem : 20.0Mb > 20 WARNING | 'memory_usage'=20.0MB;20;30 'num_process'=2;1;1
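Everything after the `|` is performance data in the usual Nagios plugin format, `'label'=value[UOM];[warn];[crit];[min];[max]`. The sketch below is a simplified, hypothetical parser (the names `PERF_ITEM` and `parse_perfdata` are made up for illustration, not part of manubulon-snmp). It shows that a graphing or alerting consumer of this perfdata only receives the already rounded `20.0`, which on its own does not look like it exceeds the warning threshold of `20`.

```python
import re

# One perfdata item: 'label'=value[UOM];[warn];[crit];[min];[max]
# (label quotes are optional; trailing fields may be omitted).
PERF_ITEM = re.compile(r"\s*'?([^'=]+)'?=([-\d.]+)([a-zA-Z%]*)((?:;[^;\s]*){0,4})")


def parse_perfdata(perf: str) -> dict:
    """Split a perfdata string into {label: fields}. Thresholds stay strings
    because they may be Nagios ranges such as '10:20' or '@5'."""
    items = {}
    for label, value, uom, rest in PERF_ITEM.findall(perf):
        fields = rest.split(";")[1:]        # drop the empty piece before the first ';'
        fields += [""] * (4 - len(fields))  # pad missing warn/crit/min/max
        warn, crit, lo, hi = fields
        items[label] = {"value": float(value), "uom": uom,
                        "warn": warn, "crit": crit, "min": lo, "max": hi}
    return items


if __name__ == "__main__":
    perf = "'memory_usage'=20.0MB;20;30 'num_process'=2;1;1"
    mem = parse_perfdata(perf)["memory_usage"]
    print(mem)
    # The consumer only sees the rounded 20.0, which is not > 20.
    print(mem["value"] > float(mem["warn"]))  # False
```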

I don't feel like I've seen this as much with other checks, for some reason; perhaps they just don't have as much granularity...

Expected Behavior

Unsure, something less confusing...

Current Behavior

The check outputs this warning: 2 process matching foo.exe (> 1) (<= 2):OK, Mem : 20.0Mb > 20 WARNING | 'memory_usage'=20.0MB;20;30 'num_process'=2;1;1

Printing the raw values shows: 20.04296875 > 20
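The mismatch can be reproduced outside the plugin. The output shows memory with one decimal place (`20.0Mb`), so anything from roughly 19.95 to just under 20.05 prints as `20.0`, while the comparison uses the unrounded value. The sketch assumes, as a hypothetical, that the raw reading is in kilobytes and is divided by 1024; 20.04296875 is exactly 20524 / 1024, which fits that guess, but the plugin's actual source value is not shown here.

```python
# Hypothetical raw reading in KB; 20524 / 1024 == 20.04296875 exactly.
raw_kb = 20524
warn_mb = 20

mem_mb = raw_kb / 1024   # 20.04296875
shown = f"{mem_mb:.1f}"  # one decimal place, as in the check output

print(f"Mem : {shown}Mb > {warn_mb} WARNING")  # Mem : 20.0Mb > 20 WARNING
print(mem_mb > warn_mb)                        # True: the raw value really is over
```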

Possible Solution

I'm not sure of an ideal solution. Showing more decimal places wouldn't be flawless and would be noisy, while rounding the source data earlier loses accuracy...
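One possible middle ground, sketched here as a hypothetical helper (`format_against_threshold` is not part of the plugin), is to add decimal places only when the usual one-decimal display would print the same as the threshold. Most values keep the short form; only near-threshold values get the extra digits needed to tell them apart.

```python
def format_against_threshold(value: float, threshold: float,
                             min_places: int = 1, max_places: int = 4) -> str:
    """Return value with the fewest decimal places (at least min_places)
    at which it no longer prints the same as threshold."""
    for places in range(min_places, max_places + 1):
        shown = f"{value:.{places}f}"
        if shown != f"{threshold:.{places}f}":
            return shown
    # Indistinguishable even at max_places: keep the short form.
    return f"{value:.{min_places}f}"


if __name__ == "__main__":
    for mem in (25.3, 20.04296875, 20.004):
        print(f"Mem : {format_against_threshold(mem, 20)}Mb > 20")
```

Because rounding to a fixed number of places never reverses the order of two numbers, once the two strings differ the displayed value is guaranteed to sit on the same side of the displayed threshold as the raw value. The `max_places` cap bounds the noise the reporter was worried about.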

Steps to Reproduce (for bugs)

  1. Check something whose value is only slightly over the configured threshold.

Admittedly, I haven't tried the latest release yet, but the comparison line in the code is unchanged, so I doubt it's fixed.
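The comparison line itself isn't quoted in this issue, so the following is only a hypothetical Python stand-in for it: a strict `>` test (inferred from the `20.0Mb > 20` wording) against the warning and critical limits 20 and 30 taken from the perfdata. It illustrates the trade-off raised above: comparing the raw value gives WARNING while the text shows 20.0, and rounding before comparing makes text and state agree but silently accepts values up to about 0.05 MB over the limit.

```python
def mem_status(mem_mb: float, warn: float, crit: float) -> str:
    """Map a memory value to a Nagios-style state using strict '>' tests."""
    if mem_mb > crit:
        return "CRITICAL"
    if mem_mb > warn:
        return "WARNING"
    return "OK"


raw = 20.04296875
print(mem_status(raw, 20, 30))            # WARNING: compares the raw value
print(mem_status(round(raw, 1), 20, 30))  # OK: rounding first hides the excess
```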

Context

It's just slightly confusing for people using the monitoring.

Your Environment

peternewman commented 10 months ago

I can confirm the same thing happens with this release too: check_snmp_process version : 2.1.0

drdisk commented 10 months ago

Sorry for the delay, I was off for a few days.

OK, I can understand that this may be confusing for some people. However, the status display of a check is always only a rough summary. Ultimately, you have to look in detail at why the check is not OK, unless it is already very, very clear.

My math professor once told me that 0 = 1 if the 0 is big enough and the 1 is small enough. This can happen in mathematics. Admittedly, we had a problem near infinity that we were able to reduce to this equation, so maybe that is something different.

Nevertheless, as an IT administrator with many years of experience in monitoring, I think that if a check flags 20 as not OK while the limit is 20, the value is probably just above it. Maybe my math professor's influence is still too strong here.

However, I don't know of a simple solution that conveys the information more precisely in as little text as possible. If both values read 20 and the check is red instead of green, then one 20 is probably slightly larger than the other.

If anyone knows a simple way to present the information more precisely while keeping it short, please feel free to get in touch. Until then, I'll close this ticket, because I don't see any urgent reason to change it, nor do I know of a suitable solution.