Cacti / plugin_monitor

Monitor Plugin for Cacti
GNU General Public License v2.0

False reboot alerting at 497.1 days (4294967295 hundredths of a second) of uptime #161

interduo closed 6 months ago

interduo commented 7 months ago

Describe the bug: Overflow of the device uptime counter causes false reboot alerts to be sent. The device was not rebooted.

To Reproduce: Monitor a device with SNMP until its uptime passes ~497.1 days. You will get a false Cacti Reboot Notification Alert.

Expected behavior: A device reboot that happens right after 497.1 days of uptime, and falls exactly between two polling cycles, is extremely unlikely. So, as a workaround, you could compare the 2 latest uptime values. If the uptime has decreased AND it was less than (497.1 days minus a small margin that depends on how often you poll the device) before the decrease, then a reboot took place. Otherwise there was no reboot, just a counter overflow.
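The comparison suggested above could be sketched roughly as follows. This is only an illustration in Python, not code from the monitor plugin. The function name `classify_uptime_drop`, the wrap point of 2^32 hundredths of a second, and the default margin of two polling intervals are all assumptions made for the example.

```python
# Hypothetical sketch of the suggested workaround; not taken from the monitor plugin.
WRAP_TICKS = 2**32        # assumed wrap point of a 32-bit uptime counter
TICKS_PER_SECOND = 100    # classic sysUpTime counts hundredths of a second


def classify_uptime_drop(previous_ticks, current_ticks, poll_interval_s, margin_polls=2):
    """Return 'ok', 'overflow' or 'reboot' for two consecutive uptime samples."""
    if current_ticks >= previous_ticks:
        return "ok"  # uptime kept growing, nothing to report
    # Uptime decreased. If the previous sample was within a few polling
    # intervals of the wrap point, treat the drop as a counter overflow.
    margin_ticks = margin_polls * poll_interval_s * TICKS_PER_SECOND
    if previous_ticks >= WRAP_TICKS - margin_ticks:
        return "overflow"
    return "reboot"


# Example with a 300 second polling interval
print(classify_uptime_drop(4294960000, 20000, 300))   # overflow
print(classify_uptime_drop(1000000000, 5000, 300))    # reboot
```

A stricter variant could also require the new value to be small (below the same margin) before calling it an overflow. That is left out here to keep the sketch short.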

Connected: https://github.com/Cacti/cacti/issues/5611

TheWitness commented 7 months ago

Yea, this is an overflow condition in the uptime counter. It has been a known issue for a long time. We switched Cacti to use the per-second uptime indicator, which would make the uptime overflow at 100 x 497.1 days, but for one thing it is not universally supported. If there is one saving grace, the snmp_sysUpTimeInstance in Cacti is defined as such:

| snmp_sysUpTimeInstance | bigint(20) unsigned |

So, that allows for far more than 497.1 days, even at the 1/100 of a second granularity of the normal uptime instance.
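As a quick check of the numbers in this thread, the arithmetic can be worked through in Python. The only inputs are the counter widths and tick rates mentioned above. The helper name `days_until_wrap` is made up for this example, and the 64-bit case reflects only the width of a `bigint unsigned` column, not what any device actually reports.

```python
SECONDS_PER_DAY = 86_400


def days_until_wrap(bits, ticks_per_second):
    """Days until an unsigned counter of the given width wraps."""
    return 2**bits / ticks_per_second / SECONDS_PER_DAY


# 32-bit counter in hundredths of a second (classic sysUpTime)
print(round(days_until_wrap(32, 100), 1))   # 497.1
# Same 32-bit width counted in whole seconds: 100 times longer
print(round(days_until_wrap(32, 1), 1))     # 49710.3
# A 64-bit column at 1/100 s is far beyond either limit
print(days_until_wrap(64, 100) > 1e12)      # True
```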

TheWitness commented 7 months ago

If you are getting notified at 497.1 days, then likely your device does not support the per-second uptime OID. Can you confirm?

TheWitness commented 7 months ago

Lastly the other issues in that ticket are Cacti 101 diagnostic issues. You have been using Cacti long enough that you should be answering those questions yourself.

TheWitness commented 7 months ago

Please have your colleagues at interduo.pl provide better Cacti training to you guys.

interduo commented 7 months ago

> If you are getting notified at 497.1 days, then likely your device does not support the per-second uptime OID. Can you confirm?

# snmpwalk -v2c -c community1 172.20.1.254 .1.3.6.1.6.3.10.2.1.3
iso.3.6.1.6.3.10.2.1.3.0 = INTEGER: 43272154
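
For reference, the value returned above can be converted to days, assuming it is a count of whole seconds, as the per-second uptime discussed in this thread would be. The sketch also estimates what a 32-bit counter in hundredths of a second would show at the same moment. That second figure assumes both counters started together, which may not hold on a real device.

```python
SECONDS_PER_DAY = 86_400
WRAP_TICKS = 2**32                 # 32-bit counter in 1/100 s

engine_time_s = 43272154           # value from the snmpwalk output above

uptime_days = engine_time_s / SECONDS_PER_DAY
# Assumption: the 1/100 s counter started at the same time as the per-second one
wrapped_ticks = (engine_time_s * 100) % WRAP_TICKS
wrapped_days = wrapped_ticks / 100 / SECONDS_PER_DAY

print(round(uptime_days, 1))    # 500.8
print(round(wrapped_days, 1))   # 3.7
```

Under those assumptions, the device has been up for slightly longer than 497.1 days, so a 32-bit hundredths-of-a-second counter would have wrapped a few days earlier.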

The correct uptime is shown at https://cacti/host.php?action=edit&id=878. So I suppose that the monitor plugin checks uptime in a different way (from a different field) than core Cacti? That was the first thing that surprised me, and it was the reason I raised this issue.

> We switched Cacti to use the per-second uptime indicator, which would make the uptime overflow at 100 x 497.1 days, but for one thing it is not universally supported.

Could you confirm that this change was made before 1.2.23/2.5 (monitor plugin version)?

> Lastly the other issues in that ticket are Cacti 101 diagnostic issues. You have been using Cacti long enough that you should be answering those questions yourself.

We are a small company, but we handle too many IT things to be experts in all of them (the most common problem in Polish IT). Cacti works properly almost 100% of the time, so we rediscover Cacti again and again during yearly updates, frequent issues, and yearly planned maintenance. We do everything without grudges, with a smile and humility :) Please don't get upset, we improve our skills every day!

TheWitness commented 7 months ago

So, I think I may know the source of this. If you check both the Spine and Cacti changelogs, you should find references to the per-second uptime. It's maybe after 1.2.23. As for what version you should be on, I would say everyone should move to 1.2.25, and soon to 1.2.26, as quickly as possible. That is likely the last of the 1.2.x series. Though 1.2.23 was not horribly bad, there have been a number of decent changes since then.

Regarding the company: in the past, before I found the company, I asked "is this a person or a company", and whoever responded lied. It is what it is. Neither you nor I can change the past.

TheWitness commented 7 months ago

Issue might be fixed now.

interduo commented 7 months ago

Thanks for resolving this issue. Upgrade planned!

jdcoats commented 6 months ago

Was it supposed to be uptime instead of update?

TheWitness commented 6 months ago

@jdcoats, @xmacan just fixed that.