Cacti / plugin_thold

Thold Plugin for Cacti
GNU General Public License v2.0
60 stars 60 forks source link

Thold "Consider Unknown Data as Zero" setting not working as expected #650

Closed tddcacti closed 5 months ago

tddcacti commented 6 months ago

Describe the bug See https://forums.cacti.net/viewtopic.php?t=62916

Recently enabled thold version 1.5.2 on a Cacti server running version 1.2.23. Subsequently tried thold 1.8 with Cacti 1.2.25, and observed the same behavior.

References to line numbers have been updated below to reflect current 1.8 develop branch as of this writing.

We have graphs for temperature sensors and want to enable both high and low thresholds on these. Some of these sensors occasionally experience connectivity issues, which results in NaNs in the graphs. When the NaNs occur, thold is considering the value to be zero.

I am aware of the thold setting "Consider Unknown Data as Zero", and have verified this option is disabled.

Dove into the code a bit and I believe I've found what is happening. In thold_functions.php, the function thold_get_currentval starting on line 737:

The variable currentval is set to zero on line 754. That variable will be changed within the function based on the logic of the switch statement starting in line 759, however that switch statement is only evaluated when the if statement on line 758 is true.

That if statement checks the value of the item to see whether or not it is a numeric value. If it is, then we proceed to the switch statement. If it isn't, then we skip all of that and simply return currentval. This means that if the current value of the item is non-numeric, we are always returning zero.

In polling.php, we use this function on line 157, and use the result to populate the value of the postvalue variable, which is fed into another "currentval" variable in the next line.

The logic related to considering unknown as zero appears shortly thereafter within the same function, starting lines 190 to 201. However the if statement on line 190 is never true:

if (!is_numeric($currentval))

because per the above explanation, whenever that value is non-numeric, we've already set it to zero.

In my deployment I made a simple change to thold_functions.php on line 754....changed it from this:

$currentval = 0;

To this:

$currentval = '';

With that change, any thresholds where the graph has a NaN as the current value, the threshold shows current value as a dash, instead of a zero, and we no longer get low threshold breach alerts. Hopefully this was the correct change to make, and doesn't create any problems elsewhere that I'm not aware of.

To Reproduce To reproduce the issue, take a graph that occasionally has NaNs, or perhaps easier, intentionally create a graph that is always NaNs, such as a device which is offline or failing SNMP.

  1. Under Console > Configuration > Settings > Alerting/Thold, ensure that the setting "Consider Unknown Data as Zero" is disabled. Otherwise ensure that thold is enabled and functioning generally

ConsiderZeroDisabled

  1. For the graph type you are using for testing purposes, create a new high/low threshold template. In that template define a Low Alert Threshold which is higher than zero, for example 40. Also configure the template to send email alerts to your email address.

LowAlertThreshold

  1. Create a threshold for the graph you are testing with, using the threshold template from step 2
  2. Wait for a few polling cycles, and watch the status of the threshold. Note that the current value of the threshold shows up as zero.

NaN_Zero

  1. After the minimum trigger duration time has elapsed, the threshold should go into the Alert state and an email sent.

Expected behavior With the setting "Consider Unknown Data as Zero" disabled, the current value of the threshold in this situation should not show zero. It should show no value at all, and should not trigger the alert state.

To Reproduce the proposed fix Starting from where the above steps leave off:

  1. Edit thold_functions.php line 754, change it from this:

    $currentval = 0;

To this:

$currentval ='';
  1. Observe the status of the threshold. The current value no longer shows as zero, but a dash, and it does not go into the alert state

CurrentValueDash

Plugin (please complete the following information):

Desktop (please complete the following information):

Additional context Add any other context about the problem here.

TheWitness commented 6 months ago

@xmacan , what do you think of this suggestion?

xmacan commented 6 months ago

@TheWitness I missed this issue, sorry. I will look at it asap.

TheWitness commented 6 months ago

Thanks @xmacan!

xmacan commented 6 months ago

It seems reasonable. I will test it and prepare PR if I will not find any other issue