Cacti / plugin_thold

Thold Plugin for Cacti
GNU General Public License v2.0

Thold converts available swap memory from positive to negative #603

Closed Alys closed 11 months ago

Alys commented 11 months ago

[Originally posted on the cacti forum.]

Describe the bug

The thold plugin converts available swap memory from a positive to a negative value, resulting in incorrect and unnecessary alerts (e.g., sending an email to say that available swap memory -4,116,960 is below calculated baseline threshold -3,293,567.73). This isn't caused by PHP's time zone being wrong.

To Reproduce

Steps to reproduce the behavior:

  1. unknown - I don't know what caused it

Expected behavior

Available swap memory is reported as a positive number instead of a negative number.

Screenshots

See below.

Additional context

We have two cacti servers, both monitoring the same devices with the same settings. Each does its own polling and has its own database. Only one has this problem. Both servers are x86_64, RHEL 7.9.

One of the servers is running cacti version 0.8.8f and thold plugin version 0.5. It's behaving correctly. I'll refer to it below as "the old cacti server".

The other cacti server ("the new cacti server") used to be running those versions but we recently upgraded it to version 1.2.24, and the thold plugin to 1.5.2. Everything seems to be working correctly except that thold is converting the value for available swap memory (ucd_memAvailSwap) from positive to negative. It's doing this for all devices where ucd_memAvailSwap is monitored. It was not doing this before we upgraded it, and thold on the old server is not doing this.

For example, for one device, the current available swap memory is 4,116,960 as shown by free:

# free
             total       used       free     shared    buffers     cached
Mem:       1938456    1728856     209600        312     163096    1350692
-/+ buffers/cache:     215068    1723388
Swap:      4161532      44572    4116960

The graph for that device correctly states that the current free swap memory is 4.12 M (on both old and new cacti servers).
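
As a quick cross-check of the figures above, the sketch below parses the `Swap:` row of the `free` output and confirms that total minus used equals the reported free value (in KiB, the default unit of `free`). The helper name `parse_swap_free` is hypothetical. This is only a sanity check on the numbers, not something Cacti or thold does.

```python
# Output of `free` copied from the report above.
FREE_OUTPUT = """\
             total       used       free     shared    buffers     cached
Mem:       1938456    1728856     209600        312     163096    1350692
-/+ buffers/cache:     215068    1723388
Swap:      4161532      44572    4116960
"""


def parse_swap_free(free_output: str) -> dict:
    """Return total, used and free swap (KiB) from the `Swap:` row (hypothetical helper)."""
    for line in free_output.splitlines():
        if line.startswith("Swap:"):
            total, used, free = (int(field) for field in line.split()[1:4])
            return {"total": total, "used": used, "free": free}
    raise ValueError("no 'Swap:' row found")


swap = parse_swap_free(FREE_OUTPUT)
# total - used should equal the free column.
assert swap["total"] - swap["used"] == swap["free"]
print(swap["free"])  # 4116960
```

The positive value 4116960 is what the graph and the old server's thold tab both report.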

On the old cacti server, in the "thold" tab, the data for that device shows that the current value is "4116960" - i.e., it's correct.

However on the new cacti server, in the "Thold" tab, the data for that device is "-4,116,960" - i.e., the right value but negative.

Here are the thold_template settings for the ucd_memAvailSwap template:

MariaDB [cacti]> select * from thold_template where data_template_name='ucd_memAvailSwap'\G
*************************** 1. row ***************************
                         id: 22
                       hash: 65bab9b18d6d49a946e9b6d1cccea9d7
                       name: ucd_memAvailSwap [ucd_memAvailSwap]
             suggested_name: |data_source_description| [|data_source_name|]
         data_template_hash: 7fcc8ff25765979b5e1b2694c4530c21
           data_template_id: 115
         data_template_name: ucd_memAvailSwap
             data_source_id: 7247
           data_source_name: ucd_memAvailSwap
       data_source_friendly: ucd_memAvailSwap
                   thold_hi: 
                  thold_low: 
         thold_fail_trigger: 2
                    time_hi: 
                   time_low: 
          time_fail_trigger: 1
           time_fail_length: 1
           thold_warning_hi: 
          thold_warning_low: 
 thold_warning_fail_trigger: 2
   thold_warning_fail_count: 0
            time_warning_hi: 
           time_warning_low: 
  time_warning_fail_trigger: 1
   time_warning_fail_length: 1
              thold_enabled: on
                 thold_type: 1
          bl_ref_time_range: 86400
                bl_pct_down: 20
                  bl_pct_up: 
            bl_fail_trigger: 3
              bl_fail_count: NULL
                   bl_alert: 0
               repeat_alert: 48
               notify_extra: 
       notify_warning_extra: 
           notify_templated: on
             notify_warning: 1
               notify_alert: 1
        snmp_event_category: NULL
        snmp_event_severity: 3
snmp_event_warning_severity: 2
                      notes: 
                  data_type: 0
                 show_units: 
                       cdef: 0
                 percent_ds: ucd_memAvailSwap
                 expression: 
                   upper_ds: 
                     exempt: off
          thold_hrule_alert: NULL
        thold_hrule_warning: NULL
             restored_alert: off
                  reset_ack: off
                persist_ack: off
                 email_body: A warning has been issued that requires your attention. <br><br><strong>Host</strong>: <DESCRIPTION> (<HOSTNAME>)<br><strong>URL</strong>: <URL><br><strong>Message</strong>: <SUBJECT><br><br><GRAPH>
            email_body_warn: A warning has been issued that requires your attention. <br><br><strong>Host</strong>: <DESCRIPTION> (<HOSTNAME>)<br><strong>URL</strong>: <URL><br><strong>Message</strong>: <SUBJECT><br><br><GRAPH>
        email_body_restoral: 
           trigger_cmd_high: 
            trigger_cmd_low: 
           trigger_cmd_norm: 
            syslog_priority: NULL
            syslog_facility: NULL
             syslog_enabled: 
1 row in set (0.00 sec)

image

I've seen forum posts about negative values where the solution was to fix PHP's time zone, but I believe that's not relevant to this case. The timezone is correct in the new cacti server's php.ini file as shown below, and the same time zone is in php.ini on the old cacti server.

[~]$ grep -i timezone /etc/php.ini | grep -v "^;"
date.timezone = "Australia/Brisbane"

[~]$ ls -l /etc/localtime
lrwxrwxrwx. 1 root root 40 Sep 14  2017 /etc/localtime -> ../usr/share/zoneinfo/Australia/Brisbane

In case it helps, below is some data for the device that I used in the example above, taken from the database on the new cacti server. The data seems to be the same on the old cacti server, at least where similar database fields exist.

MariaDB [cacti]> select * from plugin_thold_log where host_id=128 ; 
+--------+------------+---------+----------------+--------------+-----------------+----------+--------+------+------------------------------------------------------------------------------------------------------------------------------------------+
| id     | time       | host_id | local_graph_id | threshold_id | threshold_value | current  | status | type | description                                                                                                                              |
+--------+------------+---------+----------------+--------------+-----------------+----------+--------+------+------------------------------------------------------------------------------------------------------------------------------------------+
[snip lots of data]
| 243138 | 1682926434 |     128 |           4675 |          926 | 20              | -4116960 |      1 |    1 | Thold Baseline Cache Log                                                                                                                  |
| 243153 | 1682926745 |     128 |           4675 |          926 | 20              | -4116960 |      1 |    1 | Thold Baseline Cache Log                                                                                                                  |
| 243168 | 1682927033 |     128 |           4675 |          926 | 20              | -4116960 |      1 |    1 | Thold Baseline Cache Log                                                                                                                  |
+--------+------------+---------+----------------+--------------+-----------------+----------+--------+------+-------------------------------------------------------------------------------------------------------------------------------------------+
8923 rows in set (0.04 sec)
MariaDB [cacti]> select * from cdef_items where id=128\G
*************************** 1. row ***************************
      id: 128
    hash: 2c2bf51719766ffba75900a2768570fc
 cdef_id: 48
sequence: 1
    type: 6
   value: d
1 row in set (0.00 sec)
MariaDB [cacti]> select * from data_input_fields where id=128\G
*************************** 1. row ***************************
           id: 128
         hash: 5553162fceec749a281dfc315c0630ad
data_input_id: 23
         name: Questions
    data_name: Questions
 input_output: out
   update_rra: on
     sequence: 0
    type_code: 
 regexp_match: 
  allow_nulls: 
1 row in set (0.01 sec)
MariaDB [cacti]> select * from thold_data where host_id=128 and data_source_name='ucd_memAvailSwap'\G
*************************** 1. row ***************************
                         id: 926
                       name: [redacted host name] - memAvailSwap [ucd_memAvailSwap]
                 name_cache: [redacted host name] - memAvailSwap [ucd_memAvailSwap]
              local_data_id: 5368
       data_template_rrd_id: 19762
             local_graph_id: 4675
          graph_template_id: 93
         data_template_hash: 7fcc8ff25765979b5e1b2694c4530c21
           data_template_id: 115
           data_source_name: ucd_memAvailSwap
                   thold_hi: 0
                  thold_low: -3293568
         thold_fail_trigger: 2
           thold_fail_count: 0
                    time_hi: 
                   time_low: 
          time_fail_trigger: 1
           time_fail_length: 1
           thold_warning_hi: 
          thold_warning_low: 
 thold_warning_fail_trigger: 2
   thold_warning_fail_count: 0
            time_warning_hi: 
           time_warning_low: 
  time_warning_fail_trigger: 1
   time_warning_fail_length: 1
                thold_alert: 0
           prev_thold_alert: 0
              thold_enabled: on
                 thold_type: 1
          bl_ref_time_range: 86400
                bl_pct_down: 20
                  bl_pct_up: 
            bl_fail_trigger: 3
              bl_fail_count: 5470
                   bl_alert: 1
                   lastread: 4116960
                   lasttime: 2023-05-01 17:50:02
                lastchanged: 2023-04-12 17:53:24
                   oldvalue: 4116960
               repeat_alert: 48
               notify_extra: 
       notify_warning_extra: 
             notify_warning: 1
               notify_alert: 1
        snmp_event_category: NULL
        snmp_event_severity: 3
snmp_event_warning_severity: 2
           thold_daemon_pid: 
                      notes: 
                    host_id: 128
            syslog_priority: 3
            syslog_facility: NULL
             syslog_enabled: 
                  data_type: 0
                 show_units: 
                       cdef: 0
                 percent_ds: ucd_memAvailSwap
                 expression: 
                   upper_ds: 
          thold_template_id: 22
           template_enabled: on
                     tcheck: 1
                     exempt: off
             acknowledgment: 
          thold_hrule_alert: NULL
        thold_hrule_warning: NULL
             restored_alert: off
                  reset_ack: 
                persist_ack: 
                 email_body: A warning has been issued that requires your attention.[snip more text]
            email_body_warn: A warning has been issued that requires your attention.[snip more text]
        email_body_restoral: 
           trigger_cmd_high: 
            trigger_cmd_low: 
           trigger_cmd_norm: 
             bl_thold_valid: 1682985600
1 row in set (0.00 sec)
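
The `thold_low` of -3293568 above is consistent with the negated reading reduced by `bl_pct_down` (20%). The sketch below works through that arithmetic to show why a sign flip alone is enough to trigger the alert. It assumes the low threshold is `reference * (1 - bl_pct_down / 100)` and that a breach means `current < threshold`. Both are inferred from the numbers in this report, not taken from thold's source. It also uses the last reading as the reference, whereas thold evidently uses a value over `bl_ref_time_range` (the email quotes -3,293,567.73). The function names are hypothetical.

```python
def baseline_low(reference: float, pct_down: float) -> float:
    """Low threshold: reference reduced by pct_down percent (assumed formula)."""
    return reference * (1 - pct_down / 100)


def breaches_low(current: float, threshold: float) -> bool:
    """A low-threshold breach is assumed to mean current < threshold."""
    return current < threshold


BL_PCT_DOWN = 20     # bl_pct_down from thold_data
LASTREAD = 4116960   # lastread from thold_data

for sign in (+1, -1):
    current = sign * LASTREAD
    low = baseline_low(current, BL_PCT_DOWN)
    print(current, round(low), breaches_low(current, low))
# 4116960 3293568 False
# -4116960 -3293568 True
```

With a positive reading, the value sits comfortably above the 80% floor. Once the reading is negated, the 80% mark is closer to zero than the reading itself, so the reading is always "below" it and the baseline alert fires on every check.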

I've managed to work out that thold on the new cacti server is multiplying the value by -1 in the code below, from thold's thold_functions.php file:

function thold_build_cdef($cdef, $value, $local_data_id, $data_template_rrd_id) {
    [snip]
    while($cursor < $x) {
            $type = $cdef_array[$cursor]['type'];
            switch($type) {
            case 6:
                    array_push($stack, $cdef_array[$cursor]);

                    break;
            case 2:
                    // this is a binary operation. pop two values, and then use them.
                    $v1 = thold_expression_rpn_pop($stack);
                    $v2 = thold_expression_rpn_pop($stack);
            ################### This next line is where the multiplication by -1 happens: ###################
                    $result = thold_rpn($v2['value'], $v1['value'], $cdef_array[$cursor]['value']);
                    // put the result back on the stack.
                    array_push($stack, array('type' => 6, 'value' => $result));

Below are sample values for the $v2, $v1, and $cdef_array variables when $cursor is 2, and the value of $result:

v2 :
(
  'id' => '10',
  'hash' => 'c888c9fe6b62c26c4bfe23e18991731d',
  'cdef_id' => '3',
  'sequence' => '1',
  'type' => 6,
  'value' => '4116960',
)

v1 :
(
  'id' => '12',
  'hash' => '4355c197998c7f8b285be7821ddc6da4',
  'cdef_id' => '3',
  'sequence' => '2',
  'type' => '6',
  'value' => '-1',
)

cdef_array[cursor] :
(
  'id' => '11',
  'hash' => '1e1d0b29a94e08b648c8f053715442a0',
  'cdef_id' => '3',
  'sequence' => '3',
  'type' => '2',
  'value' => '3',
)

$result :
-4116960

So the calculation done by the thold_rpn function is working as intended (4116960 * -1), but I don't understand why that code is being run, or why a similar calculation isn't being done on the old cacti server.
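
Read in `sequence` order, the three dumped entries (all with `cdef_id` 3) form the RPN expression `4116960,-1,*`. The sketch below replays that stack evaluation in Python rather than thold's PHP. It is an illustration of the mechanism, not thold's implementation. Only operator code `'3'` appears in the dump, and it is assumed to mean multiplication because it produced -4116960. The `OPERATORS` table and the `evaluate` function are hypothetical. It does not explain why CDEF 3 is applied when `thold_data` shows `cdef: 0`.

```python
import operator

# Hypothetical operator table: only code '3' is grounded in the dump above,
# and it is ASSUMED to be multiplication (4116960 * -1 = -4116960).
OPERATORS = {"3": operator.mul}

TYPE_OPERATOR = "2"  # 'type' => '2' in the dump: pop two values and combine them
TYPE_VALUE = "6"     # 'type' => 6 in the dump: push the value onto the stack


def evaluate(items):
    """Evaluate CDEF items (already in sequence order) as an RPN expression."""
    stack = []
    for item in items:
        kind = str(item["type"])  # the dump mixes 6 and '6', so normalise
        if kind == TYPE_VALUE:
            stack.append(float(item["value"]))
        elif kind == TYPE_OPERATOR:
            v1 = stack.pop()  # top of stack: right-hand operand
            v2 = stack.pop()  # next: left-hand operand
            stack.append(OPERATORS[item["value"]](v2, v1))
        else:
            raise ValueError(f"unhandled item type: {kind}")
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]


# The three entries from the dump, reduced to the fields that matter.
items = [
    {"sequence": 1, "type": 6, "value": "4116960"},
    {"sequence": 2, "type": "6", "value": "-1"},
    {"sequence": 3, "type": "2", "value": "3"},
]
result = evaluate(items)
print(int(result))  # -4116960
```

This reproduces the `$result` shown above, which supports the reading that the sign flip comes from the CDEF being applied rather than from a fault in `thold_rpn`.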

TheWitness commented 11 months ago

The latest thold development branch is under heavy redevelopment right now. The new thold version, which will be 2.0, will have eight different baseline methods. A few people are testing it right now; if you would like to be one of them, feel free to download it and provide your feedback.

Alys commented 11 months ago

Thank you, that's good to know, but the current baseline method is doing what we want on the old version of cacti. Do you know why it's changed, or if there's any workaround? Or, if it's easier for you, could you please tell me which issue this was a duplicate of? I haven't been able to find a similar one. Sorry for logging a duplicate, though.

TheWitness commented 11 months ago

Just pick the version that works as expected.

Types

Aggregators

Methods

image

TheWitness commented 11 months ago

Don't ask for what used to work, because it was inconsistent. Think of how you WANT it to work using this model.

TheWitness commented 11 months ago

I changed it a little this morning.

image