Closed sysres-dev closed 4 years ago
Any update ? I can help and provide more informations if you need.
You would need to narrow down more specifics on this and examples of how the data is definitely swapped, eg, which two fields appear to be swapped, and the log output showing the data being updated into the wrong one to help narrow down how that could be occurring If that problem is because of spine, it will be present in the current version as I can't see a change that addresses this kind of behaviour.
In these two log lines, results are swapped :
The output of this line (it is a disk check) belongs to the test run at the second line (a ping check) :
Device[65140] TH[1] DS[1061994] SS[3] SERVER: /var/www/cacti/scripts/ss_host_disk.php ss_host_disk 'custom-c904.custom.domain' '65140' '2:161:10000:1:20:customcommunity::::::' 'get' 'total' '1', output: min:0.1140 avg:0.2532 max:0.7900 dev:0.1770 loss:0.0000
However, when i manually run the command for ss_host_disk.php in script server for this host , i dont have 214745153536 as output, so this result may is swapped with something else as well.
Device[65140] TH[2] DS[1062011] SS[3] SERVER: /var/www/cacti/scripts/ss_fping.php ss_fping 'custom-c904.custom.domain' '20' 'ICMP' '', output: 214745153536
I just reran the test, and i observed the following :
Device[65140] TH[1] DS[1061994] SS[3] SERVER: /var/www/cacti/scripts/ss_host_disk.php ss_host_disk 'custom-c904.custom.domain' '65140' '2:161:10000:1:20:customcommunity::::::' 'get' 'total' '1', output: min:0.1120 avg:0.1291 max:0.1480 dev:0.0108 loss:0.0000 Device[65140] TH[2] DS[1062011] SS[3] SERVER: /var/www/cacti/scripts/ss_fping.php ss_fping 'custom-c904.custom.domain' '20' 'ICMP' '', output: 214745153536 Device[65140] TH[2] DS[1062000] SS[3] SERVER: /var/www/cacti/scripts/ss_host_disk.php ss_host_disk 'custom-c904.custom.domain' '65140' '2:161:10000:1:20:customcommunity::::::' 'get' 'total' '7', output: 214745153536
Here are the same tests run manually from php script server :
/opt/rh/rh-php70/root/bin/php -q /var/www/cacti/script_server.php
PHP Script Server has Started - Parent is cmd /var/www/cacti/scripts/ss_host_disk.php ss_host_disk 'custom-c904.custom.domain' '65140' '2:161:10000:1:20:customcommunity::::::' 'get' 'total' '7' 214745153536 /var/www/cacti/scripts/ss_fping.php ss_fping 'custom-c904.custom.domain' '20' 'ICMP' '' min:0.1110 avg:0.1478 max:0.3590 dev:0.0552 loss:0.0000 /var/www/cacti/scripts/ss_host_disk.php ss_host_disk 'custom-c904.custom.domain' '65140' '2:161:10000:1:20:customcommunity::::::' 'get' 'total' '1' 214759895040
For this host and these tests :
ss_host_disk.php with args 'total' '1', i should have 214759895040 but i have min:0.1120 avg:0.1291 max:0.1480 dev:0.0108 loss:0.0000 (which is a ping output)
ss_fping.php, i should have min:0.1110 avg:0.1478 max:0.3590 dev:0.0552 loss:0.0000 but i have 214745153536
ss_host_disk.php with args 'total' '7', i should have 214745153536 and i have 214745153536 (which is correct)
So, from what i have observed so far :
If you need more specific tests i'm ready to help.
Thanks, quite a bit of information for me to review. I'll have time to do so probably at the weekend now.
I'm not sure how to verify it, but since all affected scripts are php scripts, it could be somehow related to the script_server.php, or the way it is called and handled by spine.
It's likely the snmp timeouts from the previous response. Those should be redirected to standard error, which should be redirected to the cacti standard error log by default. That should be confirmed be a Dev. With the script server, you have to be careful with your script server scripts not outputting to standard output more than a single line, and they need to output that single line at one time. Otherwise the Script Server output will get out of alignment with the poller.
I tried different tests to get more insight about it, i first disabled all data sources, except one which is working properly from running script server manually, i got this output with spine :
/usr/local/spine/bin/spine --mode online --poller 12 --hostlist=65140 --mibs -C /usr/local/spine/etc/spine.conf -R -S -V 5
SPINE: Using spine config file [/usr/local/spine/etc/spine.conf] 2019/12/24 17:31:04 - SPINE: Poller[12] DEBUG: The path_php_server variable is /var/www/cacti/script_server.php 2019/12/24 17:31:04 - SPINE: Poller[12] DEBUG: The path_cactilog variable is /var/log/cacti/cacti.log DEBUG: The log_destination variable is 4 (STDOUT) DEBUG: The path_php variable is /opt/rh/rh-php70/root/bin/php DEBUG: The availability_method variable is 2 DEBUG: The ping_recovery_count variable is 3 DEBUG: The ping_failure_count variable is 2 DEBUG: The ping_method variable is 1 DEBUG: The ping_retries variable is 1 DEBUG: The ping_timeout variable is 600 DEBUG: The snmp_retries variable is 1 DEBUG: The log_perror variable is 1 DEBUG: The log_pwarn variable is 0 DEBUG: The boost_redirect variable is 1 DEBUG: The boost_rrd_update_enable variable is 1 DEBUG: The log_pstats variable is 0 DEBUG: The threads variable is 40 DEBUG: The polling interval is 300 seconds DEBUG: The number of concurrent processes is 50 DEBUG: The script timeout is 20 DEBUG: The selective_device_debug variable is 46480,65178,46420,65140 DEBUG: The spine_log_level variable is 2 DEBUG: The number of php script servers to run is 5 DEBUG: Device List to be polled='65140', TotalPHPScripts='1 DEBUG: The PHP Script Server is Required DEBUG: The Maximum SNMP OID Get Size is 20 Selective Debug Devices 46480,65178,46420,65140 Version 1.1.38 starting DEBUG: MySQL is Thread Safe! DEBUG: Spine is running asroot. SPINE: Initializing Net-SNMP API DEBUG: Issues with SNMP Header Version information, assuming old version of Net-SNMP. SPINE: Initializing PHP Script Server(s) DEBUG: SS[0] PHP Script Server Routine Starting DEBUG: SS[0] PHP Script Server About to FORK Child Process DEBUG: SS[0] PHP Script Server Child FORK Success ERROR: SS[0] Script Server did not start properly return message was: DEBUG: SS[1] PHP Script Server Routine Starting DEBUG: SS[1] PHP Script Server About to FORK Child Process DEBUG: SS[1] PHP Script Server Child FORK Success ERROR: SS[1] Script Server did not start properly return message was: DEBUG: SS[2] PHP Script Server Routine Starting DEBUG: SS[2] PHP Script Server About to FORK Child Process DEBUG: SS[2] PHP Script Server Child FORK Success ERROR: SS[2] Script Server did not start properly return message was: DEBUG: SS[3] PHP Script Server Routine Starting DEBUG: SS[3] PHP Script Server About to FORK Child Process DEBUG: SS[3] PHP Script Server Child FORK Success ERROR: SS[3] Script Server did not start properly return message was: DEBUG: SS[4] PHP Script Server Routine Starting DEBUG: SS[4] PHP Script Server About to FORK Child Process DEBUG: SS[4] PHP Script Server Child FORK Success ERROR: SS[4] Script Server did not start properly return message was: NOTE: Spine will support multithread device polling. DEBUG: Initial Value of Active Threads is 0 DEBUG: Valid Thread to be Created DEBUG: In Poller, About to Start Polling of Device for Device ID 65140 SPINE: Active Threads is 1, Pending is 1 DEBUG: Valid Thread to be Created SPINE: Active Threads is 2, Pending is 2 DEBUG: In Poller, About to Start Polling of Device for Device ID 65140 DEBUG: Valid Thread to be Created DEBUG: In Poller, About to Start Polling of Device for Device ID 65140 SPINE: Active Threads is 3, Pending is 3 DEBUG: Valid Thread to be Created DEBUG: In Poller, About to Start Polling of Device for Device ID 65140 SPINE: Active Threads is 4, Pending is 4 DEBUG: Valid Thread to be Created DEBUG: In Poller, About to Start Polling of Device for Device ID 65140 SPINE: Active Threads is 5, Pending is 5 DEBUG: Valid Thread to be Created DEBUG: In Poller, About to Start Polling of Device for Device ID 65140 SPINE: Active Threads is 6, Pending is 6 Device[65140] DEBUG: Entering SNMP Ping Device[65140] DEBUG: Entering SNMP Ping Device[65140] SNMP Result: Device responded to SNMP Updating Full System Information Table Device[65140] DEBUG: Entering SNMP Ping Device[65140] DEBUG: Entering SNMP Ping Device[65140] DEBUG: Entering SNMP Ping Device[65140] DEBUG: Entering SNMP Ping Device[65140] TH[2] RECACHE: Processing 8 items in the auto reindex cache for 'custom-c904.custom.domain Device[65140] TH[2] Recache DataQuery[1] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[2] Recache DataQuery[8] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[2] Recache DataQuery[9] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[2] Recache DataQuery[17] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[2] Recache DataQuery[18] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[2] Recache DataQuery[65] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[2] Recache DataQuery[73] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[2] Recache DataQuery[75] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[3] RECACHE: Processing 8 items in the auto reindex cache for 'custom-c904.custom.domain Device[65140] TH[3] Recache DataQuery[1] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[3] Recache DataQuery[8] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[3] Recache DataQuery[9] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[3] Recache DataQuery[17] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[3] Recache DataQuery[18] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[3] Recache DataQuery[65] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[3] Recache DataQuery[73] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[3] Recache DataQuery[75] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[4] RECACHE: Processing 8 items in the auto reindex cache for 'custom-c904.custom.domain Device[65140] TH[1] RECACHE: Processing 8 items in the auto reindex cache for 'custom-c904.custom.domain Device[65140] TH[4] Recache DataQuery[1] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[4] Recache DataQuery[8] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[4] Recache DataQuery[9] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[4] Recache DataQuery[17] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[4] Recache DataQuery[18] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[4] Recache DataQuery[65] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[4] Recache DataQuery[73] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[4] Recache DataQuery[75] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[6] RECACHE: Processing 8 items in the auto reindex cache for 'custom-c904.custom.domain Device[65140] TH[1] Recache DataQuery[1] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[1] Recache DataQuery[8] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[1] Recache DataQuery[9] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[1] Recache DataQuery[17] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[1] Recache DataQuery[18] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[1] Recache DataQuery[65] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[1] Recache DataQuery[73] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[1] Recache DataQuery[75] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[6] Recache DataQuery[1] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[6] Recache DataQuery[8] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[6] Recache DataQuery[9] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[6] Recache DataQuery[17] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[6] Recache DataQuery[18] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[6] Recache DataQuery[65] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[6] Recache DataQuery[73] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[6] Recache DataQuery[75] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[5] RECACHE: Processing 8 items in the auto reindex cache for 'custom-c904.custom.domain Device[65140] TH[5] Recache DataQuery[1] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[5] Recache DataQuery[8] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[5] Recache DataQuery[9] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[5] Recache DataQuery[17] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[5] Recache DataQuery[18] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[5] Recache DataQuery[65] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[5] Recache DataQuery[73] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[5] Recache DataQuery[75] OID: .1.3.6.1.2.1.1.3.0, output: 417536055 Device[65140] TH[2] Total Time: 0.016 Seconds Device[65140] TH[3] Total Time: 0.016 Seconds Device[65140] TH[1] NOTE: There are '1' Polling Items for this Device Device[65140] TH[2] DEBUG: HOST COMPLETE: About to Exit Device Polling Thread Function DEBUG: The Value of Active Threads is 5 for Device ID 65140 POLLR: Active Threads is 5, Pending is 5 Device[65140] TH[4] Total Time: 0.021 Seconds Device[65140] TH[3] DEBUG: HOST COMPLETE: About to Exit Device Polling Thread Function DEBUG: The Value of Active Threads is 4 for Device ID 65140 POLLR: Active Threads is 4, Pending is 4 Device[65140] TH[5] Total Time: 0.021 Seconds Device[65140] TH[6] Total Time: 0.022 Seconds Device[65140] TH[5] DEBUG: HOST COMPLETE: About to Exit Device Polling Thread Function DEBUG: The Value of Active Threads is 3 for Device ID 65140 POLLR: Active Threads is 3, Pending is 3 Device[65140] TH[4] DEBUG: HOST COMPLETE: About to Exit Device Polling Thread Function DEBUG: The Value of Active Threads is 2 for Device ID 65140 POLLR: Active Threads is 2, Pending is 2 Device[65140] TH[6] DEBUG: HOST COMPLETE: About to Exit Device Polling Thread Function DEBUG: The Value of Active Threads is 1 for Device ID 65140 POLLR: Active Threads is 1, Pending is 1 WARNING: Invalid Response, Device[65140] TH[1] DS[1062011] SCRIPT: /var/www/cacti/scripts/ss_fping.php ss_fping 'custom-c904.custom.domain' '20' 'ICMP' '', output: Device[65140] TH[1] DS[1062011] SS[0] SERVER: /var/www/cacti/scripts/ss_fping.php ss_fping 'custom-c904.custom.domain' '20' 'ICMP' '', output: U Device[65140] TH[1] Total Time: 0.042 Seconds Device[65140] TH[1] DEBUG: HOST COMPLETE: About to Exit Device Polling Thread Function DEBUG: The Value of Active Threads is 0 for Device ID 65140 POLLR: Active Threads is 0, Pending is 0 SPINE: The Final Value of Threads is 0 DEBUG: Thread Cleanup Complete DEBUG: SS[0] Script Server Shutdown Started DEBUG: SS[1] Script Server Shutdown Started DEBUG: SS[2] Script Server Shutdown Started DEBUG: SS[3] Script Server Shutdown Started DEBUG: SS[4] Script Server Shutdown Started DEBUG: PHP Script Server Pipes Closed DEBUG: Allocated Variable Memory Freed DEBUG: MYSQL Free & Close Completed DEBUG: Net-SNMP Close Completed Time: 1.1891 s, Threads: 40, Devices: 1
So, from what i see here, the behavior is slightly different :
ERROR: SS[4] Script Server did not start properly return message was:
WARNING: Invalid Response, Device[65140] TH[1] DS[1062011] SCRIPT: /var/www/cacti/scripts/ss_fping.php ss_fping 'custom-c904.custom.domain' '20' 'ICMP' '', output:
I then increased the number of data sources one by one until i start to get output other than 'U', only enabling thoses based on php scripts using the script_server. I started to have data showing up like in the initial posting after 4 data sources (7 items in poller cache), here is the corresponding spine output :
/usr/local/spine/bin/spine --mode online --poller 12 --hostlist=65140 --mibs -C /usr/local/spine/etc/spine.conf -R -S -V 5
SPINE: Using spine config file [/usr/local/spine/etc/spine.conf] 2019/12/24 17:53:01 - SPINE: Poller[12] DEBUG: The path_php_server variable is /var/www/cacti/script_server.php 2019/12/24 17:53:01 - SPINE: Poller[12] DEBUG: The path_cactilog variable is /var/log/cacti/cacti.log DEBUG: The log_destination variable is 4 (STDOUT) DEBUG: The path_php variable is /opt/rh/rh-php70/root/bin/php DEBUG: The availability_method variable is 2 DEBUG: The ping_recovery_count variable is 3 DEBUG: The ping_failure_count variable is 2 DEBUG: The ping_method variable is 1 DEBUG: The ping_retries variable is 1 DEBUG: The ping_timeout variable is 600 DEBUG: The snmp_retries variable is 1 DEBUG: The log_perror variable is 1 DEBUG: The log_pwarn variable is 0 DEBUG: The boost_redirect variable is 1 DEBUG: The boost_rrd_update_enable variable is 1 DEBUG: The log_pstats variable is 0 DEBUG: The threads variable is 40 DEBUG: The polling interval is 300 seconds DEBUG: The number of concurrent processes is 50 DEBUG: The script timeout is 20 DEBUG: The selective_device_debug variable is 46480,65178,46420,65140 DEBUG: The spine_log_level variable is 2 DEBUG: The number of php script servers to run is 5 DEBUG: Device List to be polled='65140', TotalPHPScripts='1 DEBUG: The PHP Script Server is Required DEBUG: The Maximum SNMP OID Get Size is 20 Selective Debug Devices 46480,65178,46420,65140 Version 1.1.38 starting DEBUG: MySQL is Thread Safe! DEBUG: Spine is running asroot. SPINE: Initializing Net-SNMP API DEBUG: Issues with SNMP Header Version information, assuming old version of Net-SNMP. SPINE: Initializing PHP Script Server(s) DEBUG: SS[0] PHP Script Server Routine Starting DEBUG: SS[0] PHP Script Server About to FORK Child Process DEBUG: SS[0] PHP Script Server Child FORK Success ERROR: SS[0] Script Server did not start properly return message was: DEBUG: SS[1] PHP Script Server Routine Starting DEBUG: SS[1] PHP Script Server About to FORK Child Process DEBUG: SS[1] PHP Script Server Child FORK Success ERROR: SS[1] Script Server did not start properly return message was: DEBUG: SS[2] PHP Script Server Routine Starting DEBUG: SS[2] PHP Script Server About to FORK Child Process DEBUG: SS[2] PHP Script Server Child FORK Success ERROR: SS[2] Script Server did not start properly return message was: DEBUG: SS[3] PHP Script Server Routine Starting DEBUG: SS[3] PHP Script Server About to FORK Child Process DEBUG: SS[3] PHP Script Server Child FORK Success ERROR: SS[3] Script Server did not start properly return message was: DEBUG: SS[4] PHP Script Server Routine Starting DEBUG: SS[4] PHP Script Server About to FORK Child Process DEBUG: SS[4] PHP Script Server Child FORK Success ERROR: SS[4] Script Server did not start properly return message was: NOTE: Spine will support multithread device polling. DEBUG: Initial Value of Active Threads is 0 DEBUG: Valid Thread to be Created DEBUG: In Poller, About to Start Polling of Device for Device ID 65140 SPINE: Active Threads is 1, Pending is 1 DEBUG: Valid Thread to be Created SPINE: Active Threads is 2, Pending is 2 DEBUG: In Poller, About to Start Polling of Device for Device ID 65140 DEBUG: Valid Thread to be Created DEBUG: In Poller, About to Start Polling of Device for Device ID 65140 SPINE: Active Threads is 3, Pending is 3 DEBUG: Valid Thread to be Created DEBUG: In Poller, About to Start Polling of Device for Device ID 65140 SPINE: Active Threads is 4, Pending is 4 DEBUG: Valid Thread to be Created SPINE: Active Threads is 5, Pending is 5 DEBUG: In Poller, About to Start Polling of Device for Device ID 65140 DEBUG: Valid Thread to be Created DEBUG: In Poller, About to Start Polling of Device for Device ID 65140 SPINE: Active Threads is 6, Pending is 6 Device[65140] DEBUG: Entering SNMP Ping Device[65140] DEBUG: Entering SNMP Ping Device[65140] SNMP Result: Device responded to SNMP Updating Full System Information Table Device[65140] DEBUG: Entering SNMP Ping Device[65140] DEBUG: Entering SNMP Ping Device[65140] DEBUG: Entering SNMP Ping Device[65140] DEBUG: Entering SNMP Ping Device[65140] TH[5] RECACHE: Processing 8 items in the auto reindex cache for 'custom-c904.custom.domain Device[65140] TH[2] RECACHE: Processing 8 items in the auto reindex cache for 'custom-c904.custom.domain Device[65140] TH[5] Recache DataQuery[1] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[5] Recache DataQuery[8] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[5] Recache DataQuery[9] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[5] Recache DataQuery[17] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[5] Recache DataQuery[18] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[5] Recache DataQuery[65] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[5] Recache DataQuery[73] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[5] Recache DataQuery[75] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[2] Recache DataQuery[1] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[2] Recache DataQuery[8] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[2] Recache DataQuery[9] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[2] Recache DataQuery[17] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[2] Recache DataQuery[18] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[3] RECACHE: Processing 8 items in the auto reindex cache for 'custom-c904.custom.domain Device[65140] TH[2] Recache DataQuery[65] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[2] Recache DataQuery[73] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[2] Recache DataQuery[75] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[3] Recache DataQuery[1] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[3] Recache DataQuery[8] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[3] Recache DataQuery[9] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[3] Recache DataQuery[17] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[3] Recache DataQuery[18] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[3] Recache DataQuery[65] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[3] Recache DataQuery[73] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[3] Recache DataQuery[75] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[1] RECACHE: Processing 8 items in the auto reindex cache for 'custom-c904.custom.domain Device[65140] TH[4] RECACHE: Processing 8 items in the auto reindex cache for 'custom-c904.custom.domain Device[65140] TH[1] Recache DataQuery[1] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[1] Recache DataQuery[8] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[1] Recache DataQuery[9] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[1] Recache DataQuery[17] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[1] Recache DataQuery[18] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[1] Recache DataQuery[65] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[1] Recache DataQuery[73] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[1] Recache DataQuery[75] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[4] Recache DataQuery[1] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[4] Recache DataQuery[8] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[4] Recache DataQuery[9] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[4] Recache DataQuery[17] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[4] Recache DataQuery[18] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[4] Recache DataQuery[65] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[4] Recache DataQuery[73] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[4] Recache DataQuery[75] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[6] RECACHE: Processing 8 items in the auto reindex cache for 'custom-c904.custom.domain Device[65140] TH[6] Recache DataQuery[1] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[6] Recache DataQuery[8] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[6] Recache DataQuery[9] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[6] Recache DataQuery[17] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[6] Recache DataQuery[18] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[6] Recache DataQuery[65] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[6] Recache DataQuery[73] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[6] Recache DataQuery[75] OID: .1.3.6.1.2.1.1.3.0, output: 417667753 Device[65140] TH[5] Total Time: 0.013 Seconds Device[65140] TH[2] NOTE: There are '2' Polling Items for this Device Device[65140] TH[3] NOTE: There are '2' Polling Items for this Device Device[65140] TH[1] NOTE: There are '2' Polling Items for this Device Device[65140] TH[4] NOTE: There are '1' Polling Items for this Device Device[65140] TH[5] DEBUG: HOST COMPLETE: About to Exit Device Polling Thread Function DEBUG: The Value of Active Threads is 5 for Device ID 65140 POLLR: Active Threads is 5, Pending is 5 Device[65140] TH[6] Total Time: 0.018 Seconds Device[65140] TH[6] DEBUG: HOST COMPLETE: About to Exit Device Polling Thread Function DEBUG: The Value of Active Threads is 4 for Device ID 65140 POLLR: Active Threads is 4, Pending is 4 WARNING: Invalid Response, Device[65140] TH[2] DS[1061996] SCRIPT: /var/www/cacti/scripts/ss_host_disk.php ss_host_disk 'custom-c904.custom.domain' '65140' '2:161:10000:1:20:mycommunity::::::' 'get' 'used' '3', output: Device[65140] TH[2] DS[1061996] SS[0] SERVER: /var/www/cacti/scripts/ss_host_disk.php ss_host_disk 'custom-c904.custom.domain' '65140' '2:161:10000:1:20:mycommunity::::::' 'get' 'used' '3', output: U WARNING: Invalid Response, Device[65140] TH[3] DS[1062000] SCRIPT: /var/www/cacti/scripts/ss_host_disk.php ss_host_disk 'custom-c904.custom.domain' '65140' '2:161:10000:1:20:mycommunity::::::' 'get' 'used' '7', output: Device[65140] TH[3] DS[1062000] SS[1] SERVER: /var/www/cacti/scripts/ss_host_disk.php ss_host_disk 'custom-c904.custom.domain' '65140' '2:161:10000:1:20:mycommunity::::::' 'get' 'used' '7', output: U Device[65140] TH[3] DS[1062000] SS[0] SERVER: /var/www/cacti/scripts/ss_host_disk.php ss_host_disk 'custom-c904.custom.domain' '65140' '2:161:10000:1:20:mycommunity::::::' 'get' 'total' '7', output: 88438829056 Device[65140] TH[3] Total Time: 0.068 Seconds Device[65140] TH[3] DEBUG: HOST COMPLETE: About to Exit Device Polling Thread Function DEBUG: The Value of Active Threads is 3 for Device ID 65140 POLLR: Active Threads is 3, Pending is 3 WARNING: Invalid Response, Device[65140] TH[1] DS[1061994] SCRIPT: /var/www/cacti/scripts/ss_host_disk.php ss_host_disk 'custom-c904.custom.domain' '65140' '2:161:10000:1:20:mycommunity::::::' 'get' 'used' '1', output: Device[65140] TH[1] DS[1061994] SS[2] SERVER: /var/www/cacti/scripts/ss_host_disk.php ss_host_disk 'custom-c904.custom.domain' '65140' '2:161:10000:1:20:mycommunity::::::' 'get' 'used' '1', output: U Device[65140] TH[1] DS[1061994] SS[1] SERVER: /var/www/cacti/scripts/ss_host_disk.php ss_host_disk 'custom-c904.custom.domain' '65140' '2:161:10000:1:20:mycommunity::::::' 'get' 'total' '1', output: 148907360256 Device[65140] TH[1] Total Time: 0.081 Seconds Device[65140] TH[1] DEBUG: HOST COMPLETE: About to Exit Device Polling Thread Function DEBUG: The Value of Active Threads is 2 for Device ID 65140 POLLR: Active Threads is 2, Pending is 2 WARNING: Invalid Response, Device[65140] TH[4] DS[1062011] SCRIPT: /var/www/cacti/scripts/ss_fping.php ss_fping 'custom-c904.custom.domain' '20' 'ICMP' '', output: Device[65140] TH[4] DS[1062011] SS[3] SERVER: /var/www/cacti/scripts/ss_fping.php ss_fping 'custom-c904.custom.domain' '20' 'ICMP' '', output: U Device[65140] TH[4] Total Time: 0.13 Seconds Device[65140] TH[4] DEBUG: HOST COMPLETE: About to Exit Device Polling Thread Function DEBUG: The Value of Active Threads is 1 for Device ID 65140 POLLR: Active Threads is 1, Pending is 1 WARNING: Invalid Response, Device[65140] TH[2] DS[1061996] SCRIPT: /var/www/cacti/scripts/ss_host_disk.php ss_host_disk 'custom-c904.custom.domain' '65140' '2:161:10000:1:20:mycommunity::::::' 'get' 'total' '3', output: Device[65140] TH[2] DS[1061996] SS[4] SERVER: /var/www/cacti/scripts/ss_host_disk.php ss_host_disk 'custom-c904.custom.domain' '65140' '2:161:10000:1:20:mycommunity::::::' 'get' 'total' '3', output: U Device[65140] TH[2] Total Time: 0.14 Seconds Device[65140] TH[2] DEBUG: HOST COMPLETE: About to Exit Device Polling Thread Function DEBUG: The Value of Active Threads is 0 for Device ID 65140 POLLR: Active Threads is 0, Pending is 0 SPINE: The Final Value of Threads is 0 DEBUG: Thread Cleanup Complete DEBUG: SS[0] Script Server Shutdown Started DEBUG: SS[1] Script Server Shutdown Started DEBUG: SS[2] Script Server Shutdown Started DEBUG: SS[3] Script Server Shutdown Started DEBUG: SS[4] Script Server Shutdown Started DEBUG: PHP Script Server Pipes Closed DEBUG: Allocated Variable Memory Freed DEBUG: MYSQL Free & Close Completed DEBUG: Net-SNMP Close Completed Time: 1.1936 s, Threads: 40, Devices: 1
So, from what i can see here, if a php script running in script server have its output empty for some reason, can it get out of alignment with the poller ? Also, important note, every tests in those logs run properly when launched manually from script_server.php.
Edit your php.ini file, and set the error_log = /tmp/php_errors.log and then watch it. The script server MUST start properly if things are to work well. So, get that issue resolved right away.
I changed my current php configuration regarding errors for this one :
/etc/opt/rh/rh-php70/php.ini:error_reporting = E_ALL & ~E_DEPRECATED & ~E_STRICT
/etc/opt/rh/rh-php70/php.ini:display_errors = On
/etc/opt/rh/rh-php70/php.ini:display_startup_errors = On
/etc/opt/rh/rh-php70/php.ini:log_errors = On
/etc/opt/rh/rh-php70/php.ini:log_errors_max_len = 1024
/etc/opt/rh/rh-php70/php.ini:ignore_repeated_errors = Off
/etc/opt/rh/rh-php70/php.ini:track_errors = On
/etc/opt/rh/rh-php70/php.ini:html_errors = On
/etc/opt/rh/rh-php70/php-fpm.conf:error_log = /var/opt/rh/rh-php70/log/php-fpm/error.log
/etc/opt/rh/rh-php70/php-fpm.d/www.conf:php_admin_value[error_log] = /var/opt/rh/rh-php70/log/php-fpm/www-error.log
/etc/opt/rh/rh-php70/php-fpm.d/www.conf:php_admin_flag[log_errors] = on
The www-error.log remains empty after running spine manually like before, and the error_log dont displays errors : /var/opt/rh/rh-php70/log/php-fpm/error.log
[26-Dec-2019 10:20:03] NOTICE: Terminating ...
[26-Dec-2019 10:20:03] NOTICE: exiting, bye-bye!
[26-Dec-2019 10:20:03] NOTICE: fpm is running, pid 26433
[26-Dec-2019 10:20:03] NOTICE: ready to handle connections
[26-Dec-2019 10:20:03] NOTICE: systemd monitor interval set to 10000ms
If you confirm the issue is not spine related, do you want me to close it and reopen to cacti ?
A complete reinstallation of cacti, and recompilation/reinstallation of spine from 1.1.38 sources didn't solved the problem on a new fresh poller and device ids.
I still have nothing in php_errors/www-error logs.
Let's try a few debug tests. Here they are:
1) For the device set host threads = 1, save the host and see if the problem continues
2) Set php script servers to 1 for the Data Collector, and see if the problem continues
3) Disable ss_fping.php data source and see if the problem resolves itself.
Lastly, are you using the HMIB plugin?
I applied steps after steps :
HMIB plugin is not used.
I retried multiples times, i removed datasources not based on php script server, and then data sources based on php script server other than ss_host_disk.php, i also did the same with data sources based on ss_host_cpu.php, and here is what i noticed :
For example, in this case (5 php pollers = 5 WARNING) :
DEBUG: The number of php script servers to run is 5
[...]
Device[65695] TH[1] NOTE: There are '18' Polling Items for this Device
WARNING: Invalid Response, Device[65695] TH[1] DS[1074627] SCRIPT: /var/www/cacti/scripts/ss_host_disk.php ss_host_disk 'hostname' '65695' '2:161:2000:1:20:community::::::' 'get' 'used' '1', output:
Device[65695] TH[1] DS[1074627] SS[0] SERVER: /var/www/cacti/scripts/ss_host_disk.php ss_host_disk 'hostname' '65695' '2:161:2000:1:20:community::::::' 'get' 'used' '1', output: U
WARNING: Invalid Response, Device[65695] TH[1] DS[1074627] SCRIPT: /var/www/cacti/scripts/ss_host_disk.php ss_host_disk 'hostname' '65695' '2:161:2000:1:20:community::::::' 'get' 'total' '1', output:
Device[65695] TH[1] DS[1074627] SS[1] SERVER: /var/www/cacti/scripts/ss_host_disk.php ss_host_disk 'hostname' '65695' '2:161:2000:1:20:community::::::' 'get' 'total' '1', output: U
WARNING: Invalid Response, Device[65695] TH[1] DS[1074628] SCRIPT: /var/www/cacti/scripts/ss_host_disk.php ss_host_disk 'hostname' '65695' '2:161:2000:1:20:community::::::' 'get' 'used' '2', output:
Device[65695] TH[1] DS[1074628] SS[2] SERVER: /var/www/cacti/scripts/ss_host_disk.php ss_host_disk 'hostname' '65695' '2:161:2000:1:20:community::::::' 'get' 'used' '2', output: U
WARNING: Invalid Response, Device[65695] TH[1] DS[1074628] SCRIPT: /var/www/cacti/scripts/ss_host_disk.php ss_host_disk 'hostname' '65695' '2:161:2000:1:20:community::::::' 'get' 'total' '2', output:
Device[65695] TH[1] DS[1074628] SS[3] SERVER: /var/www/cacti/scripts/ss_host_disk.php ss_host_disk 'hostname' '65695' '2:161:2000:1:20:community::::::' 'get' 'total' '2', output: U
WARNING: Invalid Response, Device[65695] TH[1] DS[1074629] SCRIPT: /var/www/cacti/scripts/ss_host_disk.php ss_host_disk 'hostname' '65695' '2:161:2000:1:20:community::::::' 'get' 'used' '3', output:
Device[65695] TH[1] DS[1074629] SS[4] SERVER: /var/www/cacti/scripts/ss_host_disk.php ss_host_disk 'hostname' '65695' '2:161:2000:1:20:community::::::' 'get' 'used' '3', output: U
And in this case (1 php poller = 1 WARNING) :
DEBUG: The number of php script servers to run is 1
[...]
Device[65695] TH[1] NOTE: There are '18' Polling Items for this Device
WARNING: Invalid Response, Device[65695] TH[1] DS[1074627] SCRIPT: /var/www/cacti/scripts/ss_host_disk.php ss_host_disk 'hostname' '65695' '2:161:2000:1:20:community::::::' 'get' 'used' '1', output:
Device[65695] TH[1] DS[1074627] SS[0] SERVER: /var/www/cacti/scripts/ss_host_disk.php ss_host_disk 'hostname' '65695' '2:161:2000:1:20:community::::::' 'get' 'used' '1', output: U
Device[65695] TH[1] DS[1074627] SS[0] SERVER: /var/www/cacti/scripts/ss_host_disk.php ss_host_disk 'hostname' '65695' '2:161:2000:1:20:community::::::' 'get' 'total' '1', output: 62415646720
Device[65695] TH[1] DS[1074628] SS[0] SERVER: /var/www/cacti/scripts/ss_host_disk.php ss_host_disk 'hostname' '65695' '2:161:2000:1:20:community::::::' 'get' 'used' '2', output: 214759895040
I also noticed when there is more php data sources to poll, this tend to 'shift' results.
Here are full logs : cacti_all_steps_storage.txt cacti_steps_1_3_storage.txt cacti_all_steps_cpu_storage.txt
One of the nasty things with Windows devices is that when you have a removable devices like a USB stick, Windows screws up the indexing every time you insert or removed that device. This is why I always recommend the HMIB plugin if you have Windows devices. It's just because of it's defective SNMP agent. I'm not sure this is related, but it's a suspicion.
From what i understood and observed earlier, this issue is affecting us on everything using php script server, it doesn't comes from the data windows displays in snmp, since they are working fine manually.
I can somehow reproduce the problem with the cmd.php (which is easier to edit to debug for me), thought it's the only output i get :
2020/01/16 14:41:28 - POLLER: Poller[13] WARNING: Invalid Response(s), Errors[1] Device[65695] Thread[1] DS[1074627]
I modified a bit of its output to trace what's going on in there, here are the parts of the code i changed for debugging :
705 case POLLER_ACTION_SCRIPT_PHP: /* script (php script server) */
706 $output = trim(str_replace("\n", '', exec_poll_php($item['arg1'], $using_proc_function, $pipes, $cactiphp)));
707 cacti_log("output[$output] ds[$data_source]","");
708 if (!is_numeric($output)) {
709 cacti_log("is_not_numeric[$output]","");
710 if (prepare_validate_result($output) == false) {
711 if (strlen($output) > 20) {
712 $strout = 20;
713 } else {
714 $strout = strlen($output);
715 }
716
717 $error_ds[$data_source] = $data_source;
**[...]**
454 cacti_log('WARNING: Invalid Response(s) (1), Errors[' . sizeof($error_ds) . '] Device[' . $last_host . '] Thread[1] DS[' . implode(', ', $error_ds) . ']', false, 'POLLER');
[...]
769 cacti_log('WARNING: Invalid Response(s) (2), Errors[' . sizeof($error_ds) . '] Device[' . $last_host . '] Thread[1] DS[' . implode(', ', $error_ds) . ']', false, 'POLLER');
And here are the logs i get after running cmd.php manually :
2020/01/16 14:53:27 - CMDPHP output[PHP Script Server has Started - Parent is cmd] ds[1074627]
2020/01/16 14:53:27 - CMDPHP is_not_numeric[PHP Script Server has Started - Parent is cmd]
2020/01/16 14:53:27 - CMDPHP output[62453669888] ds[1074627]
2020/01/16 14:53:27 - CMDPHP output[214759895040] ds[1074628]
2020/01/16 14:53:27 - CMDPHP output[183188824064] ds[1074628]
2020/01/16 14:53:27 - CMDPHP output[384808513536] ds[1074629]
2020/01/16 14:53:27 - CMDPHP output[88438829056] ds[1074629]
2020/01/16 14:53:27 - CMDPHP output[1199111663616] ds[1074630]
2020/01/16 14:53:27 - CMDPHP output[153237323776] ds[1074630]
2020/01/16 14:53:27 - CMDPHP output[429493518336] ds[1074631]
2020/01/16 14:53:27 - CMDPHP output[312297848832] ds[1074631]
2020/01/16 14:53:27 - CMDPHP output[429493518336] ds[1074632]
2020/01/16 14:53:27 - CMDPHP output[53145567232] ds[1074632]
2020/01/16 14:53:27 - CMDPHP output[214745153536] ds[1074633]
2020/01/16 14:53:27 - CMDPHP output[148907360256] ds[1074633]
2020/01/16 14:53:27 - CMDPHP output[214745153536] ds[1074634]
2020/01/16 14:53:27 - CMDPHP output[31986352128] ds[1074634]
2020/01/16 14:53:27 - CMDPHP output[34359148544] ds[1074635]
2020/01/16 14:53:27 - CMDPHP output[34916925440] ds[1074635]
2020/01/16 14:53:27 - POLLER: Poller[13] WARNING: Invalid Response(s) (2), Errors[1] Device[65695] Thread[1] DS[1074627]
From what i read here, data source 1074627 (thought cacti get a correct reading in the next iteration of the loop) gets broken because of script_server.php disclaimer message, and is then set as in error in $error_ds (717), which leads to this invalid response log.
Is this an expected behaviour ?
I dont know how close is the cmd.php to spine in its inner mechanisms, i modified it to try to give you more insight about this issue.
I believe the php script server shouldn't return "PHP Script Server has Started - Parent is cmd" when called if possible, or the output validating regarding this should be more strict.
That should really only be seen in a debug mode I thought say, but seems to be related to running cacti on windows for sure. Likely it was added in to diagnose something and has been a left over though in theory, script server should ignore all output that doesn't conform IMHO, @cigamit or @TheWitness may have thoughts on that as they have both been running stuff a lot longer than I have.
Hi,
Commenting the lines that output "PHP Script Server has Started - [...]" in the script_server.php solves my problem. Now every graph is as it should be!
That is a red herring. I'm going to close this one.
Describe the bug Spine seems to be swapping results of tests for a same device in a single polling, it ends up on graphs with erratic data, or empty graph if data cant fit in.
To Reproduce Not sure how to reproduce, this problems happens with 3 graphs templates on our side for the poller i checked. Since our monitoring system is quite big (35k devices), it may also exists somewhere else.
Expected behavior Spine shouldn't swap results.
Screenshots
Server (please complete the following information):
Compiling (please complete the following information):
Additional context Add any other context about the problem here including logs and command line output
Here is the spine command i played to identify the problem and its output, i added an HERE to lines that shows the problem. /usr/local/spine/bin/spine --mode online --poller 12 --hostlist=65140 --mibs -C /usr/local/spine/etc/spine.conf -R -S
Dont hesitate to ask further informations if needed !