Cacti / cacti

Cacti ™
http://www.cacti.net
GNU General Public License v2.0

Error on interpretation of snmpUptime when too big #4428

Closed arno-st closed 3 years ago

arno-st commented 3 years ago

On a device whose uptime is too long (more than 224 days, I think), the device is seen as down if the monitoring is set to snmpUptime and ping.

As an example, I have this: Uptime: 4190934469 (485days, 1hours, 29minutes) and the device is down.
On 1.2.18 I have this: sre-core 10.0.2.26 3159 28 29 Down 2d:23h:56m N/A
And on 1.2.17: sre-core 10.0.2.26 363 37 38 Up 226d:23h:49m 485d:1h:55m
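
For reference, the arithmetic behind the uptime figure quoted above can be sketched as follows. This is only an illustration: it assumes the raw value is SNMP sysUpTime TimeTicks (hundredths of a second), which is consistent with the 485 days reported, and the function name `ticks_to_uptime` is made up for this example.

```python
# Assumption: the raw uptime is in TimeTicks, i.e. hundredths of a second.
TICKS_PER_SECOND = 100


def ticks_to_uptime(ticks):
    """Convert a raw TimeTicks value to (days, hours, minutes)."""
    seconds = ticks // TICKS_PER_SECOND
    days, rest = divmod(seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes = rest // 60
    return days, hours, minutes


# Value quoted in the report above.
print(ticks_to_uptime(4190934469))  # (485, 1, 29)
```

The result matches the "485days, 1hours, 29minutes" shown by Cacti for this device.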
netniV commented 3 years ago

This may be the difference between 32-bit and 64-bit uptime counters. Uptime is recorded in seconds, if I remember rightly. I used to have this issue on older 32-bit hardware, where it made Cacti think a device had rebooted when it hadn't.

What is your OS/Arch for all involved systems?
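
To make the 32-bit concern concrete, here is a small sketch of what happens to an uptime value when it is squeezed into a signed 32-bit integer. It is an illustration only, not a diagnosis: nothing in this thread establishes where (or whether) such a conversion happens in Cacti or spine, and `as_signed_32` is a hypothetical helper. The two sample values are uptimes that appear in this thread.

```python
import struct

INT32_MAX = 2**31 - 1  # 2147483647 ticks, roughly 248.5 days at 100 ticks/s


def as_signed_32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]


for ticks in (4190934469, 2013484542):
    # Print the raw value, whether it exceeds the signed range,
    # and what a signed 32-bit reader would see.
    print(ticks, ticks > INT32_MAX, as_signed_32(ticks))
```

The first value no longer fits and would read as a negative number; the second still fits and is unchanged, so a signed overflow alone would not explain a device with that uptime being marked down.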

arno-st commented 3 years ago

My 1.2.18 runs on CentOS, kernel 3.10.0-1160.42.2.el7.x86_64

My 1.2.17 runs on CentOS, kernel 3.10.0-1160.42.2.el7.x86_64

The only difference I can find is that the 1.2.17 system has PHP 7.4.2, while the system running Cacti 1.2.18 has PHP 7.4.14.

Otherwise, they should be the same!

As for the client I'm polling, both installations poll the same device, a Cisco switch.

netniV commented 3 years ago

What about the SNMP libraries? Are you using php-snmp or net-snmp?

arno-st commented 3 years ago

php-snmp, the same version on both:
Name : php-snmp
Arch : x86_64
Version : 7.4.24
Release : 1.el7.remi

arno-st commented 3 years ago

I don't think the problem is in the polling part, since the DB gives me a value on both:
cacti 1.2.17: 4165077143
cacti 1.2.18: 4192683188

Minor difference.

TheWitness commented 3 years ago

I think we need to move that column to a bigint. I'll do it later tonight my time, unless @netniV wants to hammer it out.

TheWitness commented 3 years ago

Okay, this is resolved for the 1.2.19 release. You can just hand run the two SQL alters at the bottom of the 1_2_19.php file if you want to hack it in.

TheWitness commented 3 years ago

Thanks for keeping your eye on the ball.

arno-st commented 3 years ago

Can you reopen it? It doesn't solve the problem. I'm working on it to find what's wrong.

Here is the output from my DB. The first two records use SNMP v3 and are shown as DOWN; the last one uses SNMP v2 and is seen as UP. I'm following that track to see whether it's an SNMP version problem.

"id","poller_id","site_id","host_template_id","description","hostname","location","notes","external_id","snmp_community","snmp_version","snmp_username","snmp_password","snmp_auth_protocol","snmp_priv_passphrase","snmp_priv_protocol","snmp_context","snmp_engine_id","snmp_port","snmp_timeout","snmp_sysDescr","snmp_sysObjectID","snmp_sysUpTimeInstance","snmp_sysContact","snmp_sysName","snmp_sysLocation","availability_method","ping_method","ping_port","ping_timeout","ping_retries","max_oids","bulk_walk_size","device_threads","deleted","disabled","monitor","monitor_text","monitor_criticality","monitor_warn","monitor_alert","thold_send_email","thold_host_email","status","status_event_count","status_fail_date","status_rec_date","status_last_error","min_time","max_time","cur_time","avg_time","polling_time","total_polls","failed_polls","availability","last_updated","serial_no","model","isPhone","keep_mac_track","password","console_type","can_be_upgraded","can_be_rebooted","do_backup","login","mode"

"3159","1","3","8","core","10.0.2.26",," EZV: En service","SR02129 SR02128 ","telvlsn","3","SNMP_USER","SNMP_KEY","SHA","SNMP_KEY","AES128",,,"161","500","Cisco IOS Software [Fuji], Catalyst L3 Switch Software (CAT9K_IOSXE), Version 16.9.5, RELEASE SOFTWARE (fc1) Technical Support: http://www.cisco.com/techsupport Copyright (c) 1986-2020 by Cisco Systems, Inc. Compiled Thu 30-Jan-20 18:53 by mcpre","iso.3.6.1.4.1.9.1.2593","4165077143",,"CORE","LOCATION","1","1","23","400","1","10","-1","1",,,,,"0","0","0","1","0","1","5670","2021-10-15 10:06:02","2021-10-13 15:13:02","Device responded to SNMP, ICMP: Destination address not specified","0.00000","88.03201","0.00000","0.74397","0.027","795147","8664","98.91040","2021-10-19 08:35:02","FCW2211A0AT FCW2211A0B8","C9500-16X","of","of","SNMP_KEY","1","off","off","on","PASSWORD","bundle"

"3160","1","3","7","vbb","10.1.128.10",,,,"telvlsn","3","SNMP_USER","SNMP_KEY","SHA","SNMP_KEY","AES128",,,"161","500","Cisco IOS Software, IOS-XE Software, Catalyst 4500 L3 Switch Software (cat4500e-UNIVERSALK9-M), Version 03.08.08.E RELEASE SOFTWARE (fc3) Technical Support: http://www.cisco.com/techsupport Copyright (c) 1986-2019 by Cisco Systems, Inc. Compiled Fri 1","iso.3.6.1.4.1.9.1.1732","2013484542","SOI - Telecom","VBB.recolte.lausanne.ch","LOCATION","1","1","23","400","1","10","-1","1",,,,,"0","0","0","1","0","1","11080","2021-10-11 15:56:02","2020-06-20 08:27:00","Device responded to SNMP, ICMP: Destination address not specified","0.03791","995.94558","1.81699","6.07636","0.034","795146","11232","98.58740","2021-10-19 08:35:02","JAE17430H3B JAE17430BD6","WS-C4500X-32","of","of","SNMP_KEY","1","off","off","on","PASSWORD","bundle"

"4515","1","3","9","pdp","10.128.1.41",," EZV: En service","SR01288 ","telvlsn","2",,,,,,,,"161","500","Cisco NX-OS(tm) n5000, Software (n5000-uk9), Version 5.1(3)N2(1b), RELEASE SOFTWARE Copyright (c) 2002-2011 by Cisco Systems, Inc. Device Manager Version 5.2(1), Compiled 8/31/2012 17:00:00","iso.3.6.1.4.1.9.12.3.1.3.1084","1643546154","","PDP","LOCATION","4","1","23","400","1","10","-1","1",,,"on",,"0","0","0","1","0","3","0","2021-08-04 18:05:05","2021-08-04 18:19:02","Device did not respond to SNMP, ICMP: Ping timed out","0.00000","55.65691","0.00000","1.20858","0.046","371268","14","99.99620","2021-10-19 08:35:02","SSI154300JT","N5K-C5548UP",,"of","UNAW2m3sFF+9uSzZf","1","off","off","off","PASSWORD","bundle"

TheWitness commented 3 years ago

Using spine or cmd.php? Problem could be spine at this point.

arno-st commented 3 years ago

I'm using spine, and I'm looking into it. I don't know yet; I have to understand how the status is checked. How can I debug spine? Does it affect all pollers, or can I run it once for a specific device?

arno-st commented 3 years ago

So here is the output of spine:

./spine -S -H 3160 --verbosity=5 --conf=../etc/spine.conf
SPINE: Using spine config file [../etc/spine.conf]
Total[0.0064] DEBUG: The path_php_server variable is /usr/share/cacti/script_server.php
Total[0.0065] DEBUG: The path_cactilog variable is /usr/share/cacti/log/cacti.log
Total[0.0065] DEBUG: The log_destination variable is 4 (STDOUT)
Total[0.0067] DEBUG: The path_php variable is /bin/php
Total[0.0069] DEBUG: The availability_method variable is 4
Total[0.0070] DEBUG: The ping_recovery_count variable is 6
Total[0.0071] DEBUG: The ping_failure_count variable is 4
Total[0.0072] DEBUG: The ping_method variable is 1
Total[0.0073] DEBUG: The ping_retries variable is 1
Total[0.0074] DEBUG: The ping_timeout variable is 400
Total[0.0074] DEBUG: The snmp_retries variable is 3
Total[0.0075] DEBUG: The log_perror variable is 1
Total[0.0076] DEBUG: The log_pwarn variable is 1
Total[0.0077] DEBUG: The boost_redirect variable is 1
Total[0.0078] DEBUG: The boost_rrd_update_enable variable is 0
Total[0.0079] DEBUG: The log_pstats variable is 1
Total[0.0080] DEBUG: The threads variable is 13
Total[0.0081] DEBUG: The polling interval is 60 seconds
Total[0.0082] DEBUG: The number of concurrent processes is 2
Total[0.0083] DEBUG: The script timeout is 25
Total[0.0084] DEBUG: The selective_device_debug variable is 3160,3159
Total[0.0085] DEBUG: The spine_log_level variable is 0
Total[0.0086] DEBUG: The number of php script servers to run is 5
Total[0.0087] DEBUG: Device List to be polled='3160', TotalPHPScripts='1'
Total[0.0087] DEBUG: The PHP Script Server is Required
Total[0.0088] DEBUG: The Maximum SNMP OID Get Size is 10
Total[0.0088] DEBUG: Selective Debug Devices 3160,3159
Total[0.0090] DEBUG: Total Connections made 1
Total[0.0090] DEBUG: Creating Local Connection Pool of 13 threads.
Total[0.0090] DEBUG: Creating Local Connection 0.
Total[0.0092] DEBUG: Total Connections made 2
Total[0.0096] DEBUG: Creating Local Connection 1.
Total[0.0098] DEBUG: Total Connections made 3
Total[0.0101] DEBUG: Creating Local Connection 2.
Total[0.0103] DEBUG: Total Connections made 4
Total[0.0107] DEBUG: Creating Local Connection 3.
Total[0.0109] DEBUG: Total Connections made 5
Total[0.0113] DEBUG: Creating Local Connection 4.
Total[0.0115] DEBUG: Total Connections made 6
Total[0.0119] DEBUG: Creating Local Connection 5.
Total[0.0120] DEBUG: Total Connections made 7
Total[0.0124] DEBUG: Creating Local Connection 6.
Total[0.0126] DEBUG: Total Connections made 8
Total[0.0130] DEBUG: Creating Local Connection 7.
Total[0.0132] DEBUG: Total Connections made 9
Total[0.0136] DEBUG: Creating Local Connection 8.
Total[0.0138] DEBUG: Total Connections made 10
Total[0.0142] DEBUG: Creating Local Connection 9.
Total[0.0143] DEBUG: Total Connections made 11
Total[0.0147] DEBUG: Creating Local Connection 10.
Total[0.0149] DEBUG: Total Connections made 12
Total[0.0153] DEBUG: Creating Local Connection 11.
Total[0.0155] DEBUG: Total Connections made 13
Total[0.0159] DEBUG: Creating Local Connection 12.
Total[0.0161] DEBUG: Total Connections made 14
Total[0.0166] DEBUG: Version 1.2.18 starting
Total[0.0166] DEBUG: MySQL is Thread Safe!
Total[0.0166] DEBUG: Spine running as 0 UID, 0 EUID
Total[0.0167] DEBUG: Spine is running as root.
Total[0.0167] DEBUG: Spine has got ICMP
Total[0.0167] DEBUG: Initializing Net-SNMP API
Total[0.0167] DEBUG: Issues with SNMP Header Version information, assuming old version of Net-SNMP.
Total[0.0180] DEBUG: Initializing PHP Script Server(s)
Total[0.0180] DEBUG: SS[0] PHP Script Server Routine Starting
Total[0.0180] DEBUG: SS[0] PHP Script Server About to FORK Child Process
Total[0.0185] DEBUG: SS[0] PHP Script Server Child FORK Success
Total[0.1668] DEBUG: SS[0] Confirmed PHP Script Server running using readfd[20], writefd[19]
Total[0.1668] DEBUG: SS[1] PHP Script Server Routine Starting
Total[0.1668] DEBUG: SS[1] PHP Script Server About to FORK Child Process
Total[0.1670] DEBUG: SS[1] PHP Script Server Child FORK Success
Total[0.3097] DEBUG: SS[1] Confirmed PHP Script Server running using readfd[22], writefd[21]
Total[0.3097] DEBUG: SS[2] PHP Script Server Routine Starting
Total[0.3097] DEBUG: SS[2] PHP Script Server About to FORK Child Process
Total[0.3099] DEBUG: SS[2] PHP Script Server Child FORK Success
Total[0.4525] DEBUG: SS[2] Confirmed PHP Script Server running using readfd[24], writefd[23]
Total[0.4525] DEBUG: SS[3] PHP Script Server Routine Starting
Total[0.4525] DEBUG: SS[3] PHP Script Server About to FORK Child Process
Total[0.4527] DEBUG: SS[3] PHP Script Server Child FORK Success
Total[0.6048] DEBUG: SS[3] Confirmed PHP Script Server running using readfd[26], writefd[25]
Total[0.6048] DEBUG: SS[4] PHP Script Server Routine Starting
Total[0.6049] DEBUG: SS[4] PHP Script Server About to FORK Child Process
Total[0.6050] DEBUG: SS[4] PHP Script Server Child FORK Success
Total[0.7482] DEBUG: SS[4] Confirmed PHP Script Server running using readfd[28], writefd[27]
Total[0.7492] Spine will support multithread device polling.
Total[0.7497] DEBUG: Initial Value of Available Threads is 13 (0 outstanding)
Total[0.7499] DEBUG: Valid Thread to be Created
Total[0.7500] DEBUG: Available Threads is 12 (1 outstanding)
Total[0.7500] DEBUG: In Poller, About to Start Polling of Device for Device ID 0
Total[0.7500] DEBUG: Traversing Local Connection Pool for free connection.
Total[0.7500] DEBUG: Checking Local Pool ID 0.
Total[0.7500] DEBUG: Allocating Local Pool ID 0.
Total[0.7502] DEBUG: Valid Thread to be Created
Total[0.7502] DEBUG: Available Threads is 11 (2 outstanding)
Total[0.7502] WARNING: Spine Sleeping While Waiting for 2 Threads to End
Total[0.7502] DEBUG: In Poller, About to Start Polling of Device for Device ID 3160
Total[0.7503] Device[0] HT[1] Updating Poller Items for Next Poll
Total[0.7503] DEBUG: Traversing Local Connection Pool for free connection.
Total[0.7503] DEBUG: Checking Local Pool ID 0.
Total[0.7503] DEBUG: Checking Local Pool ID 1.
Total[0.7503] DEBUG: Allocating Local Pool ID 1.
Total[0.7506] Device[0] HT[1] Total Time: 0.00064 Seconds
Total[0.7508] get_namebyhost(10.1.128.10) - Allocating name_t
Total[0.7508] get_namebyhost(10.1.128.10) - Token #1
Total[0.7508] get_hostbyname(10.1.128.10) - No matching method for 11 chars: 10.1.128.10
Total[0.7508] get_namebyhost(10.1.128.10) - Setting hostname: 10.1.128.10
Total[0.7508] DEBUG: Freeing Local Pool ID 0
Total[0.7508] DEBUG: Device[0] HT[1] DEBUG: HOST COMPLETE: About to Exit Device Polling Thread Function
Total[0.7509] Device[3160] INFO: SNMP Device '10.1.128.10' has timeout 500000 (500), retries 3
Total[0.7770] Device[3160] IPv4 address 10.1.128.10 (10.1.128.10)
Total[0.7770] Device[3160] DEBUG: Entering ICMP Ping
Total[0.7771] WARNING: Spine Sleeping While Waiting for 1 Threads to End
Total[0.7778] WARNING: Spine Sleeping While Waiting for 1 Threads to End
Total[0.7778] WARNING: Spine Sleeping While Waiting for 1 Threads to End
Total[0.7778] WARNING: Spine Sleeping While Waiting for 1 Threads to End
Total[0.7779] Device[3160] DEBUG: Entering SNMP Ping
Total[0.7823] Device[3160] PING Result: ICMP: Destination address not specified
Total[0.7823] Device[3160] SNMP Result: Device responded to SNMP
Total[0.7851] Device[3160] HT[1] NOTE: There are '350' Polling Items for this Device
Total[0.7852] DEBUG: Setting up writes to local database
Total[0.7854] Device[3160] HT[1] Updating Poller Items for Next Poll
Total[0.7865] Device[3160] HT[1] Total Time: 0.036 Seconds
Total[0.7867] DEBUG: Freeing Local Pool ID 1
Total[0.7867] DEBUG: Device[3160] HT[1] DEBUG: HOST COMPLETE: About to Exit Device Polling Thread Function
Total[1.2779] The Final Value of Threads is 0
Total[1.2786] DEBUG: Closing Local Connection Pool ID 0
Total[1.2786] DEBUG: Closing Local Connection Pool ID 1
Total[1.2787] DEBUG: Closing Local Connection Pool ID 2
Total[1.2787] DEBUG: Closing Local Connection Pool ID 3
Total[1.2787] DEBUG: Closing Local Connection Pool ID 4
Total[1.2787] DEBUG: Closing Local Connection Pool ID 5
Total[1.2787] DEBUG: Closing Local Connection Pool ID 6
Total[1.2787] DEBUG: Closing Local Connection Pool ID 7
Total[1.2787] DEBUG: Closing Local Connection Pool ID 8
Total[1.2787] DEBUG: Closing Local Connection Pool ID 9
Total[1.2788] DEBUG: Closing Local Connection Pool ID 10
Total[1.2788] DEBUG: Closing Local Connection Pool ID 11
Total[1.2788] DEBUG: Closing Local Connection Pool ID 12
Total[1.2788] DEBUG: Thread Cleanup Complete
Total[1.2788] DEBUG: SS[0] Script Server Shutdown Started
Total[1.3289] DEBUG: SS[1] Script Server Shutdown Started
Total[1.3790] DEBUG: SS[2] Script Server Shutdown Started
Total[1.4291] DEBUG: SS[3] Script Server Shutdown Started
Total[1.4793] DEBUG: SS[4] Script Server Shutdown Started
Total[1.5294] DEBUG: PHP Script Server Pipes Closed
Total[1.5294] DEBUG: Allocated Variable Memory Freed
Total[1.5294] DEBUG: MYSQL Free & Close Completed
Total[1.5295] DEBUG: Net-SNMP Close Completed
Total[1.5295] Time: 1.2779 s, Threads: 13, Devices: 2

TheWitness commented 3 years ago

Open a spine bug, would you? Cacti is fixed.

TheWitness commented 3 years ago

Let me take that back: spine uses a string, so it might have something to do with the SNMP library. In the meantime, edit poller.c and make the modification shown in the highlighted row below:

[image: poller.c with the modified row highlighted]

Then, make spine and run as follows:

./spine -R --mibs --first host_id1 --last host_id2
<snip>
NOTE: The SNMP Uptime was 8536518
NOTE: The SNMP Uptime was 8537397
NOTE: The SNMP Uptime was 8536553
NOTE: The SNMP Uptime was 318654799
NOTE: The SNMP Uptime was 318654799
Time: 2.8684 s, Threads: 4, Devices: 47

That should produce output like the above. Let us know if the value is correct there.
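
To sanity-check the values in output like this, a short script can pull the numbers out of the NOTE lines and convert them to days. This is a sketch under two assumptions: that the lines keep the exact wording shown above, and that the values are TimeTicks (hundredths of a second). The helper name `uptimes_in_days` is made up for illustration.

```python
import re

# Matches lines such as "NOTE: The SNMP Uptime was 318654799".
NOTE_RE = re.compile(r"NOTE: The SNMP Uptime was (\d+)")


def uptimes_in_days(log_text):
    """Return a list of (raw_ticks, days) for every uptime NOTE line."""
    results = []
    for match in NOTE_RE.finditer(log_text):
        ticks = int(match.group(1))
        # Assumption: 100 ticks per second, 86400 seconds per day.
        results.append((ticks, ticks / 100 / 86400))
    return results


sample = """NOTE: The SNMP Uptime was 8536518
NOTE: The SNMP Uptime was 318654799
"""

for ticks, days in uptimes_in_days(sample):
    print(f"{ticks:>12} ticks = {days:.1f} days")
```

Comparing the converted figures with the uptime the device itself reports is a quick way to tell whether the value is already wrong when spine reads it.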

TheWitness commented 3 years ago

Continuing this discussion on the spine side.

TheWitness commented 3 years ago

Okay, made a few more GUI changes so that when you edit the device, you also see the correct uptime. Also addressed cmd.php and reindexing there.

arno-st commented 3 years ago

Just wondering, where did you change the GUI?

TheWitness commented 3 years ago

What do you mean? If you edit the device, it grabs uptime dynamically.

arno-st commented 3 years ago

Oh, OK. I was wondering if you had added a field, because the uptime was already visible in the SNMP information and I didn't check the device page.