SteScho / manubulon-snmp

Set of Icinga/Nagios plugins to check hosts and hardware with the SNMP protocol.
GNU General Public License v2.0

check_snmp_process.pl hangs until ALARM with -A (arguments) #47

Open ghost opened 6 years ago

ghost commented 6 years ago

I have not yet identified the cause, but on one specific server the -A option makes the check hang and match no processes. The same check works fine against other servers. This server runs some Java processes with very long argument lists, and I wonder whether that is to blame.

I'll keep debugging.

ghost commented 6 years ago

I can confirm it hangs in the get_table call on line 531:

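    # Walk the process-arguments column (hrSWRunParameters), which -A needs.
    # Net::SNMP releases before 4 take the base OID as a positional argument;
    # later releases use named arguments.
    $resultat_param
        = (version->parse(Net::SNMP->VERSION) < 4)
        ? $session->get_table($run_param_table)
        : $session->get_table(Baseoid => $run_param_table);

Dalesjo commented 6 years ago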

When check_snmp_process.pl runs with the -A flag, snmpd writes the following to its log:

    Jul 31 15:04:47 localhost snmpd[89732]: send response: Too long (plaintext scopedPDU header type 00: s/b 30)
    Jul 31 15:04:47 localhost snmpd[89732]:     -- HOST-RESOURCES-MIB::hrSWRunParameters.1025
    Jul 31 15:04:47 localhost snmpd[89732]:     -- HOST-RESOURCES-MIB::hrSWRunParameters.1036
    Jul 31 15:04:47 localhost snmpd[89732]:     -- HOST-RESOURCES-MIB::hrSWRunParameters.1037
    Jul 31 15:04:47 localhost snmpd[89732]:     -- HOST-RESOURCES-MIB::hrSWRunParameters.1038
    Jul 31 15:04:47 localhost snmpd[89732]:     -- HOST-RESOURCES-MIB::hrSWRunParameters.1041
    Jul 31 15:04:47 localhost snmpd[89732]:     -- HOST-RESOURCES-MIB::hrSWRunParameters.1179
    Jul 31 15:04:47 localhost snmpd[89732]:     -- HOST-RESOURCES-MIB::hrSWRunParameters.1180
    Jul 31 15:04:47 localhost snmpd[89732]:     -- HOST-RESOURCES-MIB::hrSWRunParameters.1181
    Jul 31 15:04:47 localhost snmpd[89732]:     -- HOST-RESOURCES-MIB::hrSWRunParameters.1182
    Jul 31 15:04:47 localhost snmpd[89732]:     -- HOST-RESOURCES-MIB::hrSWRunParameters.1185
    Jul 31 15:04:47 localhost snmpd[89732]:     -- HOST-RESOURCES-MIB::hrSWRunParameters.1420
    Jul 31 15:04:47 localhost snmpd[89732]:     -- HOST-RESOURCES-MIB::hrSWRunParameters.1422
    Jul 31 15:04:47 localhost snmpd[89732]:     -- HOST-RESOURCES-MIB::hrSWRunParameters.1425
    Jul 31 15:04:47 localhost snmpd[89732]:     -- HOST-RESOURCES-MIB::hrSWRunParameters.1433
    Jul 31 15:04:47 localhost snmpd[89732]:     -- HOST-RESOURCES-MIB::hrSWRunParameters.1434
    Jul 31 15:04:47 localhost snmpd[89732]:     -- HOST-RESOURCES-MIB::hrSWRunParameters.1438
    Jul 31 15:04:47 localhost snmpd[89732]:     -- HOST-RESOURCES-MIB::hrSWRunParameters.1444
    Jul 31 15:04:47 localhost snmpd[89732]:     -- HOST-RESOURCES-MIB::hrSWRunParameters.1445
    Jul 31 15:04:47 localhost snmpd[89732]:     -- HOST-RESOURCES-MIB::hrSWRunParameters.1446
    Jul 31 15:04:47 localhost snmpd[89732]:     -- HOST-RESOURCES-MIB::hrSWRunParameters.1449
    Jul 31 15:04:47 localhost snmpd[89732]:     -- HOST-RESOURCES-MIB::hrSWRunParameters.1454
    Jul 31 15:04:47 localhost snmpd[89732]:     -- HOST-RESOURCES-MIB::hrSWRunParameters.1462
    Jul 31 15:04:47 localhost snmpd[89732]:     -- HOST-RESOURCES-MIB::hrSWRunParameters.1464
    Jul 31 15:04:47 localhost snmpd[89732]:     -- HOST-RESOURCES-MIB::hrSWRunParameters.1467
    Jul 31 15:04:47 localhost snmpd[89732]:     -- HOST-RESOURCES-MIB::hrSWRunParameters.1469

Coincidentally, I also have a Java process with a long list of arguments. In my case the longest path plus arguments is 784 characters long.

Increasing the octet length with "-o 4096" (from the default 1472 to 4096) removes the error message from the snmpd log, but check_snmp_process.pl still fails with no answer from the host.

Dalesjo commented 6 years ago

OK, I did some more testing. The code posted above works if you add maxrepetitions to the get_table request. Example:

    # Ask for at most 10 rows per get-bulk request instead of the computed default.
    $resultat_param
        = (version->parse(Net::SNMP->VERSION) < 4)
        ? $session->get_table($run_param_table)
        : $session->get_table(Baseoid => $run_param_table, maxrepetitions => 10);
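To make the repetition count tunable without repeating the version check at every call site, the fix could be wrapped in a small helper. This is a hypothetical sketch: `get_table_limited` is not part of check_snmp_process.pl, and it assumes any session object with a Net::SNMP-style `get_table` method. It uses the documented `-baseoid` and `-maxrepetitions` argument names and keeps the positional form for the pre-4 branch, as the script does.

```perl
use strict;
use warnings;
use version;

# Hypothetical helper (not in check_snmp_process.pl): walk a table, passing
# -maxrepetitions only on the named-argument (Net::SNMP >= 4) code path.
sub get_table_limited {
    my ($session, $base_oid, $max_reps, $net_snmp_version) = @_;

    # Same version test as the script: pre-4 releases use a positional OID.
    if (version->parse($net_snmp_version) < 4) {
        return $session->get_table($base_oid);
    }

    my @args = (-baseoid => $base_oid);
    # Only override Net::SNMP's computed default when a value is given.
    push @args, -maxrepetitions => $max_reps if defined $max_reps;
    return $session->get_table(@args);
}

# Possible use inside the script (assumed, not tested against it):
#   $resultat_param = get_table_limited($session, $run_param_table, 10,
#                                       Net::SNMP->VERSION);
```

Leaving `$max_reps` undefined keeps the current behaviour, so hosts without very long argument lists are unaffected.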

According to the Net::SNMP documentation, maxrepetitions is calculated automatically if it is not given. If I understand correctly, it sets how many rows [Net::SNMP](https://metacpan.org/pod/Net::SNMP#get_table()-retrieve-a-table-from-the-remote-agent) fetches per request. I measured how long the check took with different maxrepetitions values; the times below are the real execution times reported by `time`, with a 60-second timeout on check_snmp_process.pl.

| maxrepetitions | Real time |
|---:|---|
| 0 | 17.351 s |
| 1 | 17.386 s |
| 2 | 9.216 s |
| 5 | 4.641 s |
| 10 | 3.214 s |
| 20 | 2.338 s |
| 22 | 2.286 s |
| 25 | failed |
| 30 | failed |
| 40 | failed |
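The timings are consistent with each request costing roughly one round trip. As I understand the Net::SNMP documentation, a maxrepetitions of 1 or less makes get_table fall back to get-next-requests, which would explain why 0 and 1 take the same time. The sketch below estimates request counts under stated assumptions: get_table needs one varbind past the end of the column to know it is finished, every response carries the full maxrepetitions varbinds (a real agent may return fewer to stay within the message size), and the table has a hypothetical 200 rows, since the actual process count on this host is not given.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Rough model, an assumption rather than Net::SNMP's actual logic:
# rows + 1 varbinds are needed (the extra one lies past the table end),
# and values <= 1 mean one varbind per get-next-request.
sub estimated_requests {
    my ($rows, $max_reps) = @_;
    my $per_request = $max_reps > 1 ? $max_reps : 1;
    return ceil(($rows + 1) / $per_request);
}

# Hypothetical 200-row hrSWRunParameters table.
for my $reps (0, 1, 2, 5, 10, 20, 22) {
    printf "maxrepetitions %2d -> about %d requests\n",
        $reps, estimated_requests(200, $reps);
}
```

Each increase saves fewer absolute requests than the previous one, which fits the shrinking gains between 10 and 22 in the table; it says nothing about why 25 and above fail, which is more likely tied to response size than request count.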