Cacti / cacti

Unable to create graphs due to Data Source verification failure #4250

Closed ssvenn closed 3 years ago

ssvenn commented 3 years ago

I ran into the "Damaged Graph" problem on 1.2.16 after editing my new templates for Brocade FC switches and tried to fix it by upgrading to 1.2.17. That didn't help, so I erased the RRD files, reset the database, and started from scratch with my exported template.

I am seeing strange behaviour where graphs are created for only some of my queried interfaces; most fail with the text "NOTE: Graph not for Data Query and index due to Data Source verification failure."

Rolling back to 1.2.16 and resetting the database again I was able to use the template to create graphs without getting this error message, and spine is able to collect values.

cacti_data_query_brocade_fcmgmt_switch.zip

Screenshot 2021-05-05 at 19 21 51

Steps to reproduce the behavior:

  1. Import the included templates

  2. Create a Brocade FC Switch device

  3. Create graphs from the Brocade FCMGMT Switch query

ssvenn commented 3 years ago

This might be related to feature#645: Modify automation to test for data before creating graphs

The 64-bit traffic counters on Brocade switches respond with Hex-STRING data on these OIDs, which only works correctly with Spine; the PHP poller seems unable to convert it to numerical values.

snmpwalk -m FCMGMT-MIB -v2c -cpublic 192.168.1.10 .1.3.6.1.3.94.4.5.1.7 -O fn
.1.3.6.1.3.94.4.5.1.7.16.0.0.5.51.253.167.72.0.0.0.0.0.0.0.0.1 = Hex-STRING: 00 00 03 AB A0 BA 1D CC
.1.3.6.1.3.94.4.5.1.7.16.0.0.5.51.253.167.72.0.0.0.0.0.0.0.0.2 = Hex-STRING: 00 00 05 4A EE 5E 3B 24
.1.3.6.1.3.94.4.5.1.7.16.0.0.5.51.253.167.72.0.0.0.0.0.0.0.0.3 = Hex-STRING: 00 00 02 0D 66 93 01 68

https://forums.cacti.net/viewtopic.php?f=21&t=61663 has some more background
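
To make the Hex-STRING problem concrete: each value in the walk output is a counter printed as eight space-separated hex bytes, so a poller has to join the bytes and interpret them as one number before it can graph anything. The following is a minimal shell sketch of that conversion, not Cacti's or Spine's actual code. `hex_to_dec` is a hypothetical helper name. The sketch assumes the bytes are in most-significant-first order and that the shell uses 64-bit signed arithmetic, so counters at or above 2^63 would not convert correctly.

```shell
#!/bin/sh
# Convert a net-snmp Hex-STRING payload (space-separated hex bytes)
# into a decimal number. hex_to_dec is a hypothetical helper, not part
# of Cacti or net-snmp.
hex_to_dec() {
    # Remove the spaces so "00 00 01 00" becomes "00000100".
    hex=$(printf '%s' "$1" | tr -d ' ')
    # POSIX arithmetic expansion accepts 0x-prefixed hex constants.
    echo "$((0x$hex))"
}

# First value from the snmpwalk output in this thread.
hex_to_dec "00 00 03 AB A0 BA 1D CC"
```

A value that passes through a step like this is plain digits, which is what a numeric check such as PHP's is_numeric() expects; the raw Hex-STRING text is not.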

Lejooohn commented 3 years ago

Hi,

After upgrading from 1.2.16 to 1.2.17, I have a similar problem with the ucd/net template:

Screenshot_1

Does anyone have a solution to fix this, or do we have to wait for the next update?

Regards,

xmacan commented 3 years ago

@Lejooohn please show result of snmpwalk: snmpwalk -c YOUR_COMMUNITY HOST_IP_ADDRESS 1.3.6.1.4.1.2021.4

Lejooohn commented 3 years ago

> @Lejooohn please show result of snmpwalk: snmpwalk -c YOUR_COMMUNITY HOST_IP_ADDRESS 1.3.6.1.4.1.2021.4

Hello @xmacan ,

Below, the output :

UCD-SNMP-MIB::memIndex.0 = INTEGER: 0
UCD-SNMP-MIB::memErrorName.0 = STRING: swap
UCD-SNMP-MIB::memTotalSwap.0 = INTEGER: 2047996 kB
UCD-SNMP-MIB::memAvailSwap.0 = INTEGER: 2047996 kB
UCD-SNMP-MIB::memTotalReal.0 = INTEGER: 4041112 kB
UCD-SNMP-MIB::memAvailReal.0 = INTEGER: 3264592 kB
UCD-SNMP-MIB::memTotalFree.0 = INTEGER: 5312588 kB
UCD-SNMP-MIB::memMinimumSwap.0 = INTEGER: 16000 kB
UCD-SNMP-MIB::memShared.0 = INTEGER: 5524 kB
UCD-SNMP-MIB::memBuffer.0 = INTEGER: 2392 kB
UCD-SNMP-MIB::memCached.0 = INTEGER: 549568 kB
UCD-SNMP-MIB::memSwapError.0 = INTEGER: noError(0)
UCD-SNMP-MIB::memSwapErrorMsg.0 = STRING:

Regards,

John

ssvenn commented 3 years ago

I think a refinement of this data validation feature would be to stop the automatic countdown on that dialog box so that users have more time to act on it, and add a "force create graphs" checkbox somewhere as a workaround for edge cases like mine where Spine polls valid data while PHP does not.

netniV commented 3 years ago

The idea behind the automated countdown is to prevent you from needing to press anything to continue, but in this case it does sound like it needs to be a manual process. Can you paste a screenshot of the exact dialog you had, if it isn't this one?

image

ssvenn commented 3 years ago

Yes it is that dialog box, I just had to cut off the bottom to hide some host names. My environment is currently rolled back to 1.2.16 but I can set up a new instance on 1.2.17 to test more if you need more details.

Validating that a data source produces usable values is definitely a good idea to help users avoid empty graphs and poller errors, but when a graph fails to create it would be good to get a clear and detailed cause. Perhaps just send the text from this dialog box to the log file, along with some more debug information, including the actual polled value and what's wrong with it (empty value, SNMP error, invalid data type, etc.).
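
As an illustration of the kind of detail being asked for, a verification step could label the raw polled value before logging it. The sketch below is only a shell approximation of that idea and is not how Cacti implements its check. `classify_polled_value` and its category names are hypothetical, and it assumes counters are non-negative decimal numbers.

```shell
#!/bin/sh
# Label a raw polled value so a log line can state why verification
# failed. classify_polled_value and the labels are hypothetical.
classify_polled_value() {
    case "$1" in
        '')                    echo "empty value" ;;
        U)                     echo "unknown (U)" ;;
        # Any character other than a digit or dot, more than one dot,
        # or a lone dot means the value is not a plain decimal number.
        *[!0-9.]*|*.*.*|.)     echo "non-numeric data" ;;
        *)                     echo "numeric" ;;
    esac
}

# A Hex-STRING payload such as the Brocade counters is not numeric as-is.
classify_polled_value "00 00 03 AB A0 BA 1D CC"
```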

netniV commented 3 years ago

Yeah, I think I am going to rope @TheWitness in here to give a comment or two just because I'm not 100% clear yet on how that is working and he'll be able to answer it better than me. Between us, we should be able to come up with a way to do that.

simonpunk commented 3 years ago

@netniV That's possibly the same issue as mine: https://github.com/Cacti/cacti/issues/4246, still no comments though... Anyway, I think the root cause is that the verification uses the custom data field value defined in the template rather than the value the user enters when creating a graph. In a Chrome debug session, I can see that it sends the correct POST value with my input after I click the create graph button, but the PHP code seems to look up the value stored in the database instead.

@ssvenn @Lejooohn I have tried modifying or inserting a pre-defined value for my custom data field in the database, and it works, as does creating another data source template with a pre-defined value for every data field.

So the scenario is: even if you give a different value for your custom data fields, when you click the create button it doesn't take your input into account but uses the pre-defined input from your data source template instead. You can only proceed when the data source template's pre-defined input returns something other than 'U'.

zuka1337 commented 3 years ago

Same here, 2 different types of error when creating any custom graph.

First:

2021-05-13 16:36:53 - CMDPHP PHP ERROR NOTICE Backtrace: (/graphs_new.php[42]:form_save(), /graphs_new.php[150]:html_graph_new_graphs(), /lib/html_graph.php[442]:html_graph_custom_data(), /lib/html_graph.php[573]:draw_nontemplated_fields_graph_item(), /lib/html_form_template.php[252]:draw_edit_form(), /lib/html_form.php[114]:draw_edit_control(), /lib/html_form.php[330]:CactiErrorHandler())
2021-05-13 16:36:53 - ERROR PHP NOTICE: Undefined index: value in file: /cacti/lib/html_form.php on line: 330
2021-05-13 16:36:53 - CMDPHP PHP ERROR NOTICE Backtrace: (/graphs_new.php[42]:form_save(), /graphs_new.php[150]:html_graph_new_graphs(), /lib/html_graph.php[442]:html_graph_custom_data(), /lib/html_graph.php[573]:draw_nontemplated_fields_graph_item(), /lib/html_form_template.php[181]:CactiErrorHandler())
2021-05-13 16:36:53 - ERROR PHP NOTICE: Undefined index: color_id in file: /cacti/lib/html_form_template.php on line: 181
2021-05-13 16:36:53 - CMDPHP PHP ERROR NOTICE Backtrace: (/graphs_new.php[42]:form_save(), /graphs_new.php[150]:html_graph_new_graphs(), /lib/html_graph.php[442]:html_graph_custom_data(), /lib/html_graph.php[573]:draw_nontemplated_fields_graph_item(), /lib/html_form_template.php[180]:CactiErrorHandler())
2021-05-13 16:36:53 - ERROR PHP NOTICE: Undefined index: color_id in file: /cacti/lib/html_form_template.php on line: 180

Second: image

zuka1337 commented 3 years ago

And it becomes impossible to add devices via the CLI when the associated device template contains graphs that cannot be created because of this error:

/usr/bin/php -q /cacti/cli/add_device.php --description=HOSTNAME --ip=IP --community=CM --template=X --version=2 --port=161 --avail=pingsnmp --disable=1

USAGE: snmpget [OPTIONS] AGENT OID [OID]...

  Version:  5.8
  Web:      http://www.net-snmp.org/
  Email:    net-snmp-coders@lists.sourceforge.net

OPTIONS:
  -h, --help            display this help message
  -H                    display configuration file directives understood
  -v 1|2c|3             specifies SNMP version to use
  -V, --version         display package version number
SNMP Version 1 or 2c specific
  -c COMMUNITY          set the community string
SNMP Version 3 specific
  -a PROTOCOL           set authentication protocol (MD5|SHA|SHA-224|SHA-256|SHA-384|SHA-512)
  -A PASSPHRASE         set authentication protocol pass phrase
  -e ENGINE-ID          set security engine ID (e.g. 800000020109840301)
  -E ENGINE-ID          set context engine ID (e.g. 800000020109840301)
  -l LEVEL              set security level (noAuthNoPriv|authNoPriv|authPriv)
  -n CONTEXT            set context name (e.g. bridge1)
  -u USER-NAME          set security name (e.g. bert)
  -x PROTOCOL           set privacy protocol (DES|AES|AES-192|AES-256)
  -X PASSPHRASE         set privacy protocol pass phrase
  -Z BOOTS,TIME         set destination engine boots/time
General communication options
  -r RETRIES            set the number of retries
  -t TIMEOUT            set the request timeout (in seconds)
Debugging
  -d                    dump input/output packets in hexadecimal
  -D[TOKEN[,...]]       turn on debugging output for the specified TOKENs
                           (ALL gives extremely verbose debugging output)
General options
  -m MIB[:...]          load given list of MIBs (ALL loads everything)
  -M DIR[:...]          look in given list of directories for MIBs
    (default: /root/.snmp/mibs:/usr/share/snmp/mibs)
  -P MIBOPTS            Toggle various defaults controlling MIB parsing:
                          u:  allow the use of underlines in MIB symbols
                          c:  disallow the use of "--" to terminate comments
                          d:  save the DESCRIPTIONs of the MIB objects
                          e:  disable errors when MIB symbols conflict
                          w:  enable warnings when MIB symbols conflict
                          W:  enable detailed warnings when MIB symbols conflict
                          R:  replace MIB symbols from latest module
  -O OUTOPTS            Toggle various defaults controlling output display:
                          0:  print leading 0 for single-digit hex characters
                          a:  print all strings in ascii format
                          b:  do not break OID indexes down
                          e:  print enums numerically
                          E:  escape quotes in string indices
                          f:  print full OIDs on output
                          n:  print OIDs numerically
                          p PRECISION:  display floating point values with specified PRECISION (printf format string)
                          q:  quick print for easier parsing
                          Q:  quick print with equal-signs
                          s:  print only last symbolic element of OID
                          S:  print MIB module-id plus last element
                          t:  print timeticks unparsed as numeric integers
                          T:  print human-readable text along with hex strings
                          u:  print OIDs using UCD-style prefix suppression
                          U:  don't print units
                          v:  print values only (not OID = value)
                          x:  print all strings in hex format
                          X:  extended index format
  -I INOPTS             Toggle various defaults controlling input parsing:
                          b:  do best/regex matching to find a MIB node
                          h:  don't apply DISPLAY-HINTs
                          r:  do not check values for range/type legality
                          R:  do random access to OID labels
                          u:  top-level OIDs must have '.' prefix (UCD-style)
                          s SUFFIX:  Append all textual OIDs with SUFFIX before parsing
                          S PREFIX:  Prepend all textual OIDs with PREFIX before parsing
  -L LOGOPTS            Toggle various defaults controlling logging:
                          e:           log to standard error
                          o:           log to standard output
                          n:           don't log at all
                          f file:      log to the specified file
                          s facility:  log to syslog (via the specified facility)

                          (variants)
                          [EON] pri:   log to standard error, output or /dev/null for level 'pri' and above
                          [EON] p1-p2: log to standard error, output or /dev/null for levels 'p1' to 'p2'
                          [FS] pri token:    log to file/syslog for level 'pri' and above
                          [FS] p1-p2 token:  log to file/syslog for levels 'p1' to 'p2'
  -C APPOPTS            Set various application specific behaviours:
                          f:  do not fix errors and retry the request
No hostname specified.

In the end you still end up with the device added, but the errors are hard on the eyes. The input scripts run fine manually.

Note: Nothing is wrong with SNMP (I know what you're thinking :D)

xmacan commented 3 years ago

@zuka1337 I have a similar problem. Try this: console -> Templates -> Graph -> Edit one of your problematic graphs (Fan or memory heap) -> tick any checkbox (I tried upper limit, it does not matter). Try to create the graphs again and let me know.

TheWitness commented 3 years ago

Sorry guys, been taking a break. Needed some time off. Tried getting this remediated over the weekend, but my computer took one look at me and ran for the hills.

TheWitness commented 3 years ago

@zuka1337 that seems like a separate issue to me. If it's not logged already, please log it.

zuka1337 commented 3 years ago

@TheWitness we all need a break after some time :D OK, I've opened another issue: #4273

netadmin101 commented 3 years ago

Hello, are there any workarounds for this until a fix is released? I tried what @xmacan suggested with no luck. We add ~5 new devices a day and I'm getting quite a list I need to circle back to with interfaces that need graphs created. Everything else in 1.2.17 seems fine so I hate to revert just for this one thing.

Thanks!

TheWitness commented 3 years ago

You can short circuit the test function and always make it true. It'll bypass the tests. It would be good to get a screen print of the data and graph templates causing issues, and if a script, get a screen shot of the data input method.

TheWitness commented 3 years ago

The function is called test_data_source; grep for it.
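
For anyone unsure how to locate it, a search for the function definition could look like the sketch below. `find_php_function` is a hypothetical helper name, and the install path in the comment is an assumption that will differ between systems. Including the opening parenthesis in the pattern keeps the similarly named test_data_sources() from matching.

```shell
#!/bin/sh
# Print the line number and text of a PHP function definition.
# find_php_function is a hypothetical helper name.
find_php_function() {
    # $1 = function name, $2 = path to the PHP file.
    # The trailing "(" is literal in a basic regular expression.
    grep -n "function $1(" "$2"
}

# Assumed install path; adjust it to your environment:
# find_php_function test_data_source /var/www/html/cacti/lib/functions.php
```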

netadmin101 commented 3 years ago

Thanks for the tip about making that function true. That definitely fixes the problem.

We're just using the standard "Interface - Traffic" data template and "Interface - Traffic (bits/sec, 95th Percentile)" graph template. I've attached screenshots of what they all look like; let me know if you want to see any more. I'm happy to help troubleshoot.

Attached screenshots: data-source-template, graph-template-part1, graph-template-part2, graph-template-part3, create-graphs, create-graphs-error

UH-Nerion commented 3 years ago

Hello,

I have the same problem. Any workaround? We can not add our graphs.

Thanks

xmacan commented 3 years ago

Try this workaround. In the file lib/functions.php, around line 1619, is:

function test_data_source($data_template_id, $host_id, $snmp_query_id = 0, $snmp_index = '') {
    global $called_by_script_server;

    $called_by_script_server = true;

Add one line:

function test_data_source($data_template_id, $host_id, $snmp_query_id = 0, $snmp_index = '') {
    global $called_by_script_server;

    // Temporary workaround: skip verification and report every data source as valid.
    return true;

    $called_by_script_server = true;

UH-Nerion commented 3 years ago

Hello,

many thanks. It's working!

netniV commented 3 years ago

Clearly we need to review the test_data_source function to see why it fails so much more than expected.

xmacan commented 3 years ago

@netniV Maybe add a "Test only" button to graphs_new.php, just like the one used for importing templates.

Susanin63 commented 3 years ago

In file lib/functions.php:1753 change:

array($data_template_id, $data_template_id)),
to
array($data_input['data_template_data_id'], $data_input['data_template_data_id'])),

Susanin63 commented 3 years ago

https://github.com/Cacti/cacti/blob/c4cdb56644ec90738133f46ce3a78fbc4d3c157d/lib/functions.php#L1753

netniV commented 3 years ago

I believe that @Susanin63 is correct, so I have applied that as a fix.

netniV commented 3 years ago

It turns out that there are still some issues with this, as a client has hit this bug, so I'm reopening.

netadmin101 commented 3 years ago

I can confirm, still an issue. I tried @Susanin63's fix on line 1753 and still receive the error when creating graphs. I reverted that change, and I'm back to using the workaround @TheWitness suggested. I've got line 1595 changed to return true instead of false.

I'm happy to test other ideas.

Susanin63 commented 3 years ago

Perhaps this is because I fixed the error for only one type, $data_input['type_id'] == DATA_INPUT_TYPE_SNMP. If so, we need to change:

https://github.com/Cacti/cacti/blob/c4cdb56644ec90738133f46ce3a78fbc4d3c157d/lib/functions.php#L1753
https://github.com/Cacti/cacti/blob/c4cdb56644ec90738133f46ce3a78fbc4d3c157d/lib/functions.php#L1808
https://github.com/Cacti/cacti/blob/c4cdb56644ec90738133f46ce3a78fbc4d3c157d/lib/functions.php#L1878

Sorry, I can debug the PHP script, but I still can't figure out how to do a PR easily.

Susanin63 commented 3 years ago

Every occurrence of array($data_template_id, $data_template_id) needs to change to array($data_input['data_template_data_id'], $data_input['data_template_data_id']).

netniV commented 3 years ago

There were indeed several places for patching this. And I had managed to track them down :)

netniV commented 3 years ago

I also suspect that we do not need the template id twice in a few of those as I'm pretty sure it's only on one table, given that no prefix is used in some of the queries. :)

netniV commented 3 years ago

@Susanin63 feel free to give that change a review. Hopefully, that will be the final nail in the coffin. The issue comes about because the hostname is replaced with a different hostname from the fields that were retrieved via the data_template_id instead of the data_template_data_id.

netniV commented 3 years ago

@ssvenn @Lejooohn @xmacan @simonpunk @zuka1337 @netadmin101 @UH-Nerion

Tagging you all just so that you get a notification in case you aren't subscribed to the thread, please try this latest test_data_source() function.

My advice would be to remove the one from your existing lib/functions.php and copy/paste the new one entirely, as I'm not quite sure right now what other changes may affect you, but this should make the test work.

netadmin101 commented 3 years ago

@netniV Hmm... no luck here with the new function. I'm still seeing the same thing. Anything specific I can try to narrow it down?

netniV commented 3 years ago

What I ended up doing was writing to the log file, each host, OID and output value that was found wherever the function was querying for SNMP. This helped me realise it was going to the wrong host entirely.

Something like the following should log it in the style of 'DSV Host #123 my_hostname_or_ip, oid .1.3.1.x.x.x.x = [output]':

cacti_log('Host #' . $host['id'] . ' ' . $host['hostname'] . ', oid ' . $oid . ' = ' . json_encode($output), false, 'DSV');

NOTE: this was all written by hand just now as I no longer have the debug in due to committing my changes.
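
Once such a line is in place, the tagged entries can be pulled out of the Cacti log with a simple filter. This is a rough sketch that assumes the DSV tag appears in each line as a space-delimited word. `filter_dsv` is a hypothetical helper name, and the log path in the comment is an assumption about the install location.

```shell
#!/bin/sh
# Keep only the log lines that carry the DSV tag, reading from stdin.
# filter_dsv is a hypothetical helper name.
filter_dsv() {
    grep ' DSV '
}

# Assumed log location; adjust it to your environment:
# filter_dsv < /var/www/html/cacti/log/cacti.log
```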

netniV commented 3 years ago

I've just had another quick look at the commit and I can't see an issue with the changes I did make, maybe @Susanin63 can see something I am missing.

netadmin101 commented 3 years ago

Ok... I'm not sure what I did earlier but it's working now. Maybe I copy/pasted the function from the wrong commit by accident. I just tried again and it's working as expected. Sorry about that!

xmacan commented 3 years ago

It seems to be OK now. I tried a few devices, all without problems. I only got the "Data Source verification failure" message when I deliberately tried wrong data, which was expected.

netniV commented 3 years ago

I'm not sure how you can try the wrong data, but I'll take your word for it :D

xmacan commented 3 years ago

Please reopen:

2021-06-10 08:17:23 - ERROR PHP NOTICE: Undefined index: snmp_oid in file: /usr/local/share/cacti/lib/functions.php on line: 1777
2021-06-10 08:17:23 - CMDPHP PHP ERROR NOTICE Backtrace: (/graphs_new.php[42]:form_save(), /graphs_new.php[158]:host_new_graphs_save(), /graphs_new.php[243]:create_save_graph(), /lib/template.php[1722]:test_data_sources(), /lib/functions.php[1594]:test_data_source(), /lib/functions.php[1777]:CactiErrorHandler())

This error occurs when you try to create a graph with the SNMP - Generic OID Template. I guess that the variable $host['snmp_oid'] is wrong. Unfortunately, I don't know which one is correct.

> I'm not sure how you can try the wrong data, but I'll take your word for it :D

You are right, totally wrong :-) I was relying on if (!is_numeric($output)) on line 1779 and I haven't explored what's deeper.

xmacan commented 3 years ago

How to reproduce: Devices -> choose any device -> create graph for this device -> in Graph Templates choose "SNMP - Generic OID template" -> Create -> insert any OID -> Create. You will see the error in the log (line 1777) and the message "Data Source verification failure".

Susanin63 commented 3 years ago

@xmacan, I'm not sure this way of creating graphs works even in 1.2.16. Without creating a data template, only inserting an OID? Try to add the graph to the device and then create the graph. @netniV I tested the latest revision and everything works.

xmacan commented 3 years ago

@Susanin63 - I have tested Generic OID template in 1.2.16 and it is working well. I have tried it on 1.2.17 (older, without test_data_source function) - without problem too.

Susanin63 commented 3 years ago

@xmacan, you're right. The test_data_source function is not designed to process this Generic OID template graph! $host['snmp_oid'] on line 1777 will always be undefined because the data_input_data table doesn't have any row with a value for this data_template_data_id. This needs some work on the part of the developers.

Susanin63 commented 3 years ago

We probably need to take the value of the t_value column in data_input_data into account. If it is 'on', then $host['snmp_oid'] must be the OID from $values in the create_save_graph function.

netniV commented 3 years ago

Let me see what I can do. You lot have been pushing my knowledge lately ;-)

netniV commented 3 years ago

OK @Susanin63 and @xmacan give this a try and see what you get.

xmacan commented 3 years ago

@netniV - tested, it is working with the Generic OID template. The log contains only:

2021-06-14 11:59:40 - CMDPHP DSV Backtrace: (/graphs_new.php[42]:form_save(), /graphs_new.php[158]:host_new_graphs_save(), /graphs_new.php[243]:create_save_graph(), /lib/template.php[1724]:test_data_sources(), /lib/functions.php[1599]:test_data_source(), /lib/functions.php[1625]:cacti_debug_backtrace())