Cacti / cacti

Cacti ™
http://www.cacti.net
GNU General Public License v2.0

Allow Bulk Editing of Data Sources and Active Data Source Templates #3230

Open: danfiscus opened this issue 4 years ago

danfiscus commented 4 years ago

Is your feature request related to a problem? Please describe.
I am using a data source template that has mistakenly classified traffic_in and traffic_out as counter variables instead of gauge variables. This data source is already in use in over 5000 graphs, and I have no way of fixing this.

Describe the solution you'd like
I want to be able to modify an active Data Source Template and then apply those changes to all the Data Sources using that template. Alternatively, I would like to be able to edit thousands of templates at once, change the counter to a gauge, and have those changes propagate and save.

Describe alternatives you've considered
The only other way that I can think to do this is to delete all of my devices and all of the broken Data Sources. Once the template is no longer "In Use", I can modify the Data Source Template and add all my devices back in, at which point the new data sources will be created with the correct types for the traffic gauges.

I also briefly considered pulling the whole SQL server down to a local machine, writing a Python script to manually go through each file and modify the "Counter" to "Gauge" for all 5k data sources, as well as for the template, then upload all the corrected files. This option was ruled out in favor of maintaining my last shred of sanity.

TheWitness commented 4 years ago

Traffic is normally a counter. Why is it a gauge in your case? In the near term, you are better off just making a database edit.

danfiscus commented 4 years ago

The Cacti website says a Counter is for "values that never decrease", and Gauge is for "numbers that are not continuously incrementing, e.g. a temperature reading". Why would network traffic be a counter? The speed in bits per second could go up or down constantly. Am I missing something here?

edit: source for those definitions, in case the documentation needs to be changed: https://www.cacti.net/downloads/docs/html/templates.html

cigamit commented 4 years ago

Network traffic is a counter. That's because the network device itself isn't returning bits per second. It returns "this is the amount of traffic I have passed since I booted up", not accounting for counter rollovers or service restarts, of course. Cacti polls the counter the first time and saves the value internally. On the next polling cycle, it gets the value again. The difference between these two numbers, divided by the time between the two polls, gives you the rate (the counter is in octets, so multiply by 8 to get bits per second).

Just to note, this is also the reason why it takes 2 polling cycles before a counter graph will show data.
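
A minimal Python sketch of the calculation described above, assuming a 32-bit counter such as ifInOctets and treating a negative difference as a single wraparound. It illustrates the idea only; it is not RRDtool's actual COUNTER code, and the name `counter_rate` is hypothetical. The first poll has nothing to subtract from, so it returns None, which is why a graph needs two polling cycles before it shows data.

```python
from typing import Optional, Tuple

COUNTER32_MODULUS = 2**32  # assumption: 32-bit counter (ifInOctets / ifOutOctets)

def counter_rate(prev: Optional[Tuple[float, int]], now: Tuple[float, int],
                 modulus: int = COUNTER32_MODULUS) -> Optional[float]:
    """Rate in octets per second between two (timestamp, counter) samples.

    Returns None on the first poll, when there is no previous sample yet.
    """
    if prev is None:
        return None  # first polling cycle: nothing to compare against
    t0, c0 = prev
    t1, c1 = now
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    delta = c1 - c0
    if delta < 0:
        delta += modulus  # assume the counter wrapped around exactly once
    return delta / elapsed

# Two polls 60 seconds apart, with 75,000,000 octets passed in between.
rate = counter_rate((0.0, 1_000_000), (60.0, 76_000_000))
print(rate)      # 1250000.0 octets/s
print(rate * 8)  # 10000000.0 bits/s
```

A real poller also has to cope with device reboots, where the counter restarts at zero and a "wrap" would produce a bogus spike; this sketch does not try to detect that.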

cigamit commented 4 years ago

To verify, you can also look at this; scroll down to "10 ifInOctets": http://www.net-snmp.org/docs/mibs/interfaces.html#ifInOctets

danfiscus commented 4 years ago

Ah, that's good to know, thanks! Unfortunately, now I'm back at square one trying to figure out why nearly all my graphs for network traffic say NaN. The console log still says there's a ton of "incomplete result: U" errors. Where do I go from here?

cigamit commented 4 years ago

I would use the data source troubleshooter on one of the graphs first, to try and determine what is going on.

danfiscus commented 4 years ago

Okay, is there somewhere this troubleshooting could continue? I still need help and can provide more info about my specific problem, but it doesn't really fit this feature request thread.

Edit: on second thought, this might still be relevant, because the data source debugger says the maximum should be set to U, and sure enough, the values being fetched are larger than the current max. The only problem is that the current max is U.

RRD maximum for Data Source 'traffic_out' should be 'U'
/bin/rrdtool tune /var/www/html/cacti/rra/108/4924.rrd --maximum traffic_in:U
/bin/rrdtool tune /var/www/html/cacti/rra/108/4924.rrd --maximum traffic_out:U
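
The debugger prints these commands for a single RRD file. A hedged sketch of producing the same `rrdtool tune --maximum` commands for many files, assuming you pass in only the traffic RRD paths (rrdtool tune fails on files that lack `traffic_in`/`traffic_out`) and that rrdtool lives at `/bin/rrdtool` as in the output above. `build_tune_commands` is a hypothetical helper; by default the script only prints the commands (a dry run).

```python
import subprocess
import sys
from pathlib import Path
from typing import Iterable, List

def build_tune_commands(rrd_files: Iterable[Path], ds_names: Iterable[str],
                        maximum: str = "U",
                        rrdtool: str = "/bin/rrdtool") -> List[List[str]]:
    """One `rrdtool tune FILE --maximum DS:MAX` argv per (file, data source),
    mirroring the one-command-per-data-source output of the debugger."""
    ds_names = list(ds_names)
    return [[rrdtool, "tune", str(rrd), "--maximum", f"{ds}:{maximum}"]
            for rrd in rrd_files for ds in ds_names]

if __name__ == "__main__":
    # Assumption: the traffic RRD paths are given on the command line.
    files = [Path(p) for p in sys.argv[1:]]
    for cmd in build_tune_commands(files, ["traffic_in", "traffic_out"]):
        print(" ".join(cmd))
        # subprocess.run(cmd, check=True)  # uncomment after checking the dry run
```

Trying the printed command on one file and re-running the data source debugger before touching the rest keeps a mistake from spreading to thousands of files.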

[Screenshots: "Data Source Max U" and "Data Source Max U Missing"] Both of those screenshots are from the same data source page.

cigamit commented 4 years ago

Oh, something you didn't mention: what Cacti version are you using? I just noticed that your screenshots appear to be from an older version.

The default value for the max for the default traffic template is normally set to "100000000". "U" should work, as that again means no max.

cigamit commented 4 years ago

I also wouldn't focus on the "Max" just yet; I think you are chasing a red herring.

If the troubleshooter I posted isn't available in your older version of Cacti, you will basically have to chase down each thing manually.

  1. Does the Cacti server show that it is receiving real data back from the device in the logs when the log level is set to MEDIUM?
  2. What user does the poller run as, and what are the permissions of the rra file?
  3. Is SELinux running?

danfiscus commented 4 years ago

I am running Cacti 1.2.8, the latest stable build. Maybe it looks different because I'm not using the default theme? I'm not 100% sure.

As for the max value, if traffic_in and traffic_out are constantly increasing values that are calculated into bits/sec as you mentioned earlier, why would there be a max at all? Shouldn't the default be U so that it doesn't overflow?

danfiscus commented 4 years ago

I have the DS troubleshooter open now. I'm just waiting for the results to appear; there are a bunch of loading symbols right now. I'll report back with that and check on the other questions in the meantime.

  1. I'm not sure what user the poller runs as, and that field is blank in the troubleshooter. The RRA file permissions should be correct, since I ran the automated restructuring script to organize them, and that should set or correct the permissions as it moves the files, right?

  2. SELinux is not running, getenforce returns Disabled

cigamit commented 4 years ago

The troubleshooter will give you all that information. It will take a few polling cycles to finish, but it will check that data is polled for the graph, that the files are all writable, that the data collected is properly stored in the rrd, etc...

danfiscus commented 4 years ago

[Screenshot: DS Troubleshooter output]

cigamit commented 4 years ago

So that's showing that the data source in Cacti is different from the RRD file that was created. I would recommend changing the data source in Cacti back to the defaults rather than touching the RRDs myself.

Basically change the data template in Cacti back to the original defaults (I think yours shows as 1,000,000,000) instead of "U".

danfiscus commented 4 years ago

I'm confused, why is it explicitly recommending I set it to "U" then? Doesn't it make more sense that a counter that could easily exceed 1,000,000,000 (as mine have already) should be Unlimited? I changed the data template to U a while ago but it never propagated that modification to the data sources based on the template.

cigamit commented 4 years ago

Well, the debugger tells you to set it to U because you changed the data template or data source in Cacti to U. It's just trying to tell you how to match them up. You could change it to U if you like; you would just have to do this across all of them. I find it easier to change it back to the defaults that were originally in the template. The other option is to remove that set of RRDs, and Cacti should recreate them as it says they should be, with your "U" in them.
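
On the Cacti side, the "database edit" TheWitness suggested amounts to copying the new maximum into every data source built from the template. A hedged sketch, assuming Cacti's `data_template_rrd` table holds per-data-source rows with `local_data_id > 0` and columns `data_template_id`, `data_source_name` and `rrd_maximum`; verify these against your own schema and back up the database first. `build_max_update` is hypothetical and only builds a parameterized statement; the template id 41 is made up.

```python
from typing import Tuple

def build_max_update(data_template_id: int, ds_name: str,
                     maximum: str = "U") -> Tuple[str, Tuple[str, int, str]]:
    """Build (sql, params) setting rrd_maximum on every data source
    (local_data_id > 0) created from one data template.

    ASSUMPTION: table and column names follow Cacti's data_template_rrd
    schema; check them against your own database before running anything.
    """
    sql = ("UPDATE data_template_rrd SET rrd_maximum = %s "
           "WHERE data_template_id = %s AND data_source_name = %s "
           "AND local_data_id > 0")
    return sql, (maximum, data_template_id, ds_name)

# 41 is a made-up id; look up the real id of your traffic data template.
for ds in ("traffic_in", "traffic_out"):
    print(build_max_update(41, ds))
```

The `%s` placeholders match MySQL DB-API drivers such as PyMySQL, so the pair can be passed straight to `cursor.execute(sql, params)`. This only changes what Cacti believes; the RRD files still need the matching `rrdtool tune` change from the debugger.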

danfiscus commented 4 years ago

Sorry to be a bother, but I still don't understand why it's safe to have such a low maximum value on that counter. Isn't that just asking for trouble the minute the counter returns a value above that maximum? Why isn't the maximum for Interface - Traffic set to U by default? Again, sorry for asking this so much, but I still don't have a clear answer.

cigamit commented 4 years ago

I think what you are missing is that RRDtool doesn't store the counter. It stores the difference between the counters. So unless your device is sending over 7.45 Gbit/s (the counter is actually in octets), the 1,000,000,000 number is just fine.
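
The 7.45 figure follows from the counter being in octets: a maximum of 1,000,000,000 octets per second is 8,000,000,000 bits per second, which is 8 Gbit/s in decimal units, or about 7.45 Gbit/s when a gigabit is taken as 2^30 bits. A quick check (the helper name is illustrative only):

```python
GIB = 2**30  # binary "gigabit", the unit behind the 7.45 figure

def max_bits_per_second(rrd_maximum_octets: float) -> float:
    """Convert an RRD maximum expressed in octets per second to bits per second."""
    return rrd_maximum_octets * 8

bps = max_bits_per_second(1_000_000_000)
print(bps / 1e9)            # 8.0  (decimal Gbit/s)
print(round(bps / GIB, 2))  # 7.45 (binary Gbit/s)
```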

danfiscus commented 4 years ago

OOhhhhhh, thank you, I understand now!! That said, the network here is 10Gbit/s, so while I doubt that speed will be reached unless there's a ton of unbalanced demand, I do have it set to 10Gbit in the Cacti settings, so that would fix the max if I hadn't manually overwritten it, correct?

cigamit commented 4 years ago

If you want to do 10Gbit/s, then you could crank it up higher or again set it to U. You just have to change the RRD file to match using the command it gave you. I would test it on one file and run the debugger again to make sure it's happy.
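
Working backwards for a 10 Gbit/s link: because the maximum is in octets per second, a fully used 10 Gbit/s link needs a maximum of at least 10,000,000,000 / 8 = 1,250,000,000, which is above the 1,000,000,000 discussed earlier. RRDtool treats values outside a data source's min/max range as unknown, so a maximum that is too low silently discards peak traffic. A small sketch; `required_maximum` and its `headroom` factor are hypothetical, not Cacti settings:

```python
import math

def required_maximum(link_bits_per_second: float, headroom: float = 1.0) -> int:
    """Smallest RRD maximum, in octets per second, that a fully used link
    cannot exceed. `headroom` is a hypothetical safety factor (1.0 = line rate)."""
    return math.ceil(link_bits_per_second * headroom / 8)

print(required_maximum(10_000_000_000))  # 1250000000
```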

As a side note, you might wonder why all data templates aren't set to U. It's a matter of space. Since RRD files are fully generated at creation time, it needs to know what size integer to use for the storage portion of each piece of data. There is no point having the RRD file use a full 64-bit integer for each piece of data if it's only going to be recording small values. So it's recommended that you keep your max values sized appropriately for the type of data you are storing. Otherwise you could have RRD files that are several megs in size when they could be much smaller. With storage so cheap, you might not think it makes a difference, but it does once you have 200,000 of them.

danfiscus commented 4 years ago

Okay, that does make sense and I appreciate the tip. What would be a good maximum for a 10Gbit network for that value? I certainly don't want to waste the limited storage space on this server if I don't have to. I'm also still not seeing any data, by the way, and the troubleshooter spit out an unusual error message: Debug not completed after 5 pollings Failed fields: last_result, valid_data, rra_timestamp2 RRDfile

danfiscus commented 4 years ago

I just looked through my logs, and it seems like my poller is not running properly. It has exceeded the 60-second cycle the last dozen or so times it ran, but the line above that in the log says it finished the run in less than a second. Am I cursed or something?

2020/02/04 16:03:01 - POLLER: Poller[1] Maximum runtime of 58 seconds exceeded. Exiting.
2020/02/04 16:03:01 - SYSTEM WARNING: Primary Admin account notifications disabled!  Unable to send administrative Email.
2020/02/04 16:03:01 - SNMPAGENT WARNING: No notification receivers configured for event: cactiNotifyPollerRuntimeExceeding (CACTI-MIB), severity: high
2020/02/04 16:03:01 - SYSTEM STATS: Time:59.5367 Method:spine Processes:4 Threads:128 Hosts:115 HostsPerProcess:29 DataSources:1797 RRDsProcessed:0
2020/02/04 16:03:01 - POLLER: Poller[1] NOTE: Poller Int: '60', Cron Int: '60', Time Since Last: '59.73', Max Runtime '58', Poller Runs: '1'
2020/02/04 16:03:02 - SYSTEM DSDEBUG STATS: Type:poller, ChecksPerformed:1, TotalIssues:0, Time:0.0672
2020/02/04 16:03:03 - SYSTEM DSSTATS STATS: Type:HOURLY, Time:0.0261 RRDUser:0.0000 RRDSystem:0.0000 RRDReal:0.0000
2020/02/04 16:03:03 - REPORTS Cacti Reports reports found: 0
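
When checking many cycles, a small parser for the SYSTEM STATS line can flag runs like this one, where the poller used almost the whole interval (Time:59.5367) and updated no RRD files (RRDsProcessed:0). The Key:Value format is inferred from the single line above, so the regex and the helper names are assumptions, not a documented Cacti log format.

```python
import re
from typing import Dict, List

PAIR_RE = re.compile(r"(\w+):(\S+)")

def parse_system_stats(line: str) -> Dict[str, str]:
    """Return the Key:Value pairs that follow 'SYSTEM STATS:' in a poller log line."""
    _, sep, rest = line.partition("SYSTEM STATS:")
    if not sep:
        raise ValueError("not a SYSTEM STATS line")
    return dict(PAIR_RE.findall(rest))

def warnings_for(stats: Dict[str, str], max_runtime: float = 58.0) -> List[str]:
    """Flag the two symptoms visible above (58 s is the Max Runtime in this log)."""
    problems = []
    if float(stats.get("Time", "0")) >= max_runtime:
        problems.append("poller hit its maximum runtime")
    if stats.get("RRDsProcessed") == "0":
        problems.append("no RRD files were updated")
    return problems

line = ("2020/02/04 16:03:01 - SYSTEM STATS: Time:59.5367 Method:spine "
        "Processes:4 Threads:128 Hosts:115 HostsPerProcess:29 "
        "DataSources:1797 RRDsProcessed:0")
print(warnings_for(parse_system_stats(line)))
# ['poller hit its maximum runtime', 'no RRD files were updated']
```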

cigamit commented 4 years ago

You should run it in debug mode to find out where it is hanging. I would also bring down the number of threads to something reasonable, like 12. You should have 1 (some people like 2) processes per CPU core, and then tweak the threads up and down from there. Maybe it's spine; you can also try swapping to cmd.php. That would certainly explain the debugger error, and why data isn't being stored in the RRDs too.
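
The rule of thumb above, written out as a starting point. `suggest_spine_settings` is a hypothetical helper, not part of Cacti; the values still have to be entered in Cacti's poller settings and tuned by watching the poller runtime. With 4 processes and 115 hosts, rounding up gives 29 hosts per process, which appears consistent with HostsPerProcess:29 in the log above.

```python
import math
import os
from typing import Dict

def suggest_spine_settings(cpu_cores: int, hosts: int,
                           processes_per_core: int = 1,
                           threads: int = 12) -> Dict[str, int]:
    """1 (or 2) processes per CPU core and about 12 threads, then tune from there."""
    processes = max(1, cpu_cores * processes_per_core)
    return {
        "processes": processes,
        "threads": threads,
        # assumption: hosts are split evenly across processes, rounding up
        "hosts_per_process": math.ceil(hosts / processes),
    }

print(suggest_spine_settings(os.cpu_count() or 1, hosts=115))
```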

danfiscus commented 4 years ago

It looks to be related to spine, because switching to cmd.php allowed the whole scan to run and finish without problems in 30 seconds, which makes far more sense. And I now have data on the graphs for nearly all 5000 data sources!! Now I just have to sort through the dozens of "Result from SNMP not valid. Partial Result: U" errors in the log. BTW, is there a place where I can send you some coffee/beer money for all your help? Getting to this point of the project has been no small task, and almost all the credit goes to you and @TheWitness.

cigamit commented 4 years ago

I say just pay it forward. Once you know Cacti well enough, help out in the forums with other people having issues. Everyone starts somewhere. If you feel the need to go beyond that, there is always https://cacti.net/contribute.php

danfiscus commented 4 years ago

I'll be sure to stick around and contribute where/when I can on here! My experience with PHP is limited but once I get this project up and running, I'm planning on writing a troubleshooting guide for the wiki based on my thorough experience of breaking and fixing things in my own Cacti installation.

As for this issue, it's still a feature I'd like to see added. If I find the time, I'll see whether it's something I can do, but either way, it wouldn't hurt to leave it open.

I also made a small donation, so hopefully that's helpful.

netniV commented 4 years ago

We have a documentation section that should cover most common things, though you can always submit a PR to add anything new that you have experienced if you feel others would benefit from it.

https://github.com/cacti/documentation

TheWitness commented 4 years ago

Alive, if barely. ;)