Cacti / cacti

Cacti ™
http://www.cacti.net
GNU General Public License v2.0
1.63k stars 405 forks source link

Orphaned DataSources impact mysql performance #1641

Closed jpobeda closed 6 years ago

jpobeda commented 6 years ago

Hi,

I've been using a custom data and graph templates to create graphs on Cacti. The reason I use these is because on the same box runs Munin and I wanted to have some of those graphs in Cacti for better reference.

I noticed that every time I create a data source using this data template, it creates a dummy DS as well so when I go to add the graphs I see the ones I created and all these dummy ones as well. I didn't care much before I just deleted them once I'm done BUT Today I realized that when get too many of these dummy data sources (around 50-100) the MySQL process goes to 100% of cpu use and drops down once I delete them.

If you spot something on my templates that would be prevent this to happen, please let me know.

Otherwise I think it could worth to have a look and see how this could be avoided.

Let me know if you need further input

Cacti_Munin templates.zip

jpobeda commented 6 years ago

I'll check disabling Intropage checks..I suspect there might be related to its checks.

netniV commented 6 years ago

Was it?

cigamit commented 6 years ago

Need to see those queries too, just to ensure that we are not missing an index.

jpobeda commented 6 years ago

Was it?

Nope, tested and didn't make any difference. What can I do to add meaningful data for you guys?

netniV commented 6 years ago

Ideally, I would imagine a database backup that can simulate the problem. If you were to do that, I would imagine it best to send it to the developers@cacti.net address due to the private information rather than post it on here. If it's too large to send by email, use a secure upload and send a link.

cigamit commented 6 years ago

That makes sense. You can also enable slow query log in MySQL/MariaDB and set the slow query time to 1 second. You will then get lot's of nice queries in that log.

jpobeda commented 6 years ago

Alright, will do when I get back to work. I did turn slow queries but will have to check query time.

I Will update on next business day :D Thanks

jpobeda commented 6 years ago

Hi there,

MariaDB [(none)]> show variables like '%slow%';
+---------------------+--------------------------------------------------------------------------------------------------------------+
| Variable_name       | Value                                                                                                        |
+---------------------+--------------------------------------------------------------------------------------------------------------+
| log_slow_filter     | admin,filesort,filesort_on_disk,full_join,full_scan,query_cache,query_cache_miss,tmp_table,tmp_table_on_disk |
| log_slow_queries    | ON                                                                                                           |
| log_slow_rate_limit | 1                                                                                                            |
| log_slow_verbosity  |                                                                                                              |
| slow_launch_time    | 2                                                                                                            |
| slow_query_log      | ON                                                                                                           |

I've set time to 1 and restarted MySQL. On the attached log, sorry about timestamp, start looking from 1528 timestamps, around line 212. There is a whole lot of shit from syslog.

The GUI feels particularly slow when navigating Console>Graphs or Data sources, but it gets slow in general. slow - Copy.log

Let me know if find anything.

cigamit commented 6 years ago

How much memory on this server? Can you also post the System Utilities database settings page?

You have a lot of slow queries around a number of things which would be indicative of either your system having too little memory or the database competing with the RRDfile updates.

Also, post the output of the free command and do a du -hs in the rra directory. If this is a real large system, I always say put the database on a separate server and ensure it has sufficient memory. One of your MacTrack tables has over 42m rows, or it could be a bad join there on the dot1x table that is making the whole system slow. Try running an explain on the following query:

SELECT mts.site_name, mtd.device_id, mtd.device_name, mtd.hostname, mtd.mac_address, mtd.username, mtd.ip_address, mtd.dns_hostname, mtd.port_number, mti.ifName, mtd.domain, mtd.status, MAX(mtd.scan_date) as max_scan_date FROM mac_track_dot1x mtd LEFT JOIN mac_track_sites mts ON (mtd.site_id = mts.site_id) LEFT JOIN mac_track_interfaces mti ON (mtd.port_number = mti.ifIndex AND mtd.device_id=mti.device_id) WHERE mtd.site_id=2 GROUP BY device_id, mac_address, port_number, ip_address ORDER BYdevice_nameASC LIMIT 0,30;

That one is returning a number of records, which might be indicative of a bad join, or just a huge set of tables. Nope, I can tell just by looking at it, that the join is wrong on that query. That is what is crapping out your database server and making things slow.

The table mac_track_dot1x table code was contributed by a user who must not have known the correct symantecs of join science.

There is some slowness in some distinct queries around programs. This is likely a 5 minute thing, that could be shrunk to a 5 hour thing to improve overall system performance. The query has to run every so often to freshen the list of programs, but you have over a million. So, maybe it does not have to run so often. Is your Syslog setup in a partitioned fashion? What is the data retention?

Also, logging queries without and index is just making my eyes cross. You should turn that off, then empty the slow query log file, and let it run for a little while and upload again.

There are two improvements that can be made here: 1) Fix the Dot1x queries they are wrong and blowing up your server 2) Run the programs queries our of syslog less often.

Both are not core Cacti. But like I said, turn off logging queries without indexes, purge, and then upload a day or so's worth of queries.

jpobeda commented 6 years ago

Alright, I'll try to answer to the things you've pointed.

First,

 rra # du -hs
925M    .
 rra # free
              total        used        free      shared  buff/cache   available
Mem:        8002008     1504364     1663044      138848     4834600     5955308
Swap:       8388604      325256     8063348

Secondly, I have MT disabled due to a bug that I don't know when/how was introduced. I've got an Issue opened but I couldn't investigate it yet. So, if Mactrack isn't running by the process of elimination it shouldn't be the cause. In addition to the number of records, the problem is that the data retention function does not clean these ones up. I've asked @netniV to give me a hand on that one as well. So I've been cleaning those records every so often (https://github.com/Cacti/plugin_mactrack/issues/42). That's my bad, I should have not pushed those dot1x changes out before getting that done.

That Syslog thing, I've disabled "Enable Statistics Gathering" which I think is the shit that runs all those crazy queries but it looks it's not. Syslog retention is set to 1 year. I've installed Syslog using partitioned tables, I think that's the recommended way, isn't it? I'm pretty sure I tried disabling Syslog plugin + stopping rsyslog service as well.

Attached the Database utility details (would be nice to have an export function tho).

Cacti-Database details.xlsx

It's odd that only starts to happen when I leave orphaned data sources. It gets back to normal after I get rid of them.

I'll turn off queries without indexes and upload again. I think I cover most of your questions, let me know if I overlooked anything.

jpobeda commented 6 years ago

Well I think the problem IS actually syslog... Even after I disabled the plugin AND stopped the rsyslog service, I keep seeing those "syslog_programs" queries...So how's that is still running?

Sorry about the misleading information I took for granted that disabling syslog plugin and rsyslog will stop any syslog related queries to the database. So I now know it's syslog plugin issue I should be looking at ..

That makes me wonder how come I got those mactrack queries running if it's also disabled..Is it cacti's poller ignoring the disabled flag?

netniV commented 6 years ago

I think that comes down to the fact they may be running via cron. Check whether you have any extra cron files set to run them separate to to the main poller. I have suggested previously, there should be a way to disable all code of a plugin from running but until we have classes that plugins must inherit, that's not likely to happen.

Most code is require to include the auth.php and if my sleepy brain remembers rightly, that's where the realms / permissions / plugins are checked. A poller file tends to include less than the website code does so realm checking doesn't happen since it's not being run directly by a user and I think bypasses those checks. Also, we need to be able to run some bits of a plugin before it's installed to be able to install it (chicken and the egg).

jpobeda commented 6 years ago

Only entry in cron is cacti's poller. I do manual upgrades, normally don't use CentOS packages (on my case).

I'm not sure what part of syslog is triggering those queries but I guess I should open an issue on syslog repository. Maybe after you disable the plugin, cacti keeps calling syslog and processing removal rules, etc, etc.

Feel free to close this case, definitely @netniV has some good points to have better control over plugins.

So @netniV or @cigamit , if I understanding this right, after you disable the plugins we expect to see "sql queries" from them. Just to validate this with you guys so I know in the future that if want to take a certain plugin out of the picture (for troubleshooting) the best way is to actual uninstall it.

cigamit commented 6 years ago

That is correct. If you disable the plugin, then it will not run the data collector. Otherwise you would see syslog messages in the cacti log. Same applies for MacTrack. The MacTrack issues look like GUI interactions. The syslog look more like poller interactions. Best thing to do is check the cacti log.

jpobeda commented 6 years ago

Hey, remember that if you want you can close this issue.

I just wanted to add something, it looks like cacti keeps trying to run disabled plugins and that might be something to fix.

I've had an issue with syslog plugin recently and even though Cacti has disabled it you can still see lots of attempts to run syslog queries which I think it's because cacti keeps calling syslog process, not sure how that works.

You can read my post here

netniV commented 6 years ago

The logs do suggest it's down to the open files. Maybe see if there is a query that shows what files are open within MySQL ?

cigamit commented 6 years ago

I think this needs to be opened in the MacTrack section, The dot1x queries are not done right and are run from the user interface. Please open the ticket there. Closing here.