Cacti / cacti
http://www.cacti.net
GNU General Public License v2.0

Boost performance may become poor even in single server mode #2929

Closed j66h closed 5 years ago

j66h commented 5 years ago

Environment
OS: Windows Server 2016 Standard
Cacti: 1.2.6
Spine: 1.2.6
Cygwin: 3.0.7
Net-SNMP: 5.7.3
PHP: 7.3.9
MySQL: 8.0.17

Single server. Just "Main Poller".

Describe the bug
After the update from 1.2.5 to 1.2.6, boost is 20 times slower.

From 1.2.5:
02.09.2019 22:46:16 - SYSTEM BOOST STATS: Time:51.3300 RRDUpdates:195709

From 1.2.6:
06.09.2019 08:49:42 - SYSTEM BOOST STATS: Time:1108.3900 RRDUpdates:180278

Is there anything I can do to further analyze this?

Additional information
Boost status and settings are documented in the attached files (as text and as PNG):
Boost Settings.txt
Boost Status.txt

(Screenshots: Boost Status, Boost Settings)

cigamit commented 5 years ago

Could this be due to overloading of your Windows system? The performance should have improved significantly based on the changes. I would say that, based on your issues around spine, it's time for a platform change.

cigamit commented 5 years ago

Also, that number of updates should take much less time, so you should also review your slow query log file and look for blocking queries. I suspect this has everything to do with you overloading the system.

j66h commented 5 years ago

I do not see a relation to system overload. The system is almost idle when the poller is not running; 3 of 4 GB of physical memory are used, and it sits on an enterprise storage system. To investigate, I copied poller_boost.php from 1.2.5 back to the system this morning. Boost is down to 60 seconds again, without changing anything else...

The slow query log is empty with long_query_time=10. Is it advisable to lower long_query_time?

Regarding spine: I got MySQL connection timeouts 5 to 10 times a day, so I changed back to cmd.php over the weekend and the errors went away. I'll stick to cmd.php as long as I can, and I'll request some more CPUs and RAM.

Cacti has been running here on Windows since 2006. I did a new install a month ago (OS, DB and application refresh). Since everything was OK until now, we again decided on Windows because we have a lot more server admins for this OS...

j66h commented 5 years ago

Doubled CPUs to 8.

Added 2 GB RAM (now 6 GB); set innodb_buffer_pool to 3 GB (1 GB before).

Set long_query_time=1.

Poked around:

Any ideas what I could do to analyze this further? I tried to figure things out in the code, but sadly I'm not a programmer and have very limited skills in this :-(

Does this help?

Detailed Runtime Timers: RRDUpdates:198477 TotalTime:1253 get_records:0.62 results_cycle:1248.56 rrd_filename_and_template:8.13 rrd_lastupdate:606.48 rrdupdate:633.21 delete:3.39

Almost the whole time is spent in results_cycle. Are rrd_lastupdate and rrdupdate contained in results_cycle?

netniV commented 5 years ago

Have you monitored your disk activity? That's a lot of RRDs to process, and without doing a complete re-read of the above I'm not sure what disks you are using.

j66h commented 5 years ago

The server is running in a VMware environment with enterprise storage connected by Fibre Channel. According to the SAN admin, the system is far from being at its limits.

Another thing: the number of RRDs to update is the same regardless of whether I use poller_boost.php from 1.2.5 or 1.2.6, right? If storage were at its limits, 1.2.5 should at least not be 20 times faster. I was thinking about storage latency, because RRDs are a lot of rather small files. But again, this would cause problems on 1.2.5 too...

Here are the detailed runtime timers from today, running 1.2.5 again:

RRDUpdates:182127 TotalTime:43 get_records:0.38 results_cycle:39.78 rrd_filename_and_template:3.42 rrd_lastupdate:3 rrdupdate:32.94 delete:2.06

Not sure if this has a meaning: besides being faster in all steps, with 1.2.5 rrd_lastupdate is only a tenth of rrdupdate, while with 1.2.6 they are at the same level.

netniV commented 5 years ago

That sounds like a fair analysis. I'd have to look at what changed to peer review the differences, but I will be the first to admit that I don't often delve into that part of the code ;-)

stormonts commented 5 years ago

I am running into the exact same problem after an upgrade from 1.2.4 to 1.2.6. Our server has 40 cores @ 2.2 GHz with 64 GB of RAM so plenty of power. Nothing at all has changed on the system regarding data sources added, settings changes, etc. The only difference was the upgrade of Cacti and Spine from 1.2.4 to 1.2.6. This is causing graph display to slow to a crawl if the graphs are even shown at all.

(screenshot attached)

cigamit commented 5 years ago

Yea, I can see why. The fact that it took so long (3459 minutes) is unusual. Also, can you please look at the process list in MySQL using the following command:

show full processlist;

while boost is running and post the results here.

cigamit commented 5 years ago

Also, turn on the slow query log and set the threshold to 1 second. Upload the results.

cigamit commented 5 years ago

@stormonts, looking at your stats, you have a real problem. Do you see how many boost records there are? That's the real problem. One more thing to send is the performance table image. The reason the graphs are taking so long to generate is the size of that archive table. It's bad. Take your update interval higher, at least for the moment.

What is your poller interval? You may have to do something special to clear out the backlog.
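A quick way to gauge how large the boost backlog actually is, before changing any settings, is to check the row counts of the boost tables from the MySQL prompt; a minimal sketch using the standard Cacti table names:

SELECT COUNT(*) FROM poller_output_boost;
SHOW TABLE STATUS LIKE 'poller_output_boost%';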

cigamit commented 5 years ago

I've got 1.2.6 boost and just did an update with 50k Data Sources per pass, and I got these results:

2019/09/18 19:04:34 - SYSTEM BOOST STATS: Time:22.4200 RRDUpdates:126026
2019/09/18 19:04:34 - SYSTEM BOOST DETAIL STATS: RRDUpdates:126026, TotalTime:22, get_records:0.27, results_cycle:21.52, rrd_filename_and_template:0.56, rrd_lastupdate:6.99, rrdupdate:10.93, delete:0.34
2019/09/18 19:04:51 - SYSTEM DSSTATS STATS: Type:DAILY, Time:39.3977 RRDUser:0.0000 RRDSystem:0.0000 RRDReal:0.0000
cigamit commented 5 years ago

Upload the following too:

show create table poller_output_boost;
stormonts commented 5 years ago

Yeah, it ran right before we ran the upgrade to 1.2.6 and hasn't run since, with the table growing. Our system was fast before the upgrade and something happened.

mysql> show full processlist;
| Id      | User      | Host      | db    | Command | Time | State        | Info |
| 7471    | cactiuser | localhost | cacti | Query   | 17   | Sending data | SELECT count(*) FROM poller_output_boost_arch_1568838028 |
| 7474    | cactiuser | localhost | cacti | Sleep   | 93   |              | NULL |
| 7475    | cactiuser | localhost | cacti | Sleep   | 305  |              | NULL |
| 7476    | cactiuser | localhost | cacti | Sleep   | 27   |              | NULL |
| 7477    | cactiuser | localhost | cacti | Sleep   | 102  |              | NULL |
| 7478    | cactiuser | localhost | cacti | Sleep   | 281  |              | NULL |
| 7479    | cactiuser | localhost | cacti | Sleep   | 178  |              | NULL |
| 7480    | cactiuser | localhost | cacti | Sleep   | 190  |              | NULL |
| 29820   | cactiuser | localhost | cacti | Sleep   | 243  |              | NULL |
| 29873   | cactiuser | localhost | cacti | Sleep   | 150  |              | NULL |
| 29874   | cactiuser | localhost | cacti | Sleep   | 181  |              | NULL |
| 29882   | cactiuser | localhost | cacti | Sleep   | 88   |              | NULL |
| 29883   | cactiuser | localhost | cacti | Sleep   | 341  |              | NULL |
| 37350   | cactiuser | localhost | cacti | Sleep   | 250  |              | NULL |
| 37351   | cactiuser | localhost | cacti | Sleep   | 179  |              | NULL |
| 37352   | cactiuser | localhost | cacti | Sleep   | 274  |              | NULL |
| 37353   | cactiuser | localhost | cacti | Sleep   | 311  |              | NULL |
| 37370   | cactiuser | localhost | cacti | Sleep   | 220  |              | NULL |
| 37412   | cactiuser | localhost | cacti | Sleep   | 148  |              | NULL |
| 37456   | cactiuser | localhost | cacti | Sleep   | 212  |              | NULL |
| 37488   | cactiuser | localhost | cacti | Sleep   | 119  |              | NULL |
| 37728   | cactiuser | localhost | cacti | Sleep   | 118  |              | NULL |
| 38536   | cactiuser | localhost | cacti | Sleep   | 336  |              | NULL |
| 39472   | cactiuser | localhost | cacti | Sleep   | 58   |              | NULL |
| 40094   | cactiuser | localhost | cacti | Sleep   | 93   |              | NULL |
| 40358   | cactiuser | localhost | cacti | Sleep   | 367  |              | NULL |
| 1214420 | cactiuser | localhost | cacti | Sleep   | 717  |              | NULL |
| 1221884 | cactiuser | localhost | cacti | Sleep   | 657  |              | NULL |
| 1229351 | cactiuser | localhost | cacti | Sleep   | 596  |              | NULL |
| 1236819 | cactiuser | localhost | cacti | Sleep   | 545  |              | NULL |
| 1244282 | cactiuser | localhost | cacti | Sleep   | 473  |              | NULL |
| 1251750 | cactiuser | localhost | cacti | Sleep   | 415  |              | NULL |
| 1259216 | cactiuser | localhost | cacti | Sleep   | 356  |              | NULL |
| 1266681 | cactiuser | localhost | cacti | Sleep   | 297  |              | NULL |
| 1274149 | cactiuser | localhost | cacti | Sleep   | 236  |              | NULL |
| 1281614 | cactiuser | localhost | cacti | Sleep   | 174  |              | NULL |
| 1289080 | cactiuser | localhost | cacti | Sleep   | 116  |              | NULL |
| 1296544 | cactiuser | localhost | cacti | Sleep   | 56   |              | NULL |
| 1303340 | cactiuser | localhost | cacti | Query   | 0    | starting     | show full processlist |
| 1304015 | cactiuser | localhost | cacti | Sleep   | 4    |              | NULL |
40 rows in set (0.00 sec)

stormonts commented 5 years ago

mysql> show create table poller_output_boost;

CREATE TABLE poller_output_boost (
  local_data_id mediumint(8) unsigned NOT NULL DEFAULT '0',
  rrd_name varchar(19) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
  time timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  output varchar(512) COLLATE utf8mb4_unicode_ci NOT NULL,
  PRIMARY KEY (local_data_id, time, rrd_name) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

1 row in set (0.00 sec)

stormonts commented 5 years ago

How do I get you the performance table image and turn on the slow query log?

cigamit commented 5 years ago

Run the following:

ALTER TABLE poller_output_boost ROW_FORMAT=Dynamic;

It looks good otherwise.
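To confirm the new row format actually took effect, the data dictionary can be queried directly; a minimal check, assuming the same table name:

SELECT TABLE_NAME, ROW_FORMAT FROM information_schema.TABLES WHERE TABLE_NAME = 'poller_output_boost';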

cigamit commented 5 years ago

What version of MySQL/MariaDB too?

stormonts commented 5 years ago

MySQL 5.7.27

I ran the ALTER TABLE command and got this:

mysql> ALTER TABLE poller_output_boost ROW_FORMAT=Dynamic;

Query OK, 0 rows affected (8.24 sec)
Records: 0  Duplicates: 0  Warnings: 0

cigamit commented 5 years ago

Here is how you set the slow query log:

set slow_query_log_file = 'somefile.log';
set global slow_launch_time = 1;
set global slow_query_log = ON;

After this is done, you can just set 'slow_query_log = OFF'. No restart should be necessary. Get me a screenshot of your Console > Configuration > Settings > Performance page.
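To verify the settings took effect, the current values can be read back with standard SHOW VARIABLES queries; a quick check:

SHOW GLOBAL VARIABLES LIKE 'slow_query_log%';
SHOW GLOBAL VARIABLES LIKE 'long_query_time';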

cigamit commented 5 years ago

Without a path on that file, it should be in the /var/log/mysql directory.

cigamit commented 5 years ago

Also, send me a screenshot of Console > Utilities > System Utilities > Show Boost Status. Then, run these queries:

SHOW TABLES LIKE 'poller_output_boost%';
SHOW TABLE STATUS LIKE 'poller_output_boost%';
stormonts commented 5 years ago

The frequency had been 30 minutes until we just raised it at your suggestion:

(screenshot attached)

stormonts commented 5 years ago

(screenshot attached)

stormonts commented 5 years ago

mysql> SHOW TABLES LIKE 'poller_output_boost%';
| Tables_in_cacti (poller_output_boost%) |
| poller_output_boost |
| poller_output_boost_arch_1568600719 |
| poller_output_boost_arch_1568638819 |
| poller_output_boost_arch_1568655023 |
| poller_output_boost_arch_1568824825 |
| poller_output_boost_arch_1568838028 |
| poller_output_boost_processes |
7 rows in set (0.00 sec)

mysql> SHOW TABLE STATUS LIKE 'poller_output_boost%';
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment |
| poller_output_boost | InnoDB | 10 | Dynamic | 3239727 | 56 | 184336384 | 0 | 0 | 6291456 | NULL | 2019-09-18 19:18:38 | 2019-09-18 19:25:24 | NULL | utf8mb4_unicode_ci | NULL | row_format=DYNAMIC | |
| poller_output_boost_arch_1568600719 | InnoDB | 10 | Dynamic | 0 | 0 | 16384 | 0 | 0 | 0 | NULL | 2019-09-15 22:25:19 | NULL | NULL | utf8mb4_unicode_ci | NULL | | |
| poller_output_boost_arch_1568638819 | InnoDB | 10 | Dynamic | 0 | 0 | 16384 | 0 | 0 | 0 | NULL | 2019-09-16 09:00:19 | NULL | NULL | utf8mb4_unicode_ci | NULL | | |
| poller_output_boost_arch_1568655023 | InnoDB | 10 | Dynamic | 0 | 0 | 16384 | 0 | 0 | 0 | NULL | 2019-09-16 13:30:23 | NULL | NULL | utf8mb4_unicode_ci | NULL | | |
| poller_output_boost_arch_1568824825 | InnoDB | 10 | Dynamic | 0 | 0 | 16384 | 0 | 0 | 0 | NULL | 2019-09-18 12:40:25 | 2019-09-18 16:43:52 | NULL | utf8mb4_unicode_ci | NULL | | |
| poller_output_boost_arch_1568838028 | InnoDB | 10 | Dynamic | 117106361 | 58 | 6832488448 | 0 | 0 | 105906176 | NULL | 2019-09-18 16:20:28 | 2019-09-18 19:25:25 | NULL | utf8mb4_unicode_ci | NULL | | |
| poller_output_boost_processes | InnoDB | 10 | Dynamic | 0 | 0 | 16384 | 0 | 0 | 0 | 1 | 2019-09-06 08:03:04 | NULL | NULL | utf8mb4_unicode_ci | NULL | | |
7 rows in set (0.00 sec)

cigamit commented 5 years ago

Hmm, you know, setting "Maximum Argument Length" to 'unlimited' is likely getting you into trouble. I have mine set at 2000. You might want to tune it down. But you will have to kill and manually start boost again, at which time there will be a long transfer of data. So, for the first few minutes it'll do nothing, and you will see a long query with archive tables being joined. Then, after that, it should take off.

cigamit commented 5 years ago

Drop the tables that have zero rows, then restart poller boost using this:

php -q poller_boost.php --force --debug --verbose
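Based on the table status output above, that would amount to something like the following (a sketch; verify each table really is empty before dropping it):

DROP TABLE poller_output_boost_arch_1568600719;
DROP TABLE poller_output_boost_arch_1568638819;
DROP TABLE poller_output_boost_arch_1568655023;
DROP TABLE poller_output_boost_arch_1568824825;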
stormonts commented 5 years ago

We are running a Linux kernel newer than 2.6.3, which is why we went with 'unlimited' (based on the comment on that screen). I'll drop it to 2000 and see what happens.

Sadly, I am still failing on the other slow query work.

" Incorrect argument type to variable 'slow_query_log_file'"

cigamit commented 5 years ago

Make sure you do that as a user who can write RRD files.

cigamit commented 5 years ago

What is the 'slow_query_log_file' set to? If it's set, just use the name it's set to.

cigamit commented 5 years ago

Sorry, I had a typo on the setting of the file. The file name needs to be quoted.

cigamit commented 5 years ago

Updates?

stormonts commented 5 years ago

php -q poller_boost.php --force --debug --verbose is still running

cigamit commented 5 years ago

What about the slow query log? Are you seeing it grow?

stormonts commented 5 years ago
I unfortunately didn't get that set yet.

cigamit commented 5 years ago

You can also run the following:

watch "mysql -e 'show processlist' | egrep -v '(Sleep|NULL)' "
cigamit commented 5 years ago

Set it now. You simply have to quote the file name, or just don't change the name.

cigamit commented 5 years ago

There are some updates in the 1.2.x branch. Not sure if they have any impact, but that code did change too. My system runs really fast. Your problem is that your large archive table has over 100M rows in it.

stormonts commented 5 years ago

I got dragged away and won't be back at this for a few hours.


stormonts commented 5 years ago

I'm failing at this terribly. Running watch "mysql -e 'show processlist' | egrep -v '(Sleep|NULL)'" causes the system to freeze.

And I still can't get the slow log file working due to various errors:

mysql> set slow_query_log_file = 'slow.log';
ERROR 1229 (HY000): Variable 'slow_query_log_file' is a GLOBAL variable and should be set with SET GLOBAL

mysql> set global slow_launch_time = 1;
ERROR 1227 (42000): Access denied; you need (at least one of) the SUPER privilege(s) for this operation

I guess we are just going to have to start all over, since the pending, archived, and total records keep increasing no matter what I seem to do, and the next start time of Boost is still 2019-09-13 13:15:23, even though that was five days ago.

bmfmancini commented 5 years ago

Hey, try this:

SET GLOBAL slow_query_log_file = '/path/filename';
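For completeness, a full sequence that should work from an account with the SUPER privilege looks something like this (a sketch; the file path is only an example, and note that long_query_time, rather than slow_launch_time, is the per-query slowness threshold):

-- example path; adjust to a directory the mysqld user can write
SET GLOBAL slow_query_log_file = '/var/lib/mysql/slow.log';
-- per-query threshold in seconds (long_query_time, not slow_launch_time)
SET GLOBAL long_query_time = 1;
SET GLOBAL slow_query_log = ON;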

stormonts commented 5 years ago

Thanks, that worked. There is absolutely nothing in the log.

Log content is below:

/usr/sbin/mysqld, Version: 5.7.27 (MySQL Community Server (GPL)). started with:
Tcp port: 3306  Unix socket: /var/lib/mysql/mysql.sock
Time                 Id Command    Argument
/usr/sbin/mysqld, Version: 5.7.27 (MySQL Community Server (GPL)). started with:
Tcp port: 3306  Unix socket: /var/lib/mysql/mysql.sock
Time                 Id Command    Argument
# Time: 2019-09-19T02:02:45.501844Z
# User@Host: cactiuser[cactiuser] @ localhost []  Id: 7469
# Query_time: 1207.737288  Lock_time: 0.000112  Rows_sent: 0  Rows_examined: 105863650
use cacti;
SET timestamp=1568858565;
INSERT INTO poller_output_boost_arch_1568857286 SELECT * FROM poller_output_boost_arch_1568838028;
# Time: 2019-09-19T02:03:20.653270Z
# User@Host: cactiuser[cactiuser] @ localhost []  Id: 7469
# Query_time: 33.511457  Lock_time: 0.000108  Rows_sent: 1  Rows_examined: 111465862
SET timestamp=1568858600;
SELECT COUNT(local_data_id) FROM poller_output_boost_arch_1568857286;
# Time: 2019-09-19T02:04:04.945731Z
# User@Host: cactiuser[cactiuser] @ localhost []  Id: 7469
# Query_time: 44.292198  Lock_time: 0.000087  Rows_sent: 1  Rows_examined: 111465862
SET timestamp=1568858644;
SELECT COUNT(DISTINCT local_data_id) FROM poller_output_boost_arch_1568857286;
# Time: 2019-09-19T02:04:31.454097Z
# User@Host: cactiuser[cactiuser] @ localhost []  Id: 7469
# Query_time: 26.504241  Lock_time: 0.000025  Rows_sent: 1  Rows_examined: 111465862
SET timestamp=1568858671;
SELECT count(*) FROM poller_output_boost_arch_1568857286;
# Time: 2019-09-19T02:05:06.065752Z
# User@Host: cactiuser[cactiuser] @ localhost []  Id: 7469
# Query_time: 31.495469  Lock_time: 0.000033  Rows_sent: 1  Rows_examined: 111459858
SET timestamp=1568858706;
SELECT count(*) FROM poller_output_boost_arch_1568857286;
# Time: 2019-09-19T02:05:33.895797Z
# User@Host: cactiuser[cactiuser] @ localhost []  Id: 7469
# Query_time: 25.130912  Lock_time: 0.000033  Rows_sent: 1  Rows_examined: 111457646
SET timestamp=1568858733;
SELECT count(*) FROM poller_output_boost_arch_1568857286;
# Time: 2019-09-19T02:06:00.625863Z
# User@Host: cactiuser[cactiuser] @ localhost []  Id: 7469
# Query_time: 25.025236  Lock_time: 0.000030  Rows_sent: 1  Rows_examined: 111454486
SET timestamp=1568858760;

stormonts commented 5 years ago

I'm starting to develop a theory... during the upgrade to 1.2.6, the installer recommended enabling innodb_doublewrite, which we did. I'm beginning to wonder if things gradually started backing up until 9/16, when we noticed the slow behavior and reverted the setting (but by that point it was too late to recover). Is there a way to just delete all of the entries that haven't yet been processed? I know we will have holes in the graphs, but that is fine.
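As a rough sketch of what discarding the unprocessed backlog would look like (this deliberately throws away the queued updates, so graphs will have gaps; the archive table name below is just the one visible in this thread):

SHOW TABLES LIKE 'poller_output_boost_arch%';
-- drop each archive table returned, for example:
DROP TABLE poller_output_boost_arch_1568857286;
-- and empty the live queue
TRUNCATE TABLE poller_output_boost;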



cigamit commented 5 years ago

Okay, so download lib/boost.php and poller_boost.php from the 1.2.x branch, copy them in place, kill boost one more time, clear your slow query log, and then force it again. Performance should be better now.

Also, how much server memory do you have? What is the size of your innodb_buffer_pool and what is your tmp_table_size set to? The innodb_buffer_pool should be about 60% of your system memory and tmp_table_size should be large enough to handle ~50k data sources (per your setting). Because ROW_FORMAT=Dynamic was not set on the archive table, it's likely using quite a bit more. You can confirm this by running the following:

SELECT TABLE_NAME, AVG_ROW_LENGTH FROM information_schema.TABLES WHERE TABLE_NAME LIKE 'poller_output_boost%';

Please confirm.

If nothing else works, just drop the poller_output_boost_arch% tables. But that one entry in the slow query log should disappear, and speed should increase. 50k data sources should be enough.
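For the memory questions above, the current values can be read straight from the server; a quick check:

SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW GLOBAL VARIABLES LIKE 'tmp_table_size';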

stormonts commented 5 years ago

Server has 62 GB of RAM.

innodb_buffer_pool_size is set to 40960M

tmp_table_size is set to 2269M

(screenshot attached)

cigamit commented 5 years ago

Remove all those empty archive tables. It's just a sign of the bleeding. Let me know about any slow queries after you've made that update and restarted.

stormonts commented 5 years ago

I dropped those tables, downloaded the updated poller_boost.php file, restarted boost, and am not seeing anything in the slow query log, but the processing doesn't seem to be much faster than it was before. If I just drop the "poller_output_boost_arch_1568902145" table, we'll lose those 28 million records, but then we will be starting from scratch and can better watch whether the processing keeps creeping up and whether the "next run time" ever updates from 9-13-19, correct?

(screenshot attached)

stormonts commented 5 years ago

Another thing that seems strange is that when we look at top, mysql only seems to be using up to 200% of the CPU and I don't understand why it isn't using more cores.

stormonts commented 5 years ago

After dropping all of those rows and installing the new poller_boost.php file, things are flying again. Not sure what got things so backed up, but thanks for all of the help.

eschoeller commented 5 years ago

Hello! Glad I found this. I think I'm having the same problem. Boost was running at 30-40 seconds prior to the 1.2.4 -> 1.2.6 upgrade. Now I'm running at 350-450 seconds.