Closed eschoeller closed 5 years ago
Run an explain on that same query:
explain SELECT did.* FROM data_input_data AS did INNER JOIN data_template_data AS dtd ON did.data_template_data_id=dtd.id INNER JOIN data_local AS dl ON dl.id=dtd.local_data_id INNER JOIN host AS h ON h.id=dl.host_id WHERE h.id = '728';
+----+-------------+-------+--------+-------------------------------+---------+---------+---------------------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-------------------------------+---------+---------+---------------------------------+--------+-------------+
| 1 | SIMPLE | h | const | PRIMARY | PRIMARY | 3 | const | 1 | Using index |
| 1 | SIMPLE | did | ALL | NULL | NULL | NULL | NULL | 446616 | NULL |
| 1 | SIMPLE | dtd | eq_ref | PRIMARY,local_data_id | PRIMARY | 3 | cacti.did.data_template_data_id | 1 | NULL |
| 1 | SIMPLE | dl | eq_ref | PRIMARY,host_id_snmp_query_id | PRIMARY | 3 | cacti.dtd.local_data_id | 1 | Using where |
+----+-------------+-------+--------+-------------------------------+---------+---------+---------------------------------+--------+-------------+
4 rows in set (0.00 sec)
Well, that is a problem.
Try this, and then run that query again:
ALTER TABLE data_input_data ADD INDEX data_template_data_id (data_template_data_id);
mysql> ALTER TABLE data_input_data ADD INDEX data_template_data_id (data_template_data_id);
Query OK, 0 rows affected (3.91 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> explain SELECT did.* FROM data_input_data AS did INNER JOIN data_template_data AS dtd ON did.data_template_data_id=dtd.id INNER JOIN data_local AS dl ON dl.id=dtd.local_data_id INNER JOIN host AS h ON h.id=dl.host_id WHERE h.id = '728';
+----+-------------+-------+-------+-------------------------------+-----------------------+---------+--------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-------------------------------+-----------------------+---------+--------------+------+--------------------------+
| 1 | SIMPLE | h | const | PRIMARY | PRIMARY | 3 | const | 1 | Using index |
| 1 | SIMPLE | dl | ref | PRIMARY,host_id_snmp_query_id | host_id_snmp_query_id | 3 | const | 25 | Using where; Using index |
| 1 | SIMPLE | dtd | ref | PRIMARY,local_data_id | local_data_id | 3 | cacti.dl.id | 1 | Using index |
| 1 | SIMPLE | did | ref | data_template_data_id | data_template_data_id | 3 | cacti.dtd.id | 4 | NULL |
+----+-------------+-------+-------+-------------------------------+-----------------------+---------+--------------+------+--------------------------+
4 rows in set (0.00 sec)
No, just run that same query without the explain. How long?
191 rows in set (0.01 sec)
Bingo! Now, try to move a Device.
Yup. Moved 47 devices, nothing in the mysql-slow.log and it completed much faster. Cool! So, as per usual, somehow I missed a database update along the way ...
Nope, I think we optimized out some indexes thinking things like just having the primary hold the value would be fast. However, that's not how it works.
Sadly, I think this broke my "hacked" version of graph_image.php that I use for some dashboards. I'm poking around at it right now.
Keep us posted. I doubt the index change had anything to do with it.
Somehow it did. The dashboard went blank right after I ran that alter command. Or maybe it was the two boost updates ... but I find that even more remote of a possibility. Beyond me why at this point. But, this version of graph_image.php was from 2013. After every upgrade I've been shocked that it continues to work. I just copied over the most current graph_image and made two small changes to restore it. Works just fine, so this is for the better in the end. One of those ticking timebombs:)
Another thing I noticed... for each device that is moved, I see this activity:
2019/09/25 20:46:20 - WEBUI NOTE: Table host Replicated to Poller With 1 Rows Updated
2019/09/25 20:46:20 - WEBUI NOTE: Table host_snmp_cache Replicated to Poller With 92 Rows Updated
2019/09/25 20:46:20 - WEBUI NOTE: Table data_local Replicated to Poller With 85 Rows Updated
2019/09/25 20:46:20 - WEBUI NOTE: Table graph_local Replicated to Poller With 77 Rows Updated
2019/09/25 20:46:20 - WEBUI NOTE: Table data_template_data Replicated to Poller With 85 Rows Updated
2019/09/25 20:46:20 - WEBUI NOTE: Table data_template_rrd Replicated to Poller With 319 Rows Updated
2019/09/25 20:46:21 - WEBUI NOTE: Table graph_templates_item Replicated to Poller With 1114 Rows Updated
2019/09/25 20:46:21 - WEBUI NOTE: Table data_input_data Replicated to Poller With 846 Rows Updated
Seems like perhaps the replication should happen just once at the end, instead of for each device?
Hmm. I'll have to check. I think it's just be replicating the impacted rows. I guess we should turn down the logging a bit.
Log a separate issue so that we don't loose it.
Not sure if there's anything that can be done for this, but it takes a very long time to move multiple devices between data collectors. On the device Management page I select numerous devices and then select the "Change Device Settings" option. I then change the Poller Association. There is a query that runs for every single device that takes 6 seconds to run:
So, moving 50 devices from one collector to another takes a very long time.