Closed Thomas-Gelf closed 2 years ago
Yet another reason to wait for IcingaDB (which will make all the things great again). 😉
while waiting for icingadb - please add these 2 indexes for mysql to resolve this issue.
create index idx_notifications_cleanup on icinga_notifications (instance_id, start_time);
create index idx_contactnotifications_cleanup on icinga_contactnotifications (instance_id, start_time);
"educated" guess for root cause - skip index scan of with unsorted start_time
table icinga_contactnotifications index instance_id table icinga_notifications index notification_idx
@lippserd: the index proposed by @abaumgaertner fixes this issue, and his assumption is correct. icinga_statehistory
carries the exact same index for exactly this reason, it's missing on those other tables. Icinga runs this query again and again, probably once a minute or so. This is fine as long as this completes in zero time, but terrible when every query execution takes minutes. I'd really love to see this merged asap, preferably for 2.12.
@Al2Klimov: that's fantastic to hear, I'm looking forward to those great things. Nonetheless, it will take many years unless everybody and his dog migrated to IcingaDB. There is no reason to not fix IDO issues in the meantime. Especially not when we're talking about simple quick wins for obvious error with an impact similar to this one.
Thanks, Thomas
IMAO it is yet another reason – to speed up the users' migration.
I've talked to @lippserd yesterday and we agreed to approve the index provided by @abaumgaertner. Meaning to say, you are safe to include it in your schema now, the obligatory schema update file will be delivered in 2.13.
For PostgreSQL, we should also investigate on whether the index is required or not.
Cheers, Michael
@htriem since you haven't done this before, here's some notes for implementing this:
Repeat the same for Postgresql with the same schema version, the index creation may slightly differ.
Test these things with
I'll assist you with that in the coming weeks when necessary.
Both indices should be (start_time, instance_id)
.
there is another missing index on the downtime table The where query is on the instance_id and entry_time value but the index is only on another columns :
KEY `sla_idx_dohist` (`object_id`,`actual_start_time`,`actual_end_time`),
KEY `idx_downtimehistory_remove` (`object_id`,`entry_time`,`scheduled_start_time`,`scheduled_end_time`),
Another problem is on the cleanup option. When i'm trying to delete a lot of days (About 1 year ) on the icinga_notifications table with a a lot of data , about 30M rows The query doesn't end correctly and it will be restarting each time. After a manualy cleanup the query work fine. I think it's can be great to add limits off number of rows deleting each time or to limite the timeperiod on each query for cleanup task
@htriem You're not working on this, right?
No.
At the moment we’re priorizing Icinga DB. So: PRs welcome.
* I'm looking at you, @abaumgaertner :)
Icinga DB was released, so this is unlikely to receive any more attention.
Since this issue was left open for a long time, we have implemented a local data maintainace solution outside of icinga2 for high troughput systems with large mysql icinga schema (> 100GB). Contact me if you are interested.
idx_statehistory_cleanup
exists, but is missing on other tables likeicinga_notifications
andicinga_contactnot*
, maybe others. In a real-life environment raw query time drops from 10 seconds to 0 with 500k involved rows with such an index. Interestingly, when triggered by Icinga itself the very same query has been running for 10+ minutes(!). We haven't been able to figure out why, but anyways - the index is absolutely necessary.