Icinga / icingadb-web

Icinga DB Web – UI for Icinga DB – Provides a graphical interface to your Icinga monitoring
GNU General Public License v2.0

High load on mysqld after updating to 1.1.1 #947

Closed robertkrenn closed 10 months ago

robertkrenn commented 11 months ago

Describe the bug

We updated to Icinga DB and Icinga DB Web 1.1.1. Since then, mysqld has been consuming around 1700% CPU. When we add more vCPUs, mysqld uses all of them. The overall load is above 2.0 per CPU. It drops completely when icingadb-web is disabled, while icingadb itself keeps syncing in the background. At first we thought this was some kind of conversion in the database, but after one week it is still the same.

Expected behavior

No increase in load after updating to icingadb 1.1.1.

Your Environment

Include as many relevant details about the environment you experienced the problem in.

nilmerg commented 11 months ago

Hi, we have no similar reports yet and I am also not sure what could have caused this.

Please enable the slow query log of MySQL and tell us whether and what's reported regarding icingadb.

robertkrenn commented 11 months ago

The slow query log doesn't report anything.

nilmerg commented 11 months ago

I probably don't need to tell you that the slow query log needs to be enabled first?

If it's really empty, that's a good thing, I guess. But also a bad thing, as it gets more difficult to track the problem down now. :sweat_smile:

Firstly, have you downgraded Icinga DB Web already to v1.0.x? Does the load go down then? If it doesn't, try downgrading icinga-php-library to v0.12.0. (This may require further downgrades of e.g. modules)

Then, in every case, please take a look at SHOW FULL PROCESSLIST while the load is up. Check whether there's a repeating query or something else suspicious.
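To make "a repeating query" concrete, here is a minimal Python sketch that groups process list rows by user, database, and statement, and reports the statements that appear several times at once. It assumes the rows have already been fetched as dictionaries keyed by the standard SHOW FULL PROCESSLIST columns (User, db, Info, and so on); the function name and the sample rows are hypothetical and only serve as an illustration.

```python
from collections import Counter


def count_repeating_queries(rows, min_count=2):
    """Return (key, count) pairs for statements running at least min_count times.

    Each row is a dict keyed by SHOW FULL PROCESSLIST column names.
    The key is (User, db, Info).
    """
    counts = Counter(
        (row["User"], row["db"], row["Info"])
        for row in rows
        if row.get("Info")  # idle connections carry no statement (NULL/None)
    )
    # most_common() sorts by count, highest first
    return [(key, n) for key, n in counts.most_common() if n >= min_count]


# Hypothetical sample data, not taken from the reporter's system
sample_rows = [
    {"Id": 1, "User": "icingadb", "db": "icingadb", "Command": "Query",
     "Time": 3, "Info": "SELECT 1 /* placeholder query A */"},
    {"Id": 2, "User": "icingadb", "db": "icingadb", "Command": "Query",
     "Time": 2, "Info": "SELECT 1 /* placeholder query A */"},
    {"Id": 3, "User": "icingadb", "db": "icingadb", "Command": "Query",
     "Time": 2, "Info": "SELECT 1 /* placeholder query A */"},
    {"Id": 4, "User": "icingadb", "db": "icingadb", "Command": "Query",
     "Time": 0, "Info": "SELECT 2 /* placeholder query B */"},
    {"Id": 5, "User": "icingadb", "db": "icingadb", "Command": "Sleep",
     "Time": 40, "Info": None},
]

repeating = count_repeating_queries(sample_rows)
for (user, db, info), n in repeating:
    print(f"{n}x {user}@{db}: {info}")
```

Running this against a few snapshots taken while the load is high should show quickly whether one statement dominates.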

robertkrenn commented 11 months ago

Downgrading is a real effort, because the other major change we made was updating to PHP 8.1. When we downgrade icingadb-web and icinga-php without also downgrading to PHP 7.x, the icingaweb2 interface crashes completely.

With icingadb 1.1.1 and everything in place, the process list in the screenshot shows what is causing the load. Sometimes there are only 2 to 5 parallel queries (the load then also drops for a few seconds), but most of the time there are 30 to 100 parallel processes, all initiated by user icingadb on the icingadb database. [Screenshot: process list]

Every query really processes all configured hosts. Is this normal? If it is really necessary, I would try going back to PHP 7 to get the older icingadb versions running.

nilmerg commented 10 months ago

What do you mean by all configured hosts? The queries should all have a limit.

How many users access this web instance? This might just be normal behavior. We've enhanced our queries to make them more efficient. This increased efficiency might now result in queries being run more frequently. With many users who have autorefresh enabled, this doesn't seem implausible to me. The empty slow query log also confirms this somewhat.

robertkrenn commented 10 months ago

Well, we have roughly 500 hosts configured, and in each query all 500 hosts are listed by name, which leads to massive queries. We have about 100 users who access Icinga via the web frontend or via Nagstamon. I've seen some installations with a Nagstamon autorefresh interval of 10s. Before 1.1.1 this didn't have any impact.
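As a rough back-of-the-envelope sketch, the numbers above can be turned into an upper-bound request rate. This assumes, purely for illustration, that all 100 users refresh every 10s, that each refresh triggers exactly one query, and that every query lists all 500 hosts. None of these assumptions has been measured, and the function name is hypothetical.

```python
def requests_per_second(clients, refresh_interval_s):
    """Average request rate if every client refreshes once per interval."""
    return clients / refresh_interval_s


clients = 100            # "about 100 users" from the comment above
refresh_interval_s = 10  # assumption: everyone uses the 10s interval
hosts_per_filter = 500   # assumption: every query lists all 500 hosts

rate = requests_per_second(clients, refresh_interval_s)
host_terms_per_second = rate * hosts_per_filter

print(rate)                   # 10.0 requests per second
print(host_terms_per_second)  # 5000.0 host name comparisons per second
```

Even as a worst case, this shows how a modest number of clients can add up to a steady stream of large queries.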

nilmerg commented 10 months ago

This sounds like a filter. Such a large one can only be the result of user input. For example, a user who sets the limit of a host list to 500 and selects all of them to perform an action such as check now.

"Before 1.1.1 this didn't cause any impact"

As I said, improved efficiency. Less time spent, more time for more work.

robertkrenn commented 10 months ago

OK, we have 3 users who are in about 150 roles with different filter sets. In combination with the business-process module, we have a dedicated role so that each individual user only sees the objects he is responsible for. The team leader of these users is in all of these business process roles. I think this sounds like the source of the big queries, and if those users also have autorefresh set to 10s...

nilmerg commented 10 months ago

Wow. You really need to re-structure your roles. You have so much at your disposal, host-/service-groups and custom variables. Instead of listing 500 hosts in a filter, give all 500 hosts the same customvar and your filter is just a few chars.

But thanks for the update. :)
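The difference in filter size can be illustrated with a small Python sketch. It assumes a filter syntax of the form host.name=... joined with | for the long variant and host.vars.<name>=... for the custom variable variant. The host names, the custom variable name "team", and its value "network" are all hypothetical.

```python
# Hypothetical host names: host-001 ... host-500
host_names = [f"host-{i:03d}" for i in range(1, 501)]

# Variant 1: every host listed by name, joined with OR ("|")
long_filter = "|".join(f"host.name={name}" for name in host_names)

# Variant 2: all 500 hosts share one custom variable (hypothetical name/value)
short_filter = "host.vars.team=network"

print(len(long_filter), "characters when listing every host")
print(len(short_filter), "characters when matching a shared custom variable")
```

The short filter stays the same size no matter how many hosts carry the custom variable, whereas the long one grows with every host added to the role.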