centreon / centreon-archived

Centreon is a network, system and application monitoring tool. Centreon is the only AIOps Platform Providing Holistic Visibility to Complex IT Workflows from Cloud to Edge.
https://www.centreon.com
GNU General Public License v2.0

All Pollers down and All service status to Unknown periodically after backup #10538

Open tlierdotfr opened 2 years ago

tlierdotfr commented 2 years ago

BUG REPORT INFORMATION

Prerequisites

Versions

For the RPM based systems

$ rpm -qa | grep centreon | egrep -v "(plugin|pack)" | sort
centreon-21.10.1-1.el7.centos.noarch
centreon-auto-discovery-server-21.10.1-1.el7.centos.noarch
centreon-base-config-centreon-engine-21.10.1-1.el7.centos.noarch
centreon-broker-21.10.0-6.el7.x86_64
centreon-broker-cbd-21.10.0-6.el7.x86_64
centreon-broker-cbmod-21.10.0-6.el7.x86_64
centreon-broker-core-21.10.0-6.el7.x86_64
centreon-broker-storage-21.10.0-6.el7.x86_64
centreon-clib-21.10.0-6.el7.x86_64
centreon-common-21.10.1-1.el7.centos.noarch
centreon-connector-21.10.0-6.el7.x86_64
centreon-connector-perl-21.10.0-6.el7.x86_64
centreon-connector-ssh-21.10.0-6.el7.x86_64
centreon-database-21.10.1-1.el7.centos.noarch
centreon-engine-21.10.0-6.el7.x86_64
centreon-engine-daemon-21.10.0-6.el7.x86_64
centreon-engine-extcommands-21.10.0-6.el7.x86_64
centreon-gorgone-21.10.0-3.el7.centos.noarch
centreon-gorgone-centreon-config-21.10.0-3.el7.centos.noarch
centreon-license-manager-21.10.0-1.el7.centos.noarch
centreon-license-manager-common-21.10.0-1.el7.centos.noarch
centreon-perl-libs-21.10.1-1.el7.centos.noarch
centreon-poller-centreon-engine-21.10.1-1.el7.centos.noarch
centreon-pp-manager-21.10.0-2.el7.centos.noarch
centreon-release-21.10-4.el7.centos.noarch
centreon-trap-21.10.1-1.el7.centos.noarch
centreon-web-21.10.1-1.el7.centos.noarch
centreon-widget-engine-status-21.10.0-2.el7.centos.noarch
centreon-widget-global-health-21.10.0-2.el7.centos.noarch
centreon-widget-graph-monitoring-21.10.0-2.el7.centos.noarch
centreon-widget-grid-map-21.10.0-2.el7.centos.noarch
centreon-widget-hostgroup-monitoring-21.10.0-2.el7.centos.noarch
centreon-widget-host-monitoring-21.10.0-2.el7.centos.noarch
centreon-widget-httploader-21.10.0-2.el7.centos.noarch
centreon-widget-live-top10-cpu-usage-21.10.0-2.el7.centos.noarch
centreon-widget-live-top10-memory-usage-21.10.0-2.el7.centos.noarch
centreon-widget-servicegroup-monitoring-21.10.0-2.el7.centos.noarch
centreon-widget-service-monitoring-21.10.0-2.el7.centos.noarch
centreon-widget-tactical-overview-21.10.0-2.el7.centos.noarch

Operating System

$ cat /etc/centos-release
CentOS Linux release 7.9.2009 (Core)

Browser used

Additional environment details (AWS):

Description

Every few days, our Centreon platform appears to freeze: all pollers turn RED and every service and host goes to Unknown. Perfdata is still collected correctly; only the service and host statuses are no longer updated.

It seems to happen on some nights after the Centreon backup process, which runs at 03:30.

Each time, we have to restart the cbd service manually to get everything back to normal.

$ systemctl restart cbd 
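
The manual restart above could be scripted as a stopgap. The following is a minimal sketch, not an official Centreon tool: it assumes the database answers `mysqladmin ping` for the calling user once it is back up, and `retry_until_ok` is a hypothetical helper name. Only `systemctl restart cbd` comes from this report.

```shell
#!/bin/sh
# Hypothetical helper: run a check command up to $1 times, sleeping $2 seconds
# between attempts. Succeeds as soon as the check command succeeds.
retry_until_ok() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        # Only sleep if another attempt is coming.
        [ "$i" -le "$attempts" ] && sleep "$delay"
    done
    return 1
}

# Possible use from a cron job scheduled shortly after the backup
# (assumption: waiting for the database before restarting cbd is enough):
#   retry_until_ok 30 10 mysqladmin ping && systemctl restart cbd
```

Passing the check as a command keeps the retry logic independent of how database availability is actually tested.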

Logs

centreon-engine logs => OK, events are still arriving on the Central

$ tail -f /var/log/centreon-engine/centengine.log
[1641292136] [29118] SERVICE ALERT: IIS-FRAIS-3;ApplicationPool-VITACENTER-LIGHT-EIR;OK;SOFT;2;OK: Application pool 'VITACENTER-LIGHT-EIR' status: started [auto start: true], requests: 4.89/s
[1641292166] [29118] SERVICE ALERT: MSSQL-1;Cpu;OK;SOFT;2;OK: 10 CPU(s) average usage is 72.00 %
[1641292196] [29118] SERVICE ALERT: IIS-FRAIS-3;ApplicationPool-VITACENTER-LIGHT-EIR;OK;HARD;1;OK: Application pool 'VITACENTER-LIGHT-EIR' status: started [auto start: true], requests: 5.77/s
[1641292226] [29118] SERVICE ALERT: MSSQL-1;Cpu;OK;HARD;1;OK: 10 CPU(s) average usage is 71.40 %
[1641292256] [29118] SERVICE ALERT: Agences;fw_Paris;OK;HARD;1;OK - 31.32.32.29 rta 8.865ms lost 0%

centreon-broker logs

This is where the issue shows up. When the problem occurs, the log looks like this (it seems to happen after the backup process):

$ tail -f /var/log/centreon-broker/central-broker-master.log
[2022-01-02T03:30:02.329+01:00] [sql] [error] mysql_connection: The mysql/mariadb database seems not started.
[2022-01-02T03:30:02.329+01:00] [sql] [error] SQL: Reconnection failed.
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: The mysql/mariadb database seems not started.
[2022-01-02T03:30:02.331+01:00] [sql] [error] SQL: Reconnection failed.
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.831+01:00] [sql] [error] mysql_connection: The mysql/mariadb database seems not started.
[2022-01-02T03:30:02.831+01:00] [sql] [error] SQL: Reconnection failed.
[2022-01-02T03:30:03.332+01:00] [sql] [error] mysql_connection: The mysql/mariadb database seems not started.
[2022-01-02T03:30:03.332+01:00] [sql] [error] SQL: Reconnection failed.
[2022-01-02T03:30:05.356+01:00] [sql] [error] conflict_manager: error in the main loop: statement -1320387967 not prepared
[2022-01-03T03:30:03.005+01:00] [sql] [error] mysql_connection: The mysql/mariadb database seems not started.
[2022-01-03T03:30:03.005+01:00] [sql] [error] SQL: Reconnection failed.
[2022-01-03T03:30:03.005+01:00] [sql] [error] mysql_connection: could not insert data in data_bin:  MySQL server has gone away
[2022-01-03T03:30:03.005+01:00] [sql] [error] mysql_connection: could not update metrics:  MySQL server has gone away
[2022-01-03T03:30:03.505+01:00] [sql] [error] mysql_connection: The mysql/mariadb database seems not started.
[2022-01-03T03:30:03.505+01:00] [sql] [error] SQL: Reconnection failed.
[2022-01-03T03:30:03.525+01:00] [sql] [error] conflict_manager: error in the main loop: could not insert data in data_bin:  MySQL server has gone away
[2022-01-03T03:30:04.077+01:00] [core] [error] failover: global error: conflict_manager: events loop interrupted
[2022-01-03T03:30:04.077+01:00] [core] [info] sql stream stopped with 0 ackowledged events
[2022-01-03T03:30:04.077+01:00] [sql] [error] mysql_connection: The mysql/mariadb database seems not started.
[2022-01-03T03:30:04.077+01:00] [sql] [error] SQL: Reconnection failed.
[2022-01-03T03:30:04.077+01:00] [core] [error] failover: global error: conflict_manager: events loop interrupted
[2022-01-03T03:30:04.080+01:00] [sql] [error] mysql_connection: The mysql/mariadb database seems not started.
[2022-01-03T03:30:04.080+01:00] [sql] [error] SQL: Reconnection failed.
[2022-01-03T03:30:04.080+01:00] [core] [info] storage stream stopped with 0 acknowledged events

When we do NOT have the problem, the log looks like this:

$ tail -f /var/log/centreon-broker/central-broker-master.log
[2022-01-04T03:30:02.475+01:00] [sql] [error] mysql_connection: The mysql/mariadb database seems not started.
[2022-01-04T03:30:02.475+01:00] [sql] [error] SQL: Reconnection failed.
[2022-01-04T03:30:02.975+01:00] [sql] [error] mysql_connection: The mysql/mariadb database seems not started.
[2022-01-04T03:30:02.975+01:00] [sql] [error] SQL: Reconnection failed.
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: The mysql/mariadb database seems not started.
[2022-01-04T03:30:02.976+01:00] [sql] [error] SQL: Reconnection failed.
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:03.477+01:00] [sql] [error] mysql_connection: The mysql/mariadb database seems not started.
[2022-01-04T03:30:03.477+01:00] [sql] [error] SQL: Reconnection failed.
[2022-01-04T03:30:06.484+01:00] [sql] [error] conflict_manager: error in the main loop: statement -1320387967 not prepared
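
Comparing the two excerpts, the `conflict_manager: events loop interrupted` error appears only on the failing night. A minimal sketch of a check for that line, assuming it is a reliable marker (which is only verified against the two excerpts above); the function name `broker_loop_interrupted` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the broker log contains the
# "events loop interrupted" error on the given day (YYYY-MM-DD).
broker_loop_interrupted() {
    log_file=$1
    day=$2
    # -F treats the leading "[" and the message as literal strings.
    grep -F "[$day" "$log_file" |
        grep -Fq "conflict_manager: events loop interrupted"
}

# Example:
#   broker_loop_interrupted /var/log/centreon-broker/central-broker-master.log "$(date +%F)" \
#       && echo "cbd probably needs a restart"
```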

centreon gorgone logs

=> Looks OK.

$ tail -f /var/log/centreon-gorgone/gorgoned.log
2022-01-04 11:35:00 - INFO - [proxy] Received setlogs for '2'
2022-01-04 11:35:00 - INFO - [proxy] Received setlogs for '2'
2022-01-04 11:35:03 - INFO - [autodiscovery] -class- host discovery - sync started
2022-01-04 11:35:06 - INFO - [proxy] Received setlogs for '2'
2022-01-04 11:35:16 - INFO - [proxy] Pong received from '2'
2022-01-04 11:35:16 - INFO - [proxy] Pong received from '3'
2022-01-04 11:35:56 - INFO - [proxy] Received setlogs for '2'
2022-01-04 11:36:16 - INFO - [proxy] Pong received from '2'
2022-01-04 11:36:16 - INFO - [proxy] Pong received from '3'

Centreon backup logs

$ tail -f /var/log/centreon/centreon-backup.log
[2022-01-02 03:30:01] Start central backup processus
[2022-01-02 03:30:01] Finish central backup processus
[2022-01-02 03:30:01] Start monitoring engine backup processus
No SSH keys for Centreon
cp: cannot stat ‘/var/lib/centreon-engine//.ssh/*’: No such file or directory
[2022-01-02 03:30:01] Finish monitoring engine backup processus
[2022-01-02 03:30:01] Start database backup processus
Dumping Db with LVM snapshot (full)
[2022-01-02 03:34:39] Finish database backup processus
Delete file: 2021-12-23-centreon-engine.tar.gz
Delete file: 2021-12-23-central.tar.gz
Delete file: 2021-12-23-mysql-partial.tar.gz
[2022-01-03 03:30:01] Start central backup processus
[2022-01-03 03:30:01] Finish central backup processus
[2022-01-03 03:30:01] Start monitoring engine backup processus
No SSH keys for Centreon
cp: cannot stat ‘/var/lib/centreon-engine//.ssh/*’: No such file or directory
[2022-01-03 03:30:02] Finish monitoring engine backup processus
[2022-01-03 03:30:02] Start database backup processus
Dumping Db with LVM snapshot (partial)
[2022-01-03 03:30:42] Finish database backup processus
Delete file: 2021-12-24-centreon-engine.tar.gz
Delete file: 2021-12-24-mysql-partial.tar.gz
Delete file: 2021-12-24-central.tar.gz
[2022-01-04 03:30:01] Start central backup processus
[2022-01-04 03:30:02] Finish central backup processus
[2022-01-04 03:30:02] Start monitoring engine backup processus
No SSH keys for Centreon
cp: cannot stat ‘/var/lib/centreon-engine//.ssh/*’: No such file or directory
[2022-01-04 03:30:02] Finish monitoring engine backup processus
[2022-01-04 03:30:02] Start database backup processus
Dumping Db with LVM snapshot (partial)
[2022-01-04 03:30:47] Finish database backup processus
Delete file: 2021-12-25-centreon-engine.tar.gz
Delete file: 2021-12-25-mysql-partial.tar.gz
Delete file: 2021-12-25-central.tar.gz
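
To see how long the database backup step lasted each night, the Start/Finish timestamps in this log can be subtracted. A minimal sketch, assuming the line format shown above stays the same; the function name `db_backup_durations` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print "<date> <seconds>" for each database backup
# found in a centreon-backup.log style file.
db_backup_durations() {
    awk '
        # Convert "HH:MM:SS]" to seconds since midnight.
        function secs(t,    p) {
            sub(/\]$/, "", t)
            split(t, p, ":")
            return p[1] * 3600 + p[2] * 60 + p[3]
        }
        /Start database backup/  { start = secs($2) }
        /Finish database backup/ {
            d = secs($2) - start
            if (d < 0) d += 86400   # the backup crossed midnight
            print substr($1, 2), d
        }
    ' "$1"
}

# Example:
#   db_backup_durations /var/log/centreon/centreon-backup.log
```

On the excerpt above this gives 278 seconds for the full backup on 2022-01-02, then 40 and 45 seconds for the partial backups on 2022-01-03 and 2022-01-04.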

Additional relevant information (e.g. frequency, ...)

It occurs once every few days (roughly every 5 to 10 days).

lpinsivy commented 2 years ago

Hi @tlierdotfr,

What is the size of your centreon_storage database?

tlierdotfr commented 2 years ago

Hi @lpinsivy, thanks for your reply.

Our full DB backup file is about 1.6 GB. The DB statistics from the Centreon admin page are shown below: image

lpinsivy commented 2 years ago

How often do you make a full backup?

During the full backup, MySQL is stopped to make a mysqldump.

It looks like Centreon Broker can't reconnect by itself, so you need to restart the process. We will soon deliver a new version of Centreon Broker to improve reconnection.

Regards,

tlierdotfr commented 2 years ago

We have kept the default Centreon backup parameters, so a full backup runs every Sunday: image

But the issue also seems to occur during differential (partial) backups. For instance, in the logs above, the issue occurred on Monday, January 3rd.

lpinsivy commented 2 years ago

This is a known issue; we have an internal ticket for it, MON-11846.

We will release a fix as soon as possible.

Can you give us your filesystem description using "$ df -h"?

Regards,

tlierdotfr commented 2 years ago

Here it is:

Filesystem                                                      Size  Used Avail Use% Mounted on
devtmpfs                                                        1.9G     0  1.9G   0% /dev
tmpfs                                                           1.9G     0  1.9G   0% /dev/shm
tmpfs                                                           1.9G  185M  1.7G  10% /run
tmpfs                                                           1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/mapper/centos_centreon--central-root                        20G  3.0G   16G  16% /
/dev/sda1                                                       969M  152M  751M  17% /boot
/dev/mapper/centos_centreon--central-var_log                    9.8G  395M  8.9G   5% /var/log
/dev/mapper/centos_centreon--central-var_lib_mysql               16G  7.3G  7.7G  49% /var/lib/mysql
/dev/mapper/centos_centreon--central-var_lib_centreon           6.8G  3.9G  2.6G  60% /var/lib/centreon
/dev/mapper/centos_centreon--central-var_cache_centreon_backup  4.8G  3.7G  879M  82% /var/cache/centreon/backup
/dev/mapper/centos_centreon--central-var_lib_centreon--broker   4.8G   20M  4.6G   1% /var/lib/centreon-broker
tmpfs                                                           379M     0  379M   0% /run/user/0