Closed by ericMoulot 1 year ago
A solution was found in issue #5609: set the Instance timeout parameter (Configuration > Pollers > Broker configuration > Output > Instance timeout, or /etc/centreon-broker/central-broker.json) back to its default value. My team had previously set it to 20 seconds, which caused a race condition between the freshness verification task and the refresh interval for resource statuses, resulting in the flapping statuses.
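For readers who want to check their own broker configuration, here is a minimal Python sketch that walks a parsed broker configuration and lists every instance timeout that differs from the default. The key name `instance_timeout`, the default of 300 seconds, and the sample JSON layout are assumptions that are not stated in this issue, so verify them against your own central-broker.json and the documentation for your Centreon version.

```python
import json

# Assumptions (not confirmed by this issue): the JSON key name and the default value.
ASSUMED_KEY = "instance_timeout"
ASSUMED_DEFAULT_SECONDS = 300


def find_instance_timeouts(node, path=""):
    """Recursively collect (path, value) pairs for every ASSUMED_KEY entry."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = f"{path}/{key}"
            if key == ASSUMED_KEY:
                found.append((child_path, value))
            else:
                found.extend(find_instance_timeouts(value, child_path))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(find_instance_timeouts(item, f"{path}/{index}"))
    return found


def report_non_default(config, default=ASSUMED_DEFAULT_SECONDS):
    """Return the entries whose numeric value differs from the assumed default."""
    return [(p, v) for p, v in find_instance_timeouts(config) if int(v) != default]


# Hypothetical sample mimicking a broker config with one misconfigured output.
SAMPLE = json.loads("""
{"centreonBroker": {"output": [
    {"name": "sql-output", "instance_timeout": "20"},
    {"name": "other-output", "instance_timeout": "300"}
]}}
""")

for path, value in report_non_default(SAMPLE):
    print(f"{path} = {value} (assumed default: {ASSUMED_DEFAULT_SECONDS})")
```

To run it against a real installation, replace `SAMPLE` with the result of `json.load()` on /etc/centreon-broker/central-broker.json. The search is by key name only, so it does not depend on where the output section sits in the file.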
Additional information for future readers:
BUG REPORT INFORMATION
Prerequisites
Versions
centreon-auto-discovery-server-21.10.3-2.el8.noarch
centreon-base-config-centreon-engine-21.10.10-1.el8.noarch
centreon-broker-21.10.3-3.el8.x86_64
centreon-broker-cbd-21.10.3-3.el8.x86_64
centreon-broker-cbmod-21.10.3-3.el8.x86_64
centreon-broker-core-21.10.3-3.el8.x86_64
centreon-broker-storage-21.10.3-3.el8.x86_64
centreon-clib-21.10.3-3.el8.x86_64
centreon-common-21.10.10-1.el8.noarch
centreon-connector-21.10.3-3.el8.x86_64
centreon-connector-perl-21.10.3-3.el8.x86_64
centreon-connector-ssh-21.10.3-3.el8.x86_64
centreon-engine-21.10.3-3.el8.x86_64
centreon-engine-daemon-21.10.3-3.el8.x86_64
centreon-engine-extcommands-21.10.3-3.el8.x86_64
centreon-gorgone-21.10.3-1.el8.noarch
centreon-gorgone-centreon-config-21.10.3-1.el8.noarch
centreon-license-manager-21.10.0-1.el8.noarch
centreon-license-manager-common-21.10.0-1.el8.noarch
centreon-perl-libs-21.10.10-1.el8.noarch
centreon-poller-centreon-engine-21.10.10-1.el8.noarch
centreon-pp-manager-21.10.0-2.el8.noarch
centreon-release-21.10-5.el8.noarch
centreon-trap-21.10.10-1.el8.noarch
centreon-web-21.10.10-1.el8.noarch
centreon-widget-engine-status-21.10.0-2.el8.noarch
centreon-widget-global-health-21.10.1-1.el8.noarch
centreon-widget-graph-monitoring-21.10.0-2.el8.noarch
centreon-widget-grid-map-21.10.0-2.el8.noarch
centreon-widget-hostgroup-monitoring-21.10.0-2.el8.noarch
centreon-widget-host-monitoring-21.10.0-2.el8.noarch
centreon-widget-httploader-21.10.0-2.el8.noarch
centreon-widget-live-top10-cpu-usage-21.10.0-2.el8.noarch
centreon-widget-live-top10-memory-usage-21.10.0-2.el8.noarch
centreon-widget-servicegroup-monitoring-21.10.0-2.el8.noarch
centreon-widget-service-monitoring-21.10.1-1.el8.noarch
centreon-widget-tactical-overview-21.10.0-2.el8.noarch
Operating System
RedHat 8
Browser used
Version: Firefox 106.0.1, Chrome 107.0.5304.63
Additional environment details (AWS, VirtualBox, physical, etc.): Virtual Machine ESXi
Description
I did a fresh install of Centreon 21.10.8, with a Central and Database server.
Soon after adding hosts and services to be monitored, I noticed that host and service statuses go to UNKNOWN for a few seconds (about 5s to 10s) before returning to normal. The same happens in the monitoring views. This happens at random, every few minutes.
However, the real status of the servers is unchanged, and checks run on the command line from the Central poller return OK.
What I have tried:
I also noticed there is a JavaScript file called vendor.2d6b7428.js that makes a large number of status requests (one every 2s) to the API right after the first status requests initiated by the web page itself. I found it on the server at
/usr/share/centreon/www/static/vendor.2d6b7428.js
and referenced in a statement in the header of the Centreon web page.
The flapping behavior persists.
Steps to Reproduce
Describe the received result
Host and service statuses go to UNKNOWN for a few seconds (about 5s to 10s) before returning to normal, including in the monitoring views. This happens at random, every few minutes.
Describe the expected result
Host and service statuses should not flap between UNKNOWN and their actual status.
Logs
PHP error logs
For versions using PHP 7.2 or 7.3 on CentOS 8, or PHP 8
centreon-engine logs (if needed)
centreon-broker logs (if needed)
centreon gorgone logs for Centreon >= 20.4 (if needed)
Additional relevant information (e.g. frequency, ...)
The flapping occurs at random, every few minutes. Each episode lasts about 5s to 10s, during which host and service statuses show UNKNOWN before returning to normal.