centreon / centreon-archived

Centreon is a network, system and application monitoring tool. Centreon is the only AIOps Platform Providing Holistic Visibility to Complex IT Workflows from Cloud to Edge.
https://www.centreon.com
GNU General Public License v2.0

Restore procedure problems #7773

Status: Open. kitatek opened this issue 5 years ago.

kitatek commented 5 years ago
Restore procedure lacks key information needed to succeed with incident recovery

# BUG REPORT INFORMATION

***Versions***

For RPM-based systems, output of:

```
$ rpm -qa | grep centreon
```

```
centreon-plugin-Hardware-Ups-Standard-Rfc1628-Snmp-20171208-1.el6.noarch
centreon-plugin-Applications-Databases-Mysql-20171208-1.el6.noarch
centreon-widget-graph-monitoring-1.5.2-3.el6.noarch
centreon-poller-centreon-engine-2.8.22-1.el6.noarch
centreon-engine-1.8.1-1.el6.x86_64
centreon-connector-ssh-1.1.3-1.el6.x86_64
centreon-clib-1.4.2-1.el6.x86_64
centreon-broker-storage-3.0.14-1.el6.x86_64
centreon-connector-1.1.3-1.el6.x86_64
centreon-plugin-Applications-Protocol-Ldap-20171208-1.el6.noarch
centreon-plugin-Applications-Protocol-Ftp-20171208-1.el6.noarch
centreon-widget-host-monitoring-1.6.2-1.el6.noarch
centreon-plugin-Applications-Monitoring-Centreon-Map4-Jmx-20171208-1.el6.noarch
centreon-trap-2.8.22-1.el6.noarch
centreon-2.8.22-1.el6.noarch
centreon-plugin-Operatingsystems-Windows-Snmp-20171208-1.el6.noarch
centreon-widget-hostgroup-monitoring-1.6.0-1.el6.noarch
centreon-common-2.8.22-1.el6.noarch
centreon-plugin-meta-2.8.22-1.el6.noarch
centreon-plugin-Operatingsystems-Linux-Snmp-20180409-1.el6.noarch
centreon-release-3.4-4.el6.noarch
centreon-plugins-2.8.22-1.el6.noarch
centreon-base-config-centreon-engine-2.8.22-1.el6.noarch
centreon-widget-grid-map-1.0.0-8.el6.noarch
centreon-broker-cbmod-3.0.14-1.el6.x86_64
centreon-plugin-Hardware-Printers-Generic-Snmp-20171208-1.el6.noarch
centreon-plugin-Applications-Monitoring-Centreon-Database-20171208-1.el6.noarch
centreon-widget-service-monitoring-1.6.2-1.el6.noarch
centreon-plugin-Applications-Monitoring-Centreon-Poller-20171208-1.el6.noarch
centreon-license-manager-1.1-5.el6.noarch
centreon-engine-extcommands-1.8.1-1.el6.x86_64
centreon-broker-3.0.14-1.el6.x86_64
centreon-broker-cbd-3.0.14-1.el6.x86_64
centreon-widget-live-top10-cpu-usage-1.1.1-1.el6.noarch
centreon-plugin-Applications-Monitoring-Centreon-Central-20171208-1.el6.noarch
centreon-perl-libs-2.8.22-1.el6.noarch
centreon-pp-manager-2.3-3.el6.noarch
centreon-widget-tactical-overview-1.0.0-8.el6.noarch
centreon-connector-perl-1.1.3-1.el6.x86_64
centreon-widget-servicegroup-monitoring-1.6.0-1.el6.noarch
centreon-widget-live-top10-memory-usage-1.1.1-1.el6.noarch
centreon-web-2.8.22-1.el6.noarch
centreon-engine-daemon-1.8.1-1.el6.x86_64
centreon-broker-core-3.0.14-1.el6.x86_64
centreon-plugin-Network-Cisco-Standard-Snmp-20180409-1.el6.noarch
centreon-plugin-Applications-Protocol-Http-20180409-1.el6.noarch
centreon-plugin-Applications-Protocol-Dns-20171208-1.el6.noarch
centreon-widget-engine-status-1.0.2-1.el6.noarch
```

***Operating System***

CentOS 6.5

***Browser used***

- [ ] Google Chrome
- [x] Firefox
- [ ] Internet Explorer IE11
- [ ] Safari

***Additional environment details (AWS, VirtualBox, physical, etc.):***

kipitapp cloud KVM

### Description

Applying the restore procedure leads to a failed recovery.

### Steps to Reproduce

1. Start from a working Centreon distributed system with remote pollers.
2. Apply the restore procedure.
3. Open Centreon: the Configuration > Pollers page of the web GUI shows no pollers.

### Describe the received result

After the restore, the Configuration > Pollers page of the web GUI shows no pollers.

### Describe the expected result

The existing pollers are visible.

### Logs

**PHP error logs**

```
tail -f /var/opt/rh/rh-php71/log/php-fpm/centreon-error.log
```

This file does not exist.

**centreon-engine logs**

```
tail -f /var/log/centreon-engine/centengine.log
```

Everything works normally (checks are logged, no errors).

**centreon-broker logs**

```
tail -f /var/log/centreon-broker/central-broker-master.log
```

```
[1565455700] error: storage: could not fetch index list from data DB: could not execute query: Table 'centreon_storage.rt_index_data' doesn't exist QMYSQL: Unable to execute query (SELECT index_id , host_id, service_id, host_name, rrd_retention, service_description, special, locked FROM rt_index_data)
[1565455758] error: SQL: could not get the list of outdated instances: could not execute query: Table 'centreon_storage.rt_instances' doesn't exist QMYSQL: Unable to execute query (SELECT instance_id FROM rt_instances WHERE outdated=TRUE)
[1565455760] error: storage: could not fetch index list from data DB: could not execute query: Table 'centreon_storage.rt_index_data' doesn't exist QMYSQL: Unable to execute query (SELECT index_id , host_id, service_id, host_name, rrd_retention, service_description, special, locked FROM rt_index_data)
[1565455818] error: SQL: could not get the list of outdated instances: could not execute query: Table 'centreon_storage.rt_instances' doesn't exist QMYSQL: Unable to execute query (SELECT instance_id FROM rt_instances WHERE outdated=TRUE)
```

**centcore logs**

```
tail -f /var/log/centreon/centcore.log
```

```
2019-08-10 02:16:01 - MySQL error : cannot connect to database centreon: Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2) (caller: centreon::common::db:/usr/share/perl5/vendor_perl/centreon/common/db.pm:266)
2019-08-10 02:16:01 - Error when getting server properties
2019-08-10 02:17:01 - MySQL error : cannot connect to database centreon: Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2) (caller: centreon::common::db:/usr/share/perl5/vendor_perl/centreon/common/db.pm:266)
2019-08-10 02:17:01 - MySQL error : cannot connect to database centreon: Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2) (caller: centreon::common::db:/usr/share/perl5/vendor_perl/centreon/common/db.pm:266)
```

### Additional relevant information (e.g. frequency, ...)

Recovering from an incident is of utmost criticality. While the frequency is hopefully low, the importance (criticality * frequency) is close to the highest.

At this stage, I am redoing this procedure step by step with additional breakpoints to see where it fails. I have already identified these issues with the procedure:

1/ Which user should run the recovery commands: root, centreon, or any user?

2/ Restore process

> Restore process is divided in two main steps:
>
> - Re-install the Centreon platform following the installation documentation. Do not forget to upgrade system.
> - Restore Centreon-Engines configuration files and Centreon databases

I think this is the wrong starting point: since only the configuration files and the databases are backed up, the recovery procedure should start from the failing system, onto which we would restore what was backed up. That strongly shortens the incident recovery time. As Centreon is meant for ops people, in a good-enough world THEY should decide where the recovery starts.

3/ With this procedure, centreon_storage.hosts is wrongly restored to centreon.hosts (the same happens with every table of the centreon_storage schema). I am going through the procedure step by step to identify where the problem is, and will post the results here.
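One way to confirm point 3 is to ask MySQL which schema the affected tables actually ended up in. This is a minimal sketch with a hypothetical helper, `misplaced_tables` (not part of Centreon). It assumes tab-separated `schema<TAB>table` rows on stdin, as printed by `mysql -N -B`, and reports only tables that exist somewhere other than `centreon_storage`. Tables missing from every schema print nothing.

```shell
# Hypothetical helper: report tables that should live in centreon_storage
# but were found in another schema.
# stdin:     "schema<TAB>table" rows (mysql -N -B output)
# arguments: table names expected in centreon_storage
#
# Example (credentials assumed to come from ~/.my.cnf):
#   mysql -N -B -e "SELECT table_schema, table_name FROM information_schema.tables
#                   WHERE table_name IN ('hosts', 'rt_index_data', 'rt_instances')" \
#     | misplaced_tables hosts rt_index_data rt_instances
misplaced_tables() {
    awk -v want="$*" '
        BEGIN {
            FS = "\t"
            # Build a lookup set from the space-separated argument list.
            n = split(want, names, " ")
            for (i = 1; i <= n; i++) expected[names[i]] = 1
        }
        # Print schema.table for every expected table outside centreon_storage.
        ($2 in expected) && $1 != "centreon_storage" { print $1 "." $2 }
    '
}
```

Any `centreon.<table>` line in the output points to a dump that was imported into the wrong schema.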
kitatek commented 5 years ago

The procedure is here: https://documentation.centreon.com/docs/centreon-backup/en/latest/Restore_of_Centreon_central_server_.html

And on GitHub: https://github.com/centreon/centreon/blob/2.8.x/doc/en/administration_guide/backup.rst

kitatek commented 5 years ago

mysql> GRANT ALL ON centreon.* TO 'centreon'@'adresseipserveurcentreon' IDENTIFIED BY 'password' ;

4/ Please note that for 'adresseipserveurcentreon' (the Centreon server IP address), `localhost` and `127.0.0.1` behave differently in MySQL: `localhost` uses the Unix socket, while `127.0.0.1` uses a TCP/IP connection. This can make opening the database fail (maybe my problem, not sure yet). So please state where to find the right IP address in the SQL or configuration backup files before this step. Is it this line in the SQL file?

`INSERT INTO 'nagios_server' VALUES (1,'Central','1',1,1564942194,'127.0.0.1','1','0','centengine',NULL`
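Because MySQL matches `'centreon'@'localhost'` (Unix socket) and `'centreon'@'127.0.0.1'` (TCP/IP) as two different accounts, one workaround is to grant both host forms. This is a minimal sketch with a hypothetical `grant_sql` helper. It assumes the pre-MySQL-8 `GRANT ... IDENTIFIED BY` syntax already used in the procedure (MySQL 5.x / MariaDB, as on CentOS 6). It only prints the statements, so they can be reviewed before being piped into `mysql -u root -p`.

```shell
# Hypothetical helper: print one GRANT per host form, since MySQL treats
# 'localhost' (socket) and '127.0.0.1' (TCP/IP) as distinct account hosts.
# Usage: grant_sql DB USER PASSWORD HOST...
grant_sql() {
    db=$1 user=$2
    # Double any single quote so the password stays a valid SQL string literal.
    pass=$(printf '%s' "$3" | sed "s/'/''/g")
    shift 3
    for host in "$@"; do
        printf "GRANT ALL ON %s.* TO '%s'@'%s' IDENTIFIED BY '%s';\n" \
            "$db" "$user" "$host" "$pass"
    done
}
```

For example, `grant_sql centreon centreon 'password' localhost 127.0.0.1` prints two statements, one for each way the client may connect.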

kitatek commented 5 years ago

5/ The init script `/etc/init.d/centstorage` used in the procedure is missing:

cannot access /etc/init.d/centstorage: No such file or directory
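Until the procedure is updated, the service steps could be made tolerant of init scripts that are no longer shipped. This is a minimal sketch with a hypothetical `present_services` helper. Apart from `centstorage`, which comes from the error above, the service names in the example comment (`centengine`, `cbd`, `centcore`) are assumptions for a Centreon 2.8 / CentOS 6 central server and are not taken from the procedure.

```shell
# Hypothetical helper: print only the services whose init script exists
# in DIR, and warn on stderr about missing ones instead of aborting.
# Usage: present_services DIR NAME...
#
# Example (service names are assumptions):
#   for svc in $(present_services /etc/init.d centengine cbd centcore centstorage); do
#       service "$svc" stop
#   done
present_services() {
    dir=$1
    shift
    for svc in "$@"; do
        if [ -e "$dir/$svc" ]; then
            printf '%s\n' "$svc"
        else
            printf 'skipping %s: %s/%s not found\n' "$svc" "$dir" "$svc" >&2
        fi
    done
}
```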

kitatek commented 5 years ago

6/ Issue with the code in this procedure: the lines cannot be copied with a triple-click because they include the `mysql>` prompt.

I think this might be what caused my problem: a single line of the MySQL procedure failed (the one creating centreon_storage), and I did not notice it after long hours under stress during the recovery. As a result, the centreon_storage tables were inserted into the centreon schema by mistake (I had to find a login command that worked, and that one connected to the centreon schema...).
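Feeding each dump to `mysql` in batch mode, instead of pasting lines at the `mysql>` prompt, avoids the copy problem. It also keeps errors from passing unnoticed, because batch mode stops at the first failing statement unless `--force` is given. A dump made without `mysqldump --databases` carries no `USE` statement, so its tables land in whichever database the client is connected to. That matches what happened here. This is a minimal sketch with hypothetical helpers, `dump_target_db` and `restore_dump`. The caller supplies the database name, the dump file, and any extra `mysql` options (for example credentials).

```shell
# Hypothetical helper: print the database named by the first "USE `db`;"
# line of a SQL dump, or nothing if the dump has no USE statement.
dump_target_db() {
    sed -n 's/^USE `\([^`]*\)`;.*$/\1/p' "$1" | head -n 1
}

# Hypothetical restore step: import FILE into DB non-interactively.
# Usage: restore_dump DB FILE [mysql options...]
# Refuses when the dump itself switches to a different database.
restore_dump() {
    db=$1 file=$2
    shift 2
    target=$(dump_target_db "$file")
    if [ -n "$target" ] && [ "$target" != "$db" ]; then
        printf 'refusing: %s targets %s, not %s\n' "$file" "$target" "$db" >&2
        return 1
    fi
    # Batch mode: mysql aborts on the first SQL error (no --force here).
    mysql "$@" "$db" < "$file"
}
```

With `restore_dump centreon_storage storage.sql` (file name hypothetical), the storage tables can only be created in `centreon_storage`, even when the dump has no `USE` line.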

7/ The published procedure is not versioned: only the latest version is published, while every version is available on GitHub.

Conclusion:

  1. I managed to recover. Yeeehaaa! (after so many night hours, though; I don't dare say how many. Well, I documented it too ;) )
  2. Many fixes are still needed in the procedure; see the numbered list above.
  3. That calls for testing this procedure.
  4. This also calls for a procedure covering more useful use cases:

Thanks (on behalf of other readers) if you can get those glitches fixed (that's why I am NOT closing the issue).

And thanks for Centreon, which is overall a very good product!

lpinsivy commented 5 years ago

Hi @kitatek, which restore procedure did you use? From which Centreon version to which restored Centreon version?

Regards,

kitatek commented 5 years ago

> Hi @kitatek which restore procedure du you used?

The procedure is here: https://documentation.centreon.com/docs/centreon-backup/en/latest/Restore_of_Centreon_central_server_.html

> From which Centreon version to which restore Centreon version?

See the output of `rpm -qa | grep centreon` in my first message.

Regards, and thanks.

kitatek commented 5 years ago

Actually, I found the version number, 2.8.22, on the login screen.