Open wackou72 opened 4 years ago
Hi @wackou72 ,
Do you find errors in /var/log/centreon-gorgone/gorgoned.log?
Hello @lpinsivy ,
I didn't see any error, just INFO
But I noticed something: there is a ping happening regularly, and a pong from the other servers (I assume).
The log has shown this since Monday morning, and yesterday was the last occurrence:
2020-08-11 12:36:39 - INFO - [proxy] Send pings
2020-08-11 12:36:40 - INFO - [proxy] Received setlogs for '4'
2020-08-11 12:36:40 - INFO - [proxy] Pong received from '4'
2020-08-11 12:36:40 - INFO - [proxy] Received setlogs for '5'
2020-08-11 12:36:40 - INFO - [proxy] Pong received from '5'
2020-08-11 12:36:40 - INFO - [proxy] Received setlogs for '3'
2020-08-11 12:36:40 - INFO - [proxy] Pong received from '3'
2020-08-11 12:36:40 - INFO - [proxy] Received setlogs for '2'
2020-08-11 12:36:40 - INFO - [proxy] Pong received from '2'
Now I see only
2020-08-11 16:16:14 - INFO - [proxy] Send pings
2020-08-11 16:17:34 - INFO - [proxy] Send pings
2020-08-11 16:18:54 - INFO - [proxy] Send pings
Without any pong answer.
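For anyone who wants to spot this pattern without reading the log by eye, here is a minimal sketch that lists every "Send pings" entry that gets no "Pong received" before the next ping. It assumes one log entry per line in the format shown above; the function name `unanswered_pings` is hypothetical and not part of Gorgone.

```python
import re
from typing import Iterable, List

# Matches [proxy] entries such as:
# 2020-08-11 12:36:39 - INFO - [proxy] Send pings
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - \w+ - \[proxy\] (?P<msg>.*)$"
)


def unanswered_pings(lines: Iterable[str]) -> List[str]:
    """Return timestamps of 'Send pings' entries with no pong before the next ping.

    Hypothetical helper: the last ping in the log is also reported if nothing
    follows it, even though its pong may simply not have arrived yet.
    """
    unanswered = []
    pending = None  # timestamp of the most recent ping still waiting for a pong
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a [proxy] entry, or not in the expected format
        msg = match.group("msg")
        if msg == "Send pings":
            if pending is not None:
                unanswered.append(pending)
            pending = match.group("ts")
        elif msg.startswith("Pong received from"):
            pending = None  # at least one node answered this ping
    if pending is not None:
        unanswered.append(pending)
    return unanswered
```

To run it against the real file, something like `unanswered_pings(open("/var/log/centreon-gorgone/gorgoned.log"))` should work. A single pong from any node clears the pending ping, so this only detects the case where no node answers at all.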
I have not restarted the Central services this morning (systemctl restart cbd centengine gorgoned).
I also know that something went wrong because my scheduled downtimes on the distant poller are not working.
Please note that communication of host/service data is working:
So you have an issue with Centreon Gorgone communication (from the Central to the pollers), but everything is OK when Centreon Engine forwards collected data to the database using Centreon Broker.
Can you check the status of 'gorgoned' service on your pollers and the associated /var/log/centreon-gorgone/gorgoned.log?
The gorgoned service is up and running on all my servers.
Here is /var/log/centreon-gorgone/gorgoned.log from my remote servers.
Strangely, the logs stop at exactly the same time (all 4 remote servers are in different time zones):
2020-08-11 14:44:17 - INFO - [action] Copy processing - Received chunk for '/etc/centreon-engine//'
2020-08-11 14:44:17 - INFO - [action] Copy processing - Copy to '/etc/centreon-engine//' finished successfully
2020-08-11 14:44:17 - INFO - [action] Copy processing - Received chunk for '/etc/centreon-broker/'
2020-08-11 14:44:17 - INFO - [action] Copy processing - Copy to '/etc/centreon-broker/' finished successfully
2020-08-11 14:44:45 - INFO - [action] Copy processing - Received chunk for '/etc/centreon-engine//'
2020-08-11 14:44:45 - INFO - [action] Copy processing - Copy to '/etc/centreon-engine//' finished successfully
2020-08-11 14:44:45 - INFO - [action] Copy processing - Received chunk for '/etc/centreon-broker/'
2020-08-11 14:44:45 - INFO - [action] Copy processing - Copy to '/etc/centreon-broker/' finished successfully
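Since the log timestamps carry no time zone, one way to check whether the pollers really stopped at the same instant is to convert each server's last entry to UTC. This is only a sketch: the server names and UTC offsets below are hypothetical placeholders, not the real setup, and fixed offsets ignore daylight-saving changes.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_ts: str, utc_offset_hours: float) -> datetime:
    """Interpret a gorgoned.log timestamp as local time at the given offset and convert to UTC."""
    naive = datetime.strptime(local_ts, "%Y-%m-%d %H:%M:%S")
    local = naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return local.astimezone(timezone.utc)


# Hypothetical example: last log entry per poller and that poller's UTC offset.
last_entries = {
    "poller-a": ("2020-08-11 14:44:45", 2),
    "poller-b": ("2020-08-11 08:44:45", -4),
}

utc_times = {name: to_utc(ts, offset) for name, (ts, offset) in last_entries.items()}
# Time between the earliest and latest "last entry" across all pollers.
spread = max(utc_times.values()) - min(utc_times.values())
```

A `spread` close to zero would suggest that all pollers stopped logging at the same moment, which points at the Central side rather than at each poller individually.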
Hi @wackou72, can you update to the latest version of Gorgone (20.04.4) on the Central server and restart the gorgoned process?
Regards,
Hi @lpinsivy
I upgraded Gorgone to 20.04.4 and restarted all my servers.
I will let you know if I encounter the issue again and post /var/log/centreon-gorgone/gorgoned.log
Unfortunately, that didn't solve the issue.
Here is the last message:
2020-08-19 00:55:00 - INFO - [proxy] Send pings
From this point on, I can't apply a new configuration, restart the remote pollers, or force an immediate check, and Scheduled Downtime doesn't work until I run systemctl restart cbd centengine gorgoned
Once that is run, logging and the pings work again:
2020-08-19 13:07:02 - INFO - [proxy] Create module 'proxy' child process for pool id '1'
2020-08-19 13:07:02 - INFO - [proxy] Create module 'proxy' child process for pool id '2'
2020-08-19 13:07:02 - INFO - [proxy] Create module 'proxy' child process for pool id '3'
2020-08-19 13:07:02 - INFO - [proxy] Create module 'proxy' child process for pool id '4'
2020-08-19 13:07:02 - INFO - [proxy] Create module 'proxy' child process for pool id '5'
2020-08-19 13:07:02 - INFO - [core] Setcoreid changed 1
2020-08-19 13:07:02 - INFO - [proxy] Node '2' is registered
2020-08-19 13:07:02 - INFO - [proxy] Node '3' is registered
2020-08-19 13:07:02 - INFO - [proxy] Node '4' is registered
2020-08-19 13:07:02 - INFO - [proxy] Node '5' is registered
2020-08-19 13:07:03 - INFO - [zmqclient] Client connected successfully to 'tcp://1.1.1.1:5556'
2020-08-19 13:07:04 - INFO - [zmqclient] Client connected successfully to 'tcp://2.2.2.2:5556'
2020-08-19 13:07:04 - INFO - [proxy] Pong received from '2'
2020-08-19 13:07:05 - INFO - [zmqclient] Client connected successfully to 'tcp://3.3.3.3:5556'
2020-08-19 13:07:05 - INFO - [zmqclient] Client connected successfully to 'tcp://4.4.4.4:5556'
2020-08-19 13:07:05 - INFO - [proxy] Pong received from '4'
2020-08-19 13:07:05 - INFO - [proxy] Pong received from '3'
2020-08-19 13:07:05 - INFO - [proxy] Pong received from '5'
2020-08-19 13:07:20 - INFO - [proxy] Send pings
2020-08-19 13:07:20 - INFO - [proxy] Pong received from '4'
2020-08-19 13:07:20 - INFO - [proxy] Pong received from '5'
2020-08-19 13:07:21 - INFO - [proxy] Pong received from '3'
2020-08-19 13:07:21 - INFO - [proxy] Pong received from '2'
Let me know if you need the full log of the Central and/or the remote pollers, and whether I need to enable something to get all the debugging messages.
When you try to export the configuration or re-schedule a command, what is the result in gorgoned.log?
Regards,
Here is the output:
Hi @wackou72,
Thanks for the info.
Can you provide us with the full Gorgone log from the Central, from the last restart to the time it starts failing?
Can you do that at debug level? You can activate debug from 'Administration > Parameters > Debug'. You'll need to restart gorgoned to apply it.
Hello @cgagnaire, should I enable everything?
Hi @wackou72, no, only the Centreon Gorgone debug.
Hi @cgagnaire, OK, I set the debug mode as requested and restarted everything. I will let you know once the defect occurs again.
Hi @cgagnaire, where can I send you the file? It contains server names etc. and I want to keep it private. Regards
Hi @wackou72, send it to the email address of my account.
Hello
I have the exact same issue right now, with the same behavior: it works for some time (one hour to one day), and then I have to restart my master process to make external commands and acknowledgements work again.
I have already put Gorgone in debug mode. This screenshot gives some additional input from when it stopped working:
Do you need other logs?
Regards,
Hi @Midorip,
Can you try the latest version of Gorgone from the unstable repository:
yum update centreon-gorgone\* --enablerepo=centreon-unstable*
You can roll back to the latest stable version with a downgrade command in case of problems:
yum downgrade centreon-gorgone\*
Hello @cgagnaire and @lpinsivy, has this issue been solved? I managed to solve the issue with the unstable repo. When will it be synced to the stable branch? Regards.
Versions
Centreon 20.04.4
For the RPM based systems
Operating System
CentOS 7.8.2003
Browser used
Description
After a certain amount of time (2 to 3 days), communication between the Central and the distant poller breaks. The Central still receives service/host information, but Scheduled Downtime and applying a new configuration no longer work. Running
systemctl restart cbd centengine gorgoned
brings everything back to normal. Please note that I've updated to 20.04.4 and switched to the ZMQ protocol. Communication works for a certain amount of time.
Steps to Reproduce
Configuration Pollers --> Export Configurations
Describe the received result
No configuration applied
Describe the expected result
New configuration should be applied
Logs
Let me know which logs are needed
Additional relevant information (e.g. frequency, ...)
I looked through different log files and I don't see any error or defect. Looks like https://github.com/centreon/centreon/issues/8799