centreon / centreon-broker

A full-featured monitoring event broker, compatible with MySQL, RRDtool, Graphite and more
Apache License 2.0

[3.0.9] Duplicate entry issue #126

Open Tomszy opened 6 years ago

Tomszy commented 6 years ago

Centreon Web version:2.8.12

Centreon Engine version:1.8.0

Centreon Broker version:3.0.9 on the broker server and every pollers are on 3.0.9 as well now

OS: Oracle Linux 7

Additional environment details :Virtual and we have dedicated broker server with 32 GB RAM and 4 core CPU.

Steps to reproduce the issue: At random intervals we get "Duplicate entry" error messages in the broker, and cbd sometimes gets stuck because of these errors. We scheduled a cron job that checks the log file and reloads the broker when this error appears. But when we reload the broker, many services go back to pending status again and again; sometimes more than 1000 checks return to pending, and some pollers get stuck as well, so we have to log in and restart centengine. image

Here is what my colleague found last week. On the broker side: image

On the MySQL side:

image

So the issue appears to be on the broker side.

Describe the results you expected:

We would like to fix and avoid these errors, and we would also like to keep services from going back to pending status.

One more issue: sometimes when I replace a template on a host with another one, the old checks remain on the host next to the new ones under Monitoring > Status Details > Services. After a centengine restart they disappear, but when the services go back to pending because of the broker issue, these deleted checks come back as well. So this looks like a bug.

zskarman commented 6 years ago

I can reproduce the issue. Whenever we add a new check to Centreon and the broker tries to insert the data, it writes an error to /var/log/centreon-broker/central-broker-master.log

Error:
[1506698393] error: SQL: could not store service group membership (poller: 1, host: 61117, service: 900293, group: 1): could not execute prepared statement: Duplicate entry '61117-900293-1' for key 'host_id' QMYSQL3: Unable to execute statement
[1506698393] error: SQL: could not store service group membership (poller: 1, host: 61117, service: 900289, group: 1): could not execute prepared statement: Duplicate entry '61117-900289-1' for key 'host_id' QMYSQL3: Unable to execute statement
[1506698393] error: SQL: could not store service group membership (poller: 1, host: 61117, service: 900321, group: 1): could not execute prepared statement: Duplicate entry '61117-900321-1' for key 'host_id' QMYSQL3: Unable to execute statement
[1506698393] error: SQL: could not store service group membership (poller: 1, host: 61117, service: 900341, group: 1): could not execute prepared statement: Duplicate entry '61117-900341-1' for key 'host_id' QMYSQL3: Unable to execute statement

Those host and service IDs belong to the new checks.
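To see at a glance which memberships are affected, the error lines can be grouped by host and service group. The following is only a minimal sketch: the regular expression is derived from the error lines quoted in this thread, and the function name `summarize_duplicates` is hypothetical. It uses `finditer`, so it also works when several log entries end up on one line.

```python
import re
from collections import defaultdict

# Pattern derived from the "service group membership" error lines quoted above.
# It is an assumption that other Broker versions log the same wording.
ERROR_RE = re.compile(
    r"\[(?P<ts>\d+)\] error:\s+SQL: could not store service group membership "
    r"\(poller: (?P<poller>\d+), host: (?P<host>\d+), "
    r"service: (?P<service>\d+), group: (?P<group>\d+)\)"
)


def summarize_duplicates(log_text):
    """Map (host_id, group_id) to the sorted service IDs that failed to insert."""
    found = defaultdict(set)
    for match in ERROR_RE.finditer(log_text):
        key = (int(match["host"]), int(match["group"]))
        found[key].add(int(match["service"]))
    return {key: sorted(ids) for key, ids in found.items()}


# Two entries copied from the log excerpt above.
sample = (
    "[1506698393] error: SQL: could not store service group membership "
    "(poller: 1, host: 61117, service: 900293, group: 1): could not execute "
    "prepared statement: Duplicate entry '61117-900293-1' for key 'host_id' "
    "QMYSQL3: Unable to execute statement "
    "[1506698393] error: SQL: could not store service group membership "
    "(poller: 1, host: 61117, service: 900289, group: 1): could not execute "
    "prepared statement: Duplicate entry '61117-900289-1' for key 'host_id' "
    "QMYSQL3: Unable to execute statement"
)

print(summarize_duplicates(sample))
```

The resulting host and service IDs can then be compared with the rows in centreon_storage.services_servicegroups.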

It looks like this error is generated when the broker receives new checks that are not yet in centreon_storage.services_servicegroups. This is part of the debug log:

[1506697770] debug: failover: reading event from multiplexing engine for endpoint 'central-broker-master-sql' [1506697770] debug: failover: writing event of multiplexing engine to endpoint 'central-broker-master-sql' [1506697770] error: SQL: could not store service group membership (poller: 1, host: 61117, service: 900277, group: 9): could not execute prepared statement: Duplicate entry '61117-900277-9' for key 'host_id' QMYSQL3: Unable to execute statement [1506697770] debug: SQL: 1415 events have not yet been acknowledged [1506697770] debug: multiplexing: acknowledging 0 events from central-broker-master-sql event queue [1506697770] debug: failover: reading event from multiplexing engine for endpoint 'central-broker-master-sql' [1506697770] debug: failover: writing event of multiplexing engine to endpoint 'central-broker-master-sql' [1506697770] error: SQL: could not store service group membership (poller: 1, host: 61117, service: 900285, group: 9): could not execute prepared statement: Duplicate entry '61117-900285-9' for key 'host_id' QMYSQL3: Unable to execute statement [1506697770] debug: SQL: 1416 events have not yet been acknowledged [1506697770] debug: multiplexing: acknowledging 0 events from central-broker-master-sql event queue [1506697770] debug: failover: reading event from multiplexing engine for endpoint 'central-broker-master-sql' [1506697770] debug: failover: writing event of multiplexing engine to endpoint 'central-broker-master-sql' [1506697770] error: SQL: could not store service group membership (poller: 1, host: 61117, service: 900273, group: 9): could not execute prepared statement: Duplicate entry '61117-900273-9' for key 'host_id' QMYSQL3: Unable to execute statement [1506697770] debug: SQL: 1417 events have not yet been acknowledged [1506697770] debug: multiplexing: acknowledging 0 events from central-broker-master-sql event queue [1506697770] debug: failover: reading event from multiplexing engine for endpoint 'central-broker-master-sql' 
[1506697770] debug: failover: writing event of multiplexing engine to endpoint 'central-broker-master-sql' [1506697770] error: SQL: could not store service group membership (poller: 1, host: 61117, service: 900269, group: 9): could not execute prepared statement: Duplicate entry '61117-900269-9' for key 'host_id' QMYSQL3: Unable to execute statement [1506697770] debug: SQL: 1418 events have not yet been acknowledged [1506697770] debug: multiplexing: acknowledging 0 events from central-broker-master-sql event queue [1506697770] debug: failover: reading event from multiplexing engine for endpoint 'central-broker-master-sql' [1506697770] debug: failover: writing event of multiplexing engine to endpoint 'central-broker-master-sql' [1506697770] error: SQL: could not store service group membership (poller: 1, host: 61117, service: 1437785, group: 9): could not execute prepared statement: Duplicate entry '61117-1437785-9' for key 'host_id' QMYSQL3: Unable to execute statement [1506697770] debug: SQL: 1419 events have not yet been acknowledged [1506697770] debug: multiplexing: acknowledging 0 events from central-broker-master-sql event queue [1506697770] debug: failover: reading event from multiplexing engine for endpoint 'central-broker-master-sql' [1506697770] debug: failover: writing event of multiplexing engine to endpoint 'central-broker-master-sql' [1506697770] error: SQL: could not store service group membership (poller: 1, host: 61117, service: 900281, group: 9): could not execute prepared statement: Duplicate entry '61117-900281-9' for key 'host_id' QMYSQL3: Unable to execute statement [1506697770] debug: SQL: 1420 events have not yet been acknowledged [1506697770] debug: multiplexing: acknowledging 0 events from central-broker-master-sql event queue [1506697770] debug: failover: reading event from multiplexing engine for endpoint 'central-broker-master-sql' [1506697770] debug: failover: writing event of multiplexing engine to endpoint 'central-broker-master-sql' 
[1506697770] error: SQL: could not store service group membership (poller: 1, host: 61117, service: 1374189, group: 9): could not execute prepared statement: Duplicate entry '61117-1374189-9' for key 'host_id' QMYSQL3: Unable to execute statement [1506697770] debug: SQL: 1421 events have not yet been acknowledged [1506697770] debug: multiplexing: acknowledging 0 events from central-broker-master-sql event queue [1506697770] debug: failover: reading event from multiplexing engine for endpoint 'central-broker-master-sql' [1506697770] debug: failover: writing event of multiplexing engine to endpoint 'central-broker-master-sql' [1506697770] error: SQL: could not store service group membership (poller: 1, host: 61117, service: 900265, group: 9): could not execute prepared statement: Duplicate entry '61117-900265-9' for key 'host_id' QMYSQL3: Unable to execute statement [1506697770] debug: SQL: 1422 events have not yet been acknowledged [1506697770] debug: multiplexing: acknowledging 0 events from central-broker-master-sql event queue [1506697770] debug: failover: reading event from multiplexing engine for endpoint 'central-broker-master-sql' [1506697770] debug: failover: writing event of multiplexing engine to endpoint 'central-broker-master-sql' [1506697770] error: SQL: could not store service group membership (poller: 1, host: 61117, service: 900261, group: 9): could not execute prepared statement: Duplicate entry '61117-900261-9' for key 'host_id' QMYSQL3: Unable to execute statement [1506697770] debug: SQL: 1423 events have not yet been acknowledged [1506697770] debug: multiplexing: acknowledging 0 events from central-broker-master-sql event queue [1506697770] debug: failover: reading event from multiplexing engine for endpoint 'central-broker-master-sql' [1506697770] debug: failover: writing event of multiplexing engine to endpoint 'central-broker-master-sql' [1506697770] error: SQL: could not store service group membership (poller: 1, host: 61117, service: 900257, 
group: 9): could not execute prepared statement: Duplicate entry '61117-900257-9' for key 'host_id' QMYSQL3: Unable to execute statement [1506697770] debug: SQL: 1424 events have not yet been acknowledged [1506697770] debug: multiplexing: acknowledging 0 events from central-broker-master-sql event queue [1506697770] debug: failover: reading event from multiplexing engine for endpoint 'central-broker-master-sql' [1506697770] debug: failover: writing event of multiplexing engine to endpoint 'central-broker-master-sql' [1506697770] error: SQL: could not store service group membership (poller: 1, host: 61117, service: 900253, group: 9): could not execute prepared statement: Duplicate entry '61117-900253-9' for key 'host_id' QMYSQL3: Unable to execute statement [1506697770] debug: SQL: 1425 events have not yet been acknowledged [1506697770] debug: multiplexing: acknowledging 0 events from central-broker-master-sql event queue [1506697770] debug: failover: reading event from multiplexing engine for endpoint 'central-broker-master-sql' [1506697770] debug: failover: writing event of multiplexing engine to endpoint 'central-broker-master-sql' [1506697770] error: SQL: could not store service group membership (poller: 1, host: 61117, service: 900365, group: 9): could not execute prepared statement: Duplicate entry '61117-900365-9' for key 'host_id' QMYSQL3: Unable to execute statement [1506697770] debug: SQL: 1426 events have not yet been acknowledged [1506697770] debug: multiplexing: acknowledging 0 events from central-broker-master-sql event queue [1506697770] debug: failover: reading event from multiplexing engine for endpoint 'central-broker-master-sql' [1506697770] debug: failover: writing event of multiplexing engine to endpoint 'central-broker-master-sql' [1506697770] error: SQL: could not store service group membership (poller: 1, host: 61117, service: 900361, group: 9): could not execute prepared statement: Duplicate entry '61117-900361-9' for key 'host_id' QMYSQL3: 
Unable to execute statement [1506697770] debug: SQL: 1427 events have not yet been acknowledged [1506697770] debug: multiplexing: acknowledging 0 events from central-broker-master-sql event queue [1506697770] debug: failover: reading event from multiplexing engine for endpoint 'central-broker-master-sql' [1506697770] debug: failover: writing event of multiplexing engine to endpoint 'central-broker-master-sql' [1506697770] error: SQL: could not store service group membership (poller: 1, host: 61117, service: 900357, group: 9): could not execute prepared statement: Duplicate entry '61117-900357-9' for key 'host_id' QMYSQL3: Unable to execute statement [1506697770] debug: SQL: 1428 events have not yet been acknowledged [1506697770] debug: multiplexing: acknowledging 0 events from central-broker-master-sql event queue [1506697770] debug: failover: reading event from multiplexing engine for endpoint 'central-broker-master-sql' [1506697770] debug: failover: writing event of multiplexing engine to endpoint 'central-broker-master-sql' [1506697770] error: SQL: could not store service group membership (poller: 1, host: 61117, service: 900353, group: 9): could not execute prepared statement: Duplicate entry '61117-900353-9' for key 'host_id' QMYSQL3: Unable to execute statement [1506697770] debug: SQL: 1429 events have not yet been acknowledged [1506697770] debug: multiplexing: acknowledging 0 events from central-broker-master-sql event queue [1506697770] debug: failover: reading event from multiplexing engine for endpoint 'central-broker-master-sql' [1506697770] debug: failover: writing event of multiplexing engine to endpoint 'central-broker-master-sql' [1506697770] debug: SQL: 1430 events have not yet been acknowledged [1506697770] debug: multiplexing: acknowledging 0 events from central-broker-master-sql event queue [1506697770] debug: failover: reading event from multiplexing engine for endpoint 'central-broker-master-sql' [1506697770] debug: failover: writing event of 
multiplexing engine to endpoint 'central-broker-master-sql' [1506697770] error: SQL: could not store service group membership (poller: 1, host: 61117, service: 914161, group: 1): could not execute prepared statement: Duplicate entry '61117-914161-1' for key 'host_id' QMYSQL3: Unable to execute statement [1506697770] debug: SQL: 1431 events have not yet been acknowledged [1506697770] debug: multiplexing: acknowledging 0 events from central-broker-master-sql event queue [1506697770] debug: failover: reading event from multiplexing engine for endpoint 'central-broker-master-sql' [1506697770] debug: failover: writing event of multiplexing engine to endpoint 'central-broker-master-sql' [1506697770] debug: failover: reading event from endpoint 'centreon-broker-master-rrd' [1506697770] debug: failover: reading event from multiplexing engine for endpoint 'centreon-broker-master-rrd' [1506697770] error: SQL: could not store service group membership (poller: 1, host: 61117, service: 900329, group: 1): could not execute prepared statement: Duplicate entry '61117-900329-1' for key 'host_id' QMYSQL3: Unable to execute statement [1506697770] debug: SQL: 1432 events have not yet been acknowledged [1506697770] debug: multiplexing: acknowledging 0 events from central-broker-master-sql event queue [1506697770] debug: failover: reading event from multiplexing engine for endpoint 'central-broker-master-sql' [1506697770] debug: failover: writing event of multiplexing engine to endpoint 'central-broker-master-sql' [1506697770] error: SQL: could not store service group membership (poller: 1, host: 61117, service: 900349, group: 1): could not execute prepared statement: Duplicate entry '61117-900349-1' for key 'host_id' QMYSQL3: Unable to execute statement [1506697770] debug: SQL: 1433 events have not yet been acknowledged [1506697770] debug: multiplexing: acknowledging 0 events from central-broker-master-sql event queue [1506697770] debug: failover: reading event from multiplexing engine 
for endpoint 'central-broker-master-sql' [1506697770] debug: failover: writing event of multiplexing engine to endpoint 'central-broker-master-sql' [1506697770] debug: compression: 0x7f290419d170 compressed 28 bytes to 38 bytes (level -1) [1506697770] debug: file: write request of 42 bytes for '/var/centreon_failover//central-broker-master.queue.centreon-broker-master-rrd' [1506697770] debug: storage: loaded index 1226601 of (17897, 427513) [1506697770] debug: BBDO: serialized event of type 196613 to 28 bytes [1506697770] debug: compression: 0x7f290419d170 compressed 28 bytes to 38 bytes (level -1) [1506697770] debug: file: write request of 42 bytes for '/var/centreon_failover//central-broker-master.queue.centreon-broker-master-rrd' [1506697770] debug: storage: loaded index 1226609 of (17897, 427521) [1506697770] debug: BBDO: serialized event of type 196613 to 28 bytes

Regards, Zsolt

zskarman commented 6 years ago

Hi,

Could you please check this error?

Regards, Zsolt

bouda1 commented 6 years ago

Hi,

I'm trying to reproduce what you described here. I probably do not understand everything, and I cannot reproduce the problem for now. If you can detail a procedure, that would be welcome :-) Then I will be able to dig into the code and find a solution :-)

Regards, David.

zskarman commented 6 years ago

Hi,

  1. Delete from centreon_storage.services_servicegroups
  2. restart broker
  3. check the log: /var/log/centreon-broker/central-broker-master.log

I can show you in webex if that helps...

Regards, Zsolt

bouda1 commented 6 years ago

@Tomszy if you have this error:

[1507215630] error: SQL: could not store service group membership (poller: 1, host: 23, service: 184, group: 2): could not execute prepared statement: Duplicate entry '23-184-2' for key 'host_id' QMYSQL3: Unable to execute statement

it is simply because the row already exists. But you must look at the right table, which is centreon_storage.services_servicegroups and not centreon.servicegroup_relation. In my case:

MariaDB [centreon_storage]> select * from services_servicegroups where host_id=23 and service_id=184 and servicegroup_id=2;
+---------+------------+-----------------+
| host_id | service_id | servicegroup_id |
+---------+------------+-----------------+
|      23 |        184 |               2 |
+---------+------------+-----------------+

The question now is why so many inserts are sent, to the point of blocking cbd. We continue to work on it. Regards, David.
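The "row already exists" explanation can be illustrated outside of Centreon. The sketch below is a stand-in, not Broker's code: it uses an in-memory SQLite database instead of MySQL/MariaDB, and the table definition is an assumption based only on the three columns shown in the query result and on the unique key reported in the error message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schema: the three columns from the query result, with a unique key
# across them (the real Centreon table may define more than this).
conn.execute(
    "CREATE TABLE services_servicegroups ("
    " host_id INTEGER, service_id INTEGER, servicegroup_id INTEGER,"
    " UNIQUE (host_id, service_id, servicegroup_id))"
)

row = (23, 184, 2)
insert = "INSERT INTO services_servicegroups VALUES (?, ?, ?)"

conn.execute(insert, row)  # the first insert succeeds

try:
    conn.execute(insert, row)  # the same row again violates the unique key
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# An idempotent insert leaves the existing row alone instead of failing.
conn.execute("INSERT OR IGNORE INTO services_servicegroups VALUES (?, ?, ?)", row)
count = conn.execute("SELECT COUNT(*) FROM services_servicegroups").fetchone()[0]

print(duplicate_rejected, count)
```

In this model the duplicate insert fails without changing the table, which matches the observation that the row is already present. `INSERT OR IGNORE` is SQLite syntax; MySQL and MariaDB offer `INSERT IGNORE` for the same purpose. Whether Broker should use such a statement is a separate question that this sketch does not answer.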

zskarman commented 6 years ago

Hi,

I deleted all of the entries with "Delete from centreon_storage.services_servicegroups". Then I get the errors.

1. Delete from centreon_storage.services_servicegroups
2. restart broker
3. check the log: /var/log/centreon-broker/central-broker-master.log

Regards, Zsolt

zskarman commented 6 years ago

Hi Team,

Could I get any update on this?

Regards, Zsolt

BenoitPoulet commented 6 years ago

Hello,

Same issue here :

[1508342842] error:   SQL: could not store host group membership (poller: 1, host: 7541, group: 549): could not execute prepared statement: Duplicate entry '7541-549' for key 'host_id' QMYSQL3: Unable to execute statement
[1508342842] error:   SQL: could not store service group membership (poller: 1, host: 7541, service: 29213, group: 1): could not execute prepared statement: Duplicate entry '7541-29213-1' for key 'host_id' QMYSQL3: Unable to execute statement

This happens when moving a host from one poller to another.

zskarman commented 6 years ago

Hi,

I also get it when adding new hosts / checks.

Regards, Zsolt

bouda1 commented 6 years ago

Good, thank you all.

Tomszy commented 6 years ago

Any update regarding this issue?

zskarman commented 6 years ago

Any update regarding this issue?

zskarman commented 6 years ago

Any update regarding this issue?

Tomszy commented 6 years ago

Any update regarding this issue? We are using broker 3.0.11 and we still have this issue. The broker always gets stuck after a duplicate entry. One more thing: we get this duplicate entry error after we send a new configuration to a poller and reload centengine.

deccard commented 6 years ago

I can confirm this issue for us as well; our broker version is 3.0.13. I blame retention files that are not cleanly deleted after being processed. We see the following message in the log file when the broker is restarted or reloaded, or after a predefined retry period has elapsed. So the broker is continuously trying to re-insert already processed queries.

Duplicate entry ... for key 'host_id' QMYSQL3: Unable to execute statement

We followed these guidelines, but the problem still persists: Best practices, "Manage Centreon Broker link failures".

Please see the following example: https://goo.gl/RoSL7t

zskarman commented 6 years ago

Any update regarding this issue?

ganoze commented 6 years ago

First let me say that with the latest Centreon Broker versions the "SQL: could not store service group membership" error is perfectly harmless. Neither this error nor its host group variant causes Centreon Broker to stop processing events.

From what I understand, the most probable reason for this behavior is that you manually configured retention files on Centreon Broker endpoints. However, this is not needed and is in fact harmful. I sent PR #172 to update the best practices guide.

@BenoitPoulet If this error is triggered when you move a host from one poller to another, it might indicate that the pollers were not reloaded/restarted in the proper order. The old poller should be reloaded first, then the new one.

Let me know if that helps.

zskarman commented 6 years ago

Hi,

We use this setup.

image

Should the failover value be equal to the failover name field?

Rgs, Zsolt

ganoze commented 6 years ago

With Centreon Broker 3, manual failover files should be removed entirely (Broker handles this itself). So you should remove the file output (Output 2 - File) entirely, and leave the failover field of the base output empty.

zskarman commented 6 years ago

Hi,

Does this mean we can remove all failover configuration from the central broker and the pollers as well?

Rgs, Zsolt

deccard commented 6 years ago

Ah, thanks Matthieu for clarifying broker 3.x failover setup. I was already unsure because I was reading about automatic failover handling in release notes but there was also this best practices guide. I'll keep watching our broker instances and report if something's unusual.

zskarman commented 6 years ago

Hi,

So if I remove the retention ("failover") files from the central broker and the pollers, where does the central broker store events when it cannot handle the load, as in this case:

image

We have more than 293,000 checks and we always have some delays in graphs. image

Regards, Zsolt

zskarman commented 6 years ago

Hi,

Does this mean we can remove all failover configuration from the central broker and the pollers as well?

Rgs, Zsolt

ganoze commented 6 years ago

@zskarman Yes you can remove all failover files from all your Centreon Broker configurations (central and poller).

Automatic retention files are stored in the cache directory (configurable from the interface). It is usually /var/lib/centreon-broker.

zskarman commented 6 years ago

Hi,

Thank you for your help. We have started to remove the failover files; let's see whether we still get any duplicate errors.

Regards, Zsolt

 


scitechff commented 6 years ago

Hello,

I've noticed that the Centreon services are no longer deleting the failover files, neither on the poller nor on the central server. While searching for the problem I found this issue. I tested the advice to remove all manual failover configuration files on one poller, and failover is still working. Unfortunately, I cannot find any retention files on my poller. The cache directory for the poller's broker is configured as described.

My setup includes the following components:

OS: CentOS 7

centreon-broker: 3.0.13

centreon-engine: 1.8.1

centreon-web: 2.8.19

What do I need to look for in order to find out how and where my poller saves the failover events?

I've also encountered the problem described in issue 5351. Without the manual failover configuration the error disappeared. I guess the poller's broker sends the queued failover events to the central server twice: once for the automatic failover and a second time for the manual one. Hence rrdcached tries to update the corresponding RRD files twice. Is that correct?

ganoze commented 6 years ago

@scitechff The retention files are written in the cache directory only if the event queue max size is reached. Below this threshold events are kept in memory.
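The behavior described above (events stay in memory until the queue limit is reached, and only then go to a file in the cache directory) can be modeled with a small toy queue. This is a sketch of the idea only and not Broker's implementation: the class name `SpillQueue`, the file name `retention.queue`, and the use of `pickle` are all assumptions made for illustration.

```python
import os
import pickle
import tempfile
from collections import deque


class SpillQueue:
    """Toy FIFO: keeps up to max_size events in memory, spills the rest to a file."""

    def __init__(self, max_size, cache_dir):
        self.max_size = max_size
        self.memory = deque()
        # Hypothetical file name; the real retention file naming is not shown here.
        self.path = os.path.join(cache_dir, "retention.queue")
        self.spilled = 0

    def push(self, event):
        # Once something is on disk, later events also go to disk to keep order.
        if self.spilled == 0 and len(self.memory) < self.max_size:
            self.memory.append(event)
        else:
            with open(self.path, "ab") as fh:
                pickle.dump(event, fh)
            self.spilled += 1

    def drain(self):
        """Yield all events in insertion order, then delete the retention file."""
        while self.memory:
            yield self.memory.popleft()
        if self.spilled:
            with open(self.path, "rb") as fh:
                for _ in range(self.spilled):
                    yield pickle.load(fh)
            os.remove(self.path)
            self.spilled = 0


with tempfile.TemporaryDirectory() as cache_dir:
    small = SpillQueue(max_size=10, cache_dir=cache_dir)
    for i in range(5):
        small.push(i)
    file_below_limit = os.path.exists(small.path)  # nothing written yet

    full = SpillQueue(max_size=2, cache_dir=cache_dir)
    for i in range(5):
        full.push(i)
    file_above_limit = os.path.exists(full.path)  # overflow went to disk
    events = list(full.drain())
    file_after_drain = os.path.exists(full.path)

print(file_below_limit, file_above_limit, events, file_after_drain)
```

In this model a poller that never exceeds its queue limit produces no file at all, which would be consistent with finding an empty cache directory while failover still works.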

For your graph problem please open another issue to keep this one focused on the original error.

dgarandel commented 6 years ago

Hello, I had the same problem described in this post: the poller freezes or gets stuck after a reload, with "duplicate entry" in the centreon-broker log file. We have two front-end master servers. The problem appeared each time we were connected to the second master server (behind a load balancer). We can find 2 "duplicate entry" errors in the centreon-broker log file.

For the past month the error has still appeared, but Centreon works fine, with no freezes or hangs since. The only change was adding RAM to the Centreon DB server.

./mysqltuner.pl

[OK] Maximum possible memory usage: 5.4G (69% of installed RAM)

Before that, it reported 120% of installed RAM.

TimoKramer commented 6 years ago

I am having a similar issue. No clustering. Any good idea how to resolve that?