counteractive / o365beat

Elastic Beat for fetching and shipping Office 365 audit events

Preventing Duplicate Events #42

Open shwetas-syd opened 4 years ago

shwetas-syd commented 4 years ago

We've noticed duplicate events and we're looking at ways to prevent them. I tried adding the add_id processor, but it's not available in the list of processors.

chris-counteractive commented 4 years ago

I'd love to learn more, to determine whether this is a case of the beat repeating content downloads or an artifact of the API itself. I'll check on the add_id processor, but the events themselves should have unique IDs already.

shwetas-syd commented 4 years ago

Debug logs show the beat querying the artifact and publishing events from the same date range multiple times, so I suspect o365beat isn't getting an acknowledgement from Elastic in time. Elastic recommends the add_id processor to prevent data duplication: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-deduplication.html

rob570 commented 4 years ago

Hi, I don't think this is an o365beat issue. This is my pipeline: o365beat --> Logstash (with geo info enrichment by a filter) --> output to file and to ES. I see duplicate events in the file too. I solved it on the ES side by mapping document_id to the O365 "Id" field in the Logstash conf.
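
For anyone wanting to try the same workaround, here is a minimal sketch of what that Logstash output might look like. The host URL is a placeholder, and the field reference assumes the event still carries the O365 identifier under the name "Id" when it reaches Logstash. If your beat or filter configuration renames that field, the reference has to be adjusted to match.

```
output {
  elasticsearch {
    # Placeholder host; replace with your own cluster address.
    hosts => ["http://localhost:9200"]
    # Use the O365 event identifier as the Elasticsearch document ID, so a
    # re-sent event overwrites the existing document instead of creating a
    # second one. Assumes the field is named "Id" at this point in the pipeline.
    document_id => "%{Id}"
  }
}
```

This only deduplicates within a single index, and it does nothing for the file output, which will still contain the repeated events.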

chris-counteractive commented 4 years ago

I'm thinking these duplicate events could be part of the same underlying issue described in my recent reply to @rob570's issue. I'll let you know when a fix is posted, and hopefully we can test it under your conditions!