Open shwetas-syd opened 4 years ago
I'd love to learn more, to determine whether the beat is downloading the same content repeatedly or whether this is an artifact of the API itself. I'll check on the add_id processor, but the events themselves should have unique IDs already.
Debugging logs show the beat querying the artifact and publishing events from the same date range multiple times. So, I suspect the O365Beat isn't getting an acknowledgment from Elastic in time. Elastic recommends the add_id processor to prevent data duplication: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-deduplication.html
Hi, I don't think this is an o365beat issue. This is my pipeline: o365beat --> logstash (with geo info enrichment by a filter) --> output to file and to ES. I see duplicate events in the file too. I solved it on the ES side by mapping document_id to the O365 "Id" field in the logstash conf.
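To illustrate why keying documents on the O365 "Id" field removes the duplicates, here is a minimal Python sketch. It is not the Logstash or Elasticsearch implementation: the in-memory `store` dict only stands in for an index, `index_events` is a hypothetical helper, and the sample events and their `Operation` values are made up. The point is that when the document ID comes from the event itself, replaying the same batch overwrites existing documents instead of adding new ones.

```python
from typing import Dict, Iterable


def index_events(store: Dict[str, dict], events: Iterable[dict]) -> int:
    """Upsert events into `store`, keyed by their O365 "Id" field.

    Returns how many events had an Id that was not already stored.
    `store` is a plain dict standing in for an index (an assumption
    made for illustration only).
    """
    new_ids = 0
    for event in events:
        doc_id = event["Id"]
        if doc_id not in store:
            new_ids += 1
        # An event with an existing Id overwrites the stored copy,
        # so a replayed batch never creates a second document.
        store[doc_id] = event
    return new_ids


# Hypothetical sample events; only the "Id" field name comes from the thread.
batch = [
    {"Id": "a1", "Operation": "FileAccessed"},
    {"Id": "b2", "Operation": "UserLoggedIn"},
]

store: Dict[str, dict] = {}
first = index_events(store, batch)   # initial publish
second = index_events(store, batch)  # same date range published again
print(first, second, len(store))
```

This only deduplicates at the ES output. The file output in the same pipeline would still contain the repeated events, since nothing there is keyed on "Id".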
I'm thinking these duplicate events could be part of the same underlying issue described in my recent reply to @rob570's issue. I'll let you know when a fix is posted, and hopefully we can test it under your conditions!
We've noticed duplication of events and we're looking at ways to prevent them. I tried adding the add_id processor, but it's not available in the list of processors.