elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

Make the shipper output acknowledge batches based on `PersistedIndex`, not `AcceptedCount` #32329

Closed rdner closed 2 years ago

rdner commented 2 years ago

Describe the enhancement:

Currently, the shipper output acknowledges an event batch based on the `AcceptedCount` that the shipper returns as part of the `PublishReply`.

This is unsafe because `AcceptedCount` does not mean that the final output (e.g. Elasticsearch) has acknowledged these events. It only means that the shipper has put them in its own queue for processing. If the shipper process dies before the queued events reach the output, the data can be lost.

To fix this, we need to acknowledge the events using `PersistedIndex`, either from the `PublishReply` or from a separate `PersistedIndex` endpoint.
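A minimal sketch of the idea, not the actual shipper output code: accepted batches are held back until the persisted index reported by the shipper covers them. The names `ackTracker`, `pendingBatch`, `Accepted`, and `Persisted` are hypothetical, and the sketch assumes each accepted batch can be associated with the queue index of its last event.

```go
package main

import "fmt"

// pendingBatch is a hypothetical record of a batch the shipper has accepted
// but not yet persisted. lastIndex is the assumed queue index of its final event.
type pendingBatch struct {
	lastIndex uint64
	ack       func()
}

// ackTracker holds accepted batches until the persisted index reaches them.
type ackTracker struct {
	pending []pendingBatch // ordered by lastIndex
}

// Accepted registers a batch whose last event was assigned lastIndex.
func (t *ackTracker) Accepted(lastIndex uint64, ack func()) {
	t.pending = append(t.pending, pendingBatch{lastIndex: lastIndex, ack: ack})
}

// Persisted acknowledges every batch that persistedIndex fully covers and
// returns how many batches were acknowledged.
func (t *ackTracker) Persisted(persistedIndex uint64) int {
	n := 0
	for n < len(t.pending) && t.pending[n].lastIndex <= persistedIndex {
		t.pending[n].ack()
		n++
	}
	t.pending = t.pending[n:]
	return n
}

func main() {
	var acked []string
	t := &ackTracker{}
	t.Accepted(3, func() { acked = append(acked, "batch-1") })
	t.Accepted(7, func() { acked = append(acked, "batch-2") })

	t.Persisted(2) // nothing is fully persisted yet
	t.Persisted(5) // batch-1 is covered, batch-2 is not
	fmt.Println(acked) // prints [batch-1]
}
```

With this shape, a batch that was accepted but never persisted is simply never acknowledged, so the input can resend it after a shipper restart.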

elasticmachine commented 2 years ago

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

cmacknz commented 2 years ago

The primary use case we need to support is ensuring that inputs like Filebeat only update their persistent registry once events have been accepted by the output.

Filebeat's events should only be acknowledged based on the shipper's persisted index. Ideally, we can create test cases proving that acknowledgements work correctly when either Filebeat or the shipper restarts unexpectedly.
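As a rough illustration of what such a test could check, here is a self-contained simulation of the shipper-restart case only. Everything in it is hypothetical: `fakeShipper` is an in-memory stand-in rather than the real shipper, its persisted index is assumed to equal the number of events delivered to the output, and the "registry" is reduced to a single offset.

```go
package main

import "fmt"

// fakeShipper is a hypothetical in-memory stand-in for the shipper: events
// wait in a queue until Flush delivers them to the output.
type fakeShipper struct {
	queue     []string
	delivered []string
}

// Publish queues events and returns how many were accepted.
func (s *fakeShipper) Publish(events ...string) int {
	s.queue = append(s.queue, events...)
	return len(events)
}

// Flush delivers up to n queued events and returns the persisted index,
// assumed here to be the total number of events delivered so far.
func (s *fakeShipper) Flush(n int) int {
	if n > len(s.queue) {
		n = len(s.queue)
	}
	s.delivered = append(s.delivered, s.queue[:n]...)
	s.queue = s.queue[n:]
	return len(s.delivered)
}

// Crash simulates an unexpected restart: queued events are lost.
func (s *fakeShipper) Crash() { s.queue = nil }

// deliveredAfterCrash publishes four lines, crashes the shipper after two
// were delivered, resumes from the registry offset, and returns what
// reached the output.
func deliveredAfterCrash(ackOnAccepted bool) []string {
	lines := []string{"a", "b", "c", "d"}
	s := &fakeShipper{}

	accepted := s.Publish(lines...)
	persisted := s.Flush(2) // only the first two reach the output

	// registry is the offset the input resumes from after the restart.
	registry := persisted
	if ackOnAccepted {
		registry = accepted
	}

	s.Crash()
	s.Publish(lines[registry:]...) // resend everything past the registry offset
	s.Flush(len(lines))
	return s.delivered
}

func main() {
	fmt.Println(deliveredAfterCrash(true))  // prints [a b]
	fmt.Println(deliveredAfterCrash(false)) // prints [a b c d]
}
```

When the registry follows the accepted count, the two queued events are lost across the crash. When it follows the persisted index, they are resent and delivered. A real test would also need to cover Filebeat itself restarting.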