elastic / beats

Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

Rate limit "Cannot index event" log messages #40157

Open cmacknz opened 2 weeks ago

cmacknz commented 2 weeks ago

https://github.com/elastic/beats/blob/032a4cfd5f3b8fa8354ac1e0062a0e1f196c60d0/libbeat/outputs/elasticsearch/client.go#L487-L491

The "Cannot index event" log messages are a useful signal that events are being dropped and that (as of 8.15.0) you should look at the local event log for the reason.

Since this log message does not contain any useful debugging information, and has the potential to be generated for every event that flows through the pipeline, there is no value in logging it for each event.

Instead, we should rate limit it so that it appears at most once per fixed interval while events are being dropped. The initially proposed rate limit is one message every 10 seconds.

The rate-limited message should include the number of events that were dropped in the current interval. The message can be changed to something like "Failed to index N events in last M seconds. Look at the event log to view the events and cause."

elasticmachine commented 2 weeks ago

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

jlind23 commented 2 weeks ago

@pierrehilbert bumping the priority on this one as it recently had an impact on some users. cc @lucabelluccini @nimarezainia