elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

Make Journald input resilient to Journald errors #39355

Open belimawr opened 4 months ago

belimawr commented 4 months ago

Currently, if any error occurs while reading the next message from Journald, the input stops and never recovers, effectively halting ingestion.

This happens because any error from reading a new message or publishing a message (https://github.com/elastic/beats/blob/ffcd1814666645a5d7a644911ecf6e2b7d8db3f5/filebeat/input/journald/input.go#L163-L173) is returned by the input's Run method. Run is called in a goroutine that logs the error and then exits without restarting the input (https://github.com/elastic/beats/blob/ffcd1814666645a5d7a644911ecf6e2b7d8db3f5/filebeat/input/v2/compat/compat.go#L119-L135).

We need to make the Journald input more resilient to errors returned when calling the host's journald via github.com/coreos/go-systemd/v22/sdjournal.

elasticmachine commented 4 months ago

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

belimawr commented 1 month ago

Even after merging https://github.com/elastic/beats/pull/40061 and migrating to journalctl, this issue is still relevant: if journalctl crashes, the input finishes and ingestion of journal messages stops.