elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

[Filebeat] - validate files before harvesting #40151

Open VihasMakwana opened 2 weeks ago

VihasMakwana commented 2 weeks ago

Current Issue

Describe the enhancement:

Describe a specific use case for the enhancement or feature:

elasticmachine commented 2 weeks ago

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

pierrehilbert commented 2 weeks ago

@rdner / @belimawr could you please have a look here to share your thoughts?

belimawr commented 2 weeks ago

@VihasMakwana, I'm a bit confused here... We check whether a file is a regular file right before opening it, which is indeed after resolving the symlink. But every single file we open for reading will go through the validation at https://github.com/elastic/beats/blob/032a4cfd5f3b8fa8354ac1e0062a0e1f196c60d0/filebeat/input/filestream/input.go#L293-L296, which checks the file mode: https://github.com/elastic/beats/blob/032a4cfd5f3b8fa8354ac1e0062a0e1f196c60d0/filebeat/input/filestream/input.go#L323-L329

So even if the fileWatcher finds a file under a symlink that is not a regular file and returns it, filestream.openFile will still validate it by calling checkFileBeforeOpening, and the file won't be harvested if it is not a regular file.

Regarding reporting the input status as degraded in this case, I believe it is the correct behaviour. The user has configured Filestream to ingest something that is not a regular file, thus the user should be notified of their error and the input should stay degraded until this is fixed.

We just need to make sure the message returned to the user is clear enough so they can understand and act on it.