elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats
Other
12.13k stars 4.91k forks source link

Use fingerprint file identity by default and migrate all existing filestream inputs to it #40197

Open belimawr opened 2 months ago

belimawr commented 2 months ago

We started seeing cases where the filestream-monitoring will log an entry stating the Elastic-Agent log file was truncated:

{
  "log.level": "info",
  "@timestamp": "2024-07-05T15:42:27.135Z",
  "message": "File was truncated. Reading file from offset 0. Path=/var/lib/elastic-agent/data/elastic-agent-8.14.1-1348b9/logs/elastic-agent-20240705-25.ndjson",
  "component": {
    "binary": "filebeat",
    "dataset": "elastic_agent.filebeat",
    "id": "filestream-monitoring",
    "type": "filestream"
  },
  "log": {
    "source": "filestream-monitoring"
  },
  "service.name": "filebeat",
  "id": "filestream-monitoring-agent",
  "path": "/var/lib/elastic-agent/data/elastic-agent-8.14.1-1348b9/logs/elastic-agent-20240705-25.ndjson",
  "ecs.version": "1.6.0",
  "log.logger": "input.filestream",
  "log.origin": {
    "file.line": 300,
    "file.name": "filestream/input.go",
    "function": "github.com/elastic/beats/v7/filebeat/input/filestream.(*filestream).openFile"
  },
  "source_file": "filestream::filestream-monitoring-agent::native::524773-66308",
  "state-id": "native::524423-66428"
}

We know the Elastic-Agent does not truncate files, the only reason for Filebeat to detect a truncation is if inodes are re-used.

The best way to prevent inode reuse from affecting the monitoring logs is to use a file identity that won't be re-used, the best option for that is the fingerprint file identity.

We should either start using the fingerprint file identity or at least make it configurable so users facing this issue can work around the inode reuse.

One very important thing to bear in mind is that if we change the file identity, all existing files will be considered new and re-ingested.

cmacknz commented 1 month ago

Is there a reason we can't change the file identity starting from a specific point in the registry at a conceptual level?

Can the file identity used for a registry entry be included in the entry itself, so we can deal with a transition period? That is the configured file identity becomes used for new events, but we can still deal with old events?

This would let us switch to fingerprint everywhere without concern about data duplication.

CC @rdner

belimawr commented 1 month ago

Is there a reason we can't change the file identity starting from a specific point in the registry at a conceptual level?

At a conceptual level, no there isn't a reason.

The file identity is part of the key we use in the registry, we have some abstractions to generate the key from a "source", in simple terms it concatenates: input type, input ID, file identity and the actual identifier for the file, e.g: filestream::filestream-monitoring-agent::native::524042-42000. The parts of the code responsible for generating registry key and fetching data from the registry are not really connected. When fetching an entry from the registry we need to know the file identity that was used to generate the key.

Because the input type and the input ID works as a sort of namespace in the registry, an input can fetch all of its entries, check whether the file identity is different, if it is we open the file, calculate the new file identity and save it in the registry as a new entry. This will even work for the autodiscover case where a input instance is created for each file.

After the migration we can proceed with a normal execution.

Can the file identity used for a registry entry be included in the entry itself, so we can deal with a transition period?

Not without re-writing the whole registry and interactions with it, but I believe the migration I mentioned above will work well.

ycombinator commented 1 month ago

Because the input type and the input ID works as a sort of namespace in the registry, an input can fetch all of its entries, check whether the file identity is different, if it is we open the file, calculate the new file identity and save it in the registry as a new entry. This will even work for the autodiscover case where a input instance is created for each file.

@belimawr Please correct me if I'm wrong but the above seems to be the key migration work that needs to be implemented in Beats to resolve this issue? If so, we can transfer this issue to the Beats repository.

cmacknz commented 1 month ago

I transferred it, and also renamed the issue since I think the value of migrating everything to fingerprint by default is much higher than the original title specific to the filestream-monitoring instance.

elasticmachine commented 1 month ago

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

belimawr commented 1 month ago

I transferred it, and also renamed the issue since I think the value of migrating everything to fingerprint by default is much higher than the original title specific to the filestream-monitoring instance.

What would be "all filestream inputs" here? All started by the Elastic-Agent? By integrations? Or it becomes more focused on implementing the option to switch file identity without losing the state?

cmacknz commented 1 month ago

It means all of them regardless of whether they run in Beats or Elastic Agent, unless there is a reason we think non-fingerprint file identities are superior in some cases.

This would perhaps be better titled, "Use fingerprint file identity by default and migrate all existing filestream inputs to it".