Open kbujold opened 4 months ago
Hi,
The reason we fingerprint the file is that the path and name are not reliable ways to track a cursor for an input file.
Can you use an alternative identification method like the path or inode_marker methods?
Having also looked into fingerprint mode, there just isn't a perfect processor. Each method has tradeoffs. If it were possible, I would combine processors to get closer to uniqueness, but Filebeat does not support this.
This scenario could be quite common in a large Kubernetes environment where multiple pods have the same startup routine (and no timestamps). Following that logic, it's within the realm of possibility that pods in different namespaces for different tenants (but running on the same node) could have the same fingerprint. Crazy things happen. :)
The file path or inode has to be involved at some point, right? Once the contents of two log files that share the same fingerprint have diverged, which one continues to be tracked? After a Filebeat pod is restarted or replaced with a new generation, which file gets tracked?
It's a complicated problem. Being able to say "I have two files with the same fingerprint, but their initial paths are different (not their inodes), so I will treat them as different" would add value.
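As a rough illustration of that idea (this is not something Filebeat does today, and `composite_id` is a hypothetical helper, not a Filebeat API), an identity could be derived from both the fingerprint and the path at which the file was first seen:

```python
import hashlib


def composite_id(fingerprint_hex: str, initial_path: str) -> str:
    """Hypothetical identity: content fingerprint plus the first-seen path.

    The NUL separator keeps the two fields from running into each other,
    so ("ab", "c") and ("a", "bc") cannot produce the same input bytes.
    """
    h = hashlib.sha256()
    h.update(fingerprint_hex.encode("utf-8"))
    h.update(b"\0")
    h.update(initial_path.encode("utf-8"))
    return h.hexdigest()


# Two files whose fingerprints collide but whose initial paths differ
# (example paths are made up for illustration).
shared_fp = hashlib.sha256(b"identical startup output").hexdigest()
id_a = composite_id(shared_fp, "/var/log/pods/tenant-a/app.log")
id_b = composite_id(shared_fp, "/var/log/pods/tenant-b/app.log")
```

The catch is the one raised earlier in this thread: the path is not stable, so the initial path would have to be stored alongside the cursor and kept across renames, and that lookup is where the real complexity would sit.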
Hi, we wanted to switch to fingerprint mode with filestream (see "Introducing Filestream fingerprint mode").
We have found an issue with ID creation. If two files have exactly the same content but different filenames, fingerprint mode treats them as having the same file ID. Not all logs on a system contain timestamps, so this is problematic.
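To make the collision concrete, here is a minimal sketch in Python. It assumes the fingerprint is a SHA-256 hash over a fixed byte range at the start of the file (offset 0, length 1024, which is my understanding of the defaults). The `fingerprint` and `demo_collision` functions and the file names are illustrative only, not Filebeat code.

```python
import hashlib
import os
import tempfile


def fingerprint(path, offset=0, length=1024):
    """Hash `length` bytes starting at `offset`; None if the file is too short."""
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read(length)
    if len(chunk) < length:
        return None  # not enough data yet to fingerprint
    return hashlib.sha256(chunk).hexdigest()


def demo_collision():
    # Identical, timestamp-free startup output (17 bytes * 100 = 1700 bytes),
    # long enough to cover the whole fingerprinted range.
    banner = b"starting service\n" * 100
    with tempfile.TemporaryDirectory() as d:
        path_a = os.path.join(d, "pod-a.log")
        path_b = os.path.join(d, "pod-b.log")
        # The files diverge only after the fingerprinted range.
        for path, tail in ((path_a, b"tenant A\n"), (path_b, b"tenant B\n")):
            with open(path, "wb") as f:
                f.write(banner + tail)
        return fingerprint(path_a), fingerprint(path_b)


fp_a, fp_b = demo_collision()
```

Under these assumptions both files yield the same fingerprint even though their names and later contents differ, because only the leading bytes are hashed.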
For example
Would it be possible to have the filename be part of the ID creation?
Thank you, Kris