Open brsolomon-deloitte opened 2 years ago
What if you up the `log_level` in the service configuration? This is how I generally debug file permission or mount point errors (e.g. symlinks to a non-existent mount point). Similarly, for issues in other plugins, the first step is to raise the logging level to see what is happening in more detail.
Fluent Bit asks for the list of files it is allowed to see and gets those it is allowed to see. How would it know there are other files to see? What if there was a mix of permissions so it got some files but not others, how would it know it did not have all? I don't think the input plugin could or should be able to tell you this, it can merely tell you what it is told/allowed to see. It does not know there is an issue so it cannot report an issue.
The overall log level can give you that extra information though - we don't want the logs to be full of extra debug either when it is running with a correct configuration. Some log files may be deliberately excluded as well from selection (e.g. by time of last update) so I'm not sure there is a general check that could be done to pick up missed files, particularly when you should not even be able to know of their existence from a security perspective.
However, if you have a suggestion for how we could detect this specific failure generally then that would be ace! :+1:
In the example above, fluent-bit is given a `tail` input for `/var/log/containers/*.log`. (This is ultimately a symlink to `/var/log/`.) It seems to me that a reasonable behavior would be to warn about unreadable files that match the input glob or exact file path, at the default log level.
In this case that seems quite possible to enable. `/var/log` is mode 0755, owned by `root:root`, and the various `/var/log/*.log` files are mode 0600, owned by `root:root`. A non-root user can detect that those files exist because of the directory's `x` bit, but can also discern that it is prohibited from reading them: e.g. `stat /var/log/foo.log` succeeds while `cat /var/log/foo.log` or `test -r /var/log/foo.log` fails. Having to turn on a more verbose logging level than the default doesn't seem like it should be necessary. Not being able to read a file that fluent-bit has been told specifically to read through a glob seems like something that should be logged at the INFO level.
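To make the proposed check concrete, here is a minimal Python sketch of the idea, not Fluent Bit's actual implementation (which is written in C). It expands a glob and reports matches that exist as directory entries but fail a readability test. The names `find_unreadable` and `can_read` are hypothetical and exist only for this illustration; the default predicate uses `os.access(path, os.R_OK)`, which is roughly what `test -r` checks.

```python
import glob
import os
from typing import Callable, List


def _default_can_read(path: str) -> bool:
    # os.access checks permissions for the real uid/gid of the process.
    return os.access(path, os.R_OK)


def find_unreadable(pattern: str,
                    can_read: Callable[[str], bool] = _default_can_read) -> List[str]:
    """Return paths that match `pattern` but fail the readability check.

    `can_read` is injectable (an assumption of this sketch) so the logic
    can be exercised without depending on the permissions of the caller.
    """
    unreadable = []
    for path in sorted(glob.glob(pattern)):
        # glob found a directory entry for this path; now check whether
        # the current user may actually open it for reading.
        if not can_read(path):
            unreadable.append(path)
    return unreadable


if __name__ == "__main__":
    for path in find_unreadable("/var/log/*.log"):
        # Hypothetical message wording, not a real Fluent Bit log line.
        print(f"[warn] {path} matches the input glob but is not readable")
```

This only covers the case described above, where the directory can be listed and searched. If the directory itself cannot be listed, the glob returns nothing and there is nothing to warn about.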
Ah right, in this specific case there is a test that could be made for it. I guess there may be additional performance concerns with these checks at scale (e.g. with thousands of rapidly rotating log files or other pathological cases), but that is something we can test and/or potentially add a configuration option for.
Please submit a PR to cover the changes. At the moment, the current guidance is to use the existing additional logging to detect it, which is not perfect, as you say.
> Ah right, in this specific case there is a test that could be made for it.
Per your original response, isn't there already a check happening, just one that is logged only at a more verbose log level and not at the default one? My proposal here would be to emit a warning for this check at the default log level.
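The difference between the two behaviors can be illustrated with Python's standard `logging` module. This is only an analogy for how level filtering works, not Fluent Bit's logging code, and the message text is made up for the example. With the logger at INFO, a message reported at DEBUG is dropped, while the same message reported at WARNING is emitted.

```python
import logging
from typing import List, Tuple


class ListHandler(logging.Handler):
    """Collects emitted records in memory so they can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: List[Tuple[str, str]] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append((record.levelname, record.getMessage()))


def messages_at(logger_level: int, report_level: int) -> List[Tuple[str, str]]:
    """Report one 'unreadable file' message and return what was emitted."""
    logger = logging.getLogger(f"tail-demo-{logger_level}-{report_level}")
    logger.setLevel(logger_level)
    logger.propagate = False  # keep the demo isolated from the root logger
    handler = ListHandler()
    logger.addHandler(handler)
    # Hypothetical wording; the path is the example used in this thread.
    logger.log(report_level, "cannot read %s", "/var/log/foo.log")
    logger.removeHandler(handler)
    return handler.messages


if __name__ == "__main__":
    print(messages_at(logging.INFO, logging.DEBUG))    # nothing is emitted
    print(messages_at(logging.INFO, logging.WARNING))  # the warning is emitted
```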
Possibly related to #2526 although the use cases may be the opposite: that one wants to reduce log noise when folders are empty and this wants to trigger more logs on misconfiguration.
Bug Report
Describe the bug
fluent-bit fails to emit any error, warning, or message at all if the file(s) in `tail` are not readable by the invoking user.
To Reproduce
Use the IronBank fluent-bit Docker image and the official fluent-bit Helm chart. Helm chart parameter for image:
for Kubernetes daemonset:
Expected behavior
Give me any indication at all there is a problem.
Your Environment
Full Helm config from charts/fluent-bit/values.yaml:
Now exec into container:
Then:
As shown here, the IronBank image runs as the user `fluent`, while the /var/log files are only readable by root. Sure, one fix is to correct the read permission or owner itself. However, the purpose of this issue is to show that fluent-bit doesn't produce any useful logs to point out this error. Pods are healthy and this is the only log I get: