fluent / fluentd

Fluentd: Unified Logging Layer (project under CNCF)
https://www.fluentd.org
Apache License 2.0

Use notification mechanisms for watching new files in tail plugin instead of refresh_interval #2985

Open alexandru-ersenie opened 4 years ago

alexandru-ersenie commented 4 years ago


Is your feature request related to a problem? Please describe.

I have been using td-agent inside Kubernetes for a couple of years, and until now it has done a tremendous job. Suddenly I noticed that we were missing some logs, although from a log-structure point of view everything was OK, and testing with the td-agent test kit processed the logs just fine.

Since we were dealing with an ephemeral container that logged some errors and then died, I dug into the problem further, only to find out that the log file my container produced was picked up by the td-agent in_tail plugin about one minute later, which was definitely too late.

Here is the timeline:

Pod deployed (10:09:53)

kubectl -n area11 apply -f dbmigrate.yaml && echo `date`
pod/db-migrate-9lg6q-test created
Thu May 7 10:09:53 CEST 2020

Pod's first output (10:09:55) - so about 2-3 seconds later, my pod starts logging:

"log":"Start flyway migration\n","stream":"stdout","time":"2020-05-07T10:09:55.904911554Z"} {"log":"2020-05-07 10:09:55 INFO - Execute_DB_Operations: DB Operation selected; Logging in /flyway_output.html\n","stream":"stdout","time":"2020-05-07T10:09:55.912142202Z"}

in_tail picks up the file (10:10:51) - eventually, about a minute later, in_tail starts tailing. At this point it is too late; my pod has already logged:

2020-05-07 10:10:51 +0000 [info]: #0 following tail of /var/log/containers/db-migrate-9lg6q-test_area11_db-migrate-0a73fe226edfa80ddc51f52756c4504625f185e220451b021bd9d67f2408efed.log

After looking into the problem, I found (as far as I understand it) that the only way to deal with this is to set refresh_interval to a very low value, like a couple of seconds.

This indeed works if I set it to 1-3 seconds, but why should that be necessary? In clusters with a lot of files, I fear that maintaining that many active watches, refreshed so frequently, will have an impact on performance.
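For reference, here is a minimal sketch of that workaround; the path, pos_file, and tag values are placeholders, and in_tail's default refresh_interval is 60 seconds:

<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/td-agent/containers.log.pos
  tag kubernetes.*
  refresh_interval 1s   # rescan the path glob every second instead of every 60s
  read_from_head true   # pick up lines written before the file was discovered
  <parse>
    @type json
  </parse>
</source>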

I just cannot tell whether there is anything I am missing. I had assumed that, especially for stateless apps as they exist in Kubernetes, a notification mechanism such as Linux's inotify (e.g. inotifywait) would be the solution to follow.

Describe the solution you'd like

I would like to see a notification mechanism on the tail paths that actively notifies the in_tail plugin of newly appearing files, reducing to zero the chance of losing even a single log entry.

Describe alternatives you've considered

Additional context

repeatedly commented 4 years ago

I would like to see a notification mechanism on the tail paths that actively notifies the in_tail plugin of newly appearing files, reducing to zero the chance of losing even a single log entry.

Does anyone know about this? Does k8s provide an API or notification mechanism for pod metadata?

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove the stale label or comment, or this issue will be closed in 30 days.

ashie commented 2 years ago

Use inotify? inotifywait from inotify-tools can watch for new files in a directory like this:

$ inotifywait -e create -m ./ &
[1] 500205
Setting up watches.
Watches established.
$ touch foo
./ CREATE foo
$ touch bar
./ CREATE bar
...
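For a Ruby equivalent (fluentd's implementation language), here is a minimal sketch using the rb-inotify gem, with a placeholder watch directory, showing how a tail plugin could learn of new files the moment they are created instead of on the next refresh_interval scan:

require "rb-inotify"   # wraps the Linux inotify API

notifier = INotify::Notifier.new

# Fire the callback as soon as a file is created in the watched
# directory, rather than waiting for the next periodic path scan.
notifier.watch("/var/log/containers", :create) do |event|
  puts "new log file: #{event.absolute_name}"
  # a tail plugin could start following the new file here immediately
end

notifier.run   # blocks, dispatching inotify events as they arrive

Note that inotify is Linux-only, so a polling fallback such as refresh_interval would still be needed on other platforms.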