Closed reverson-farfetch closed 1 year ago
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.
This issue was closed because it has been stalled for 5 days with no activity.
We have the same issue in version:1.9.5
@reverson-farfetch @woodliu were you able to figure out the fix? We experience the same issue in 1.9.5
Just update to 1.9.6
Bug Report
Describe the bug (Sorry in advance for the long post.) For some reason, fluent-bit goes into something like a "sleep" state and stops processing logs through the tail plugin in Kubernetes. What I could verify is that once the files being tailed by the plugin are deleted (by rotation or when the pod is deleted), fluent-bit keeps running: I can see metrics and the health check still responds, but no logs are processed.
Looking at the metrics fluentbit_output_proc_bytes_total and fluentbit_input_bytes_total, both show a value of 0, and fluent-bit also stops writing its own logs.
Two weird things:
Regarding the strace command, here are the last log lines from the fluent-bit pod that stopped processing:
So I created an ephemeral container in this pod and checked the file descriptors of the fluent-bit process. All the files above are files that were rotated or that belonged to pods that were deleted:
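For anyone who wants to repeat the file descriptor check, here is a minimal sketch in Python. It assumes a Linux /proc filesystem, where the kernel appends " (deleted)" to the link target of a descriptor whose file has been unlinked. The function name `deleted_open_files` and the `proc_root` parameter are hypothetical names chosen for illustration; they are not part of fluent-bit.

```python
import os

DELETED_SUFFIX = " (deleted)"


def deleted_open_files(pid, proc_root="/proc"):
    """List (fd, path) pairs for descriptors that point at deleted files.

    Assumption: Linux procfs layout, i.e. <proc_root>/<pid>/fd/<n> is a
    symlink whose target ends in " (deleted)" once the file is unlinked.
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    found = []
    for fd in sorted(os.listdir(fd_dir), key=int):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            # The descriptor was closed while we were scanning; skip it.
            continue
        if target.endswith(DELETED_SUFFIX):
            found.append((int(fd), target[: -len(DELETED_SUFFIX)]))
    return found
```

Run from an ephemeral container that shares the process namespace with the fluent-bit container, `deleted_open_files(<fluent-bit pid>)` would list the rotated or removed log files the process is still holding open. Reading another process's descriptors normally requires the same user or root.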
After that I ran the strace command; these are the first lines of its output:
full file strace.log
After the strace command was executed, the logs started to be processed again. Here is the log right after the strace command was executed. We can see that the link rotation for the files was executed and the files started being processed:
full file fluenti-bit.log
When I check the metrics, they show that logs are being processed, and I can see them in my centralized log server. It looks like fluent-bit stops processing when a log file is deleted and starts processing again once the process is resumed.
Any idea what it could be? Or what I can do to fix it?
To Reproduce Unfortunately, I couldn't reproduce it locally or in my dev environment. It only happens in the prod environment. I made sure that dev has the same configuration as prod, but I still couldn't reproduce it.
Expected behavior Even when the log files are deleted, fluent-bit should continue processing logs. If it cannot, the liveness probe should detect this so that the container is restarted.
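As a rough illustration of what such a probe could look at, the sketch below compares two scrapes of the Prometheus text metrics and reports which fluentbit_input_bytes_total series did not increase. This is only a sketch under assumptions: the helper names `parse_counters` and `stalled_series` are hypothetical, the parser handles only simple label sets, and an input that is legitimately idle would also be flagged, so it is not a drop-in liveness check.

```python
import re

# Matches "name{labels} value [timestamp]" lines of the Prometheus text format.
# Assumption: label values do not contain a "}" character.
_SAMPLE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?P<labels>\{[^}]*\})?\s+(?P<value>\S+)"
)


def parse_counters(text, metric):
    """Return {label_string: value} for every sample of one metric."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match and match.group("name") == metric:
            samples[match.group("labels") or ""] = float(match.group("value"))
    return samples


def stalled_series(before, after, metric="fluentbit_input_bytes_total"):
    """Return the label sets whose counter did not grow between two scrapes."""
    old = parse_counters(before, metric)
    new = parse_counters(after, metric)
    return sorted(k for k, v in new.items() if k in old and v <= old[k])
```

The two text arguments would be the bodies of two scrapes taken some interval apart, for example from the metrics endpoint of fluent-bit's built-in HTTP server when it is enabled. A non-empty result for the tail input over a window in which pods are known to be logging would match the stall described in this report.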
Screenshots The metrics are in UTC and the logs are in UTC+1.
Your Environment
Version used: fluent/fluent-bit:1.9.5
Configuration:
full config fluent-bit.txt
Environment name and version (e.g. Kubernetes? What version?): v1.23.8 and v1.21.9
Server type and version: Azure AKS
Operating System and version: Ubuntu
Filters and plugins: I shared before.
Additional context To comply with some security directives, we rebuild the image to upgrade some packages with CVEs. Here is the Dockerfile:
DEBIAN_DISTROLESS: distroless/cc-debian11:latest
FROM_RUNTIME: fluent/fluent-bit:1.9.5
FROM_INSTALLER: debian:bullseye-slim
full dockerfile Dockerfile.txt