akshay-wani opened this issue 1 year ago.
I don't know why this happened. I would check if Fluent Bit is still reading and ingesting new data or not.
I'd enable debug logging: https://github.com/aws/aws-for-fluent-bit/blob/mainline/troubleshooting/debugging.md#enable-debug-logging
We have a log loss runbook with steps and queries you can run to check whether Fluent Bit is ingesting data; it requires that you enable debug logging first: https://github.com/aws/aws-for-fluent-bit/blob/mainline/troubleshooting/debugging.md#log-loss-investigation-runbook
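Besides debug logging, another way to check whether Fluent Bit is still reading and ingesting new data is to compare its input counters over time. The rough sketch below assumes the built-in monitoring HTTP server is enabled (HTTP_Server On in the [SERVICE] section) on the default port 2020 and reads the /api/v1/metrics endpoint. The helper names and the 60-second window are hypothetical choices, not part of aws-for-fluent-bit. If the request itself times out, that points to a hung process rather than an input that simply has nothing new to read.

```python
import json
import time
import urllib.request

# Assumption: Fluent Bit's monitoring server is enabled (HTTP_Server On)
# and listening on the default port 2020.
METRICS_URL = "http://127.0.0.1:2020/api/v1/metrics"


def fetch_metrics(url: str = METRICS_URL, timeout: float = 5.0) -> dict:
    """Return the JSON metrics document from Fluent Bit's monitoring API."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def input_records(metrics: dict) -> dict:
    """Map each input plugin instance (e.g. "tail.0") to its records counter."""
    return {name: stats.get("records", 0)
            for name, stats in metrics.get("input", {}).items()}


def stalled_inputs(before: dict, after: dict) -> list:
    """Names of inputs whose records counter did not grow between two snapshots."""
    b, a = input_records(before), input_records(after)
    return sorted(name for name in a if a[name] <= b.get(name, 0))


if __name__ == "__main__":
    try:
        first = fetch_metrics()
        time.sleep(60)  # hypothetical window; use something longer than your normal log interval
        second = fetch_metrics()
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        print("monitoring API did not respond; the process may be hung:", exc)
    else:
        print("inputs with no new records:", stalled_inputs(first, second) or "none")
```

A counter that stays flat while your pods are still writing to their log files matches the "not ingesting" symptom described in this thread.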
Debug logging is enabled. The log I pasted is at debug level. I do not see any warn, err or error messages in the Fluent Bit logs.
Regarding "I would check if Fluent Bit is still reading and ingesting new data or not": I'm not sure, because after the issue occurs the Fluent Bit logs only show the [debug] [output:s3:s3.0] Running upload timer callback (cb_s3_upload).. message.
One thing I noticed about the public.ecr.aws/aws-observability/aws-for-fluent-bit image: when I check the Fluent Bit version with ./fluent-bit --version, it shows v1.9.10 for every image I tried, including the stable and latest tags, although the Git commit hash is different.
I also tried looking up these Git commit hashes among the versions in the https://github.com/aws/aws-for-fluent-bit/tree/mainline repo, but could not find them.
@akshay-wani Can you check the AWS version file as explained here: https://github.com/aws/aws-for-fluent-bit/blob/mainline/troubleshooting/debugging.md#which-version-did-i-deploy
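As noted earlier in the thread, ./fluent-bit --version reports only the upstream Fluent Bit version, so images with different tags can all print v1.9.10. A rough sketch that collects both the upstream version and the AWS release version follows. It assumes the --version output has the shape "Fluent Bit vX.Y.Z" followed by "Git commit: <hash>", and that the AWS release version is stored in /AWS_FOR_FLUENT_BIT_VERSION inside the image; verify that path against the guide linked above. The function names are hypothetical.

```python
import re
import subprocess
from pathlib import Path
from typing import Optional

# Assumption: `fluent-bit --version` prints lines such as
#   Fluent Bit v1.9.10
#   Git commit: <hash>
VERSION_RE = re.compile(r"Fluent Bit v(\S+)")
COMMIT_RE = re.compile(r"Git commit:\s*(\S+)")

# Assumption: location of the AWS release version file inside the
# aws-for-fluent-bit image; confirm it in the "which version did I deploy" guide.
AWS_VERSION_FILE = "/AWS_FOR_FLUENT_BIT_VERSION"


def parse_version_output(text: str) -> dict:
    """Extract the upstream Fluent Bit version and Git commit from --version output."""
    version = VERSION_RE.search(text)
    commit = COMMIT_RE.search(text)
    return {
        "fluent_bit": version.group(1) if version else None,
        "git_commit": commit.group(1) if commit else None,
    }


def read_aws_version(path: str = AWS_VERSION_FILE) -> Optional[str]:
    """Return the AWS for Fluent Bit release string, or None if the file is absent."""
    p = Path(path)
    return p.read_text().strip() if p.is_file() else None


if __name__ == "__main__":
    # Run from the directory that contains the binary, as in the thread.
    result = subprocess.run(["./fluent-bit", "--version"], capture_output=True, text=True)
    print(parse_version_output(result.stdout + result.stderr))
    print("AWS for Fluent Bit version:", read_aws_version() or "file not found")
```

The AWS release string is the value to report in a bug, since several AWS releases may bundle the same upstream version.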
If that is your full debug log output, then per the log loss runbook, since we do not see any of the ingestion messages, Fluent Bit is not ingesting any new data.
I'm not sure why that would happen if your pods are emitting new logs to log files.
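A crude approximation of that check is to scan the debug log for anything other than the S3 upload timer callback. This is a hypothetical heuristic, not the runbook's actual query: it treats any Fluent Bit line other than "Running upload timer callback (cb_s3_upload).." as a sign of activity, whereas the runbook names the specific ingestion messages to look for. The line format is taken from the debug logs pasted in this thread; lines from other processes, such as the trace-agent line in a later comment, are skipped.

```python
import re
from datetime import datetime
from typing import Iterable, Optional

# Matches Fluent Bit lines such as:
#   [2024/01/03 23:12:55] [debug] [output:s3:s3.1] Running upload timer callback (cb_s3_upload)..
# The level may be space-padded, e.g. "[ info]".
LINE_RE = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\] \[\s*(\w+)\] (.*)$")
TIMER_MSG = "Running upload timer callback (cb_s3_upload)"


def last_non_timer_activity(lines: Iterable[str]) -> Optional[datetime]:
    """Timestamp of the last Fluent Bit line that is not the S3 upload timer callback."""
    last = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a Fluent Bit log line (other process, multi-line output, ...)
        ts, _level, rest = m.groups()
        if TIMER_MSG not in rest:
            last = datetime.strptime(ts, "%Y/%m/%d %H:%M:%S")
    return last


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        ts = last_non_timer_activity(fh)
    print("last non-timer activity:", ts or "none found")
```

If the returned timestamp is long before the end of the log while the pods keep writing, the output matches the pattern described above.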
We are experiencing a similar issue where our Fluent Bit container occasionally becomes unresponsive. During these incidents, the container not only stops forwarding logs to S3 but also stops generating its own logs. This suggests that the process either crashes or hangs.
There are no significant fluctuations in resource usage, and the container does not run out of memory.
~Unfortunately, there are no noteworthy logs preceding these occurrences, as we only have 'info' level and higher logs enabled.~
EDIT: we got debug logs, and they are similar to what OP posted:
2024-01-03 23:12:49 UTC | TRACE | INFO | (pkg/trace/info/stats.go:91 in LogAndResetStats) | No data received
[2024/01/03 23:12:55] [debug] [output:s3:s3.1] Running upload timer callback (cb_s3_upload)..
Could anyone provide any guidance or suggestions on this matter?
Hello,
I'm facing the same problem. Any ideas?
Thanks
Hello,
I was using the AWS image for Fluent Bit; switching to the official Fluent Bit image solved my issue. Image: cr.fluentbit.io/fluent/fluent-bit
Hello All,
Fluent Bit stops sending logs to S3 6-12 hours after the instance starts. From then on, it continuously logs the Running upload timer callback (cb_s3_upload).. message. Exact logs for the period when Fluent Bit stops sending logs to S3:
I'm using the official fluent-bit Helm chart (fluent-bit-0.37.0) with the public.ecr.aws/aws-observability/aws-for-fluent-bit:stable image. The Fluent Bit version in public.ecr.aws/aws-observability/aws-for-fluent-bit:stable is v1.9.10.
I tried increasing Buffer_Max_Size and setting the auto_retry_requests parameter to true, but neither helps. Please suggest a fix.
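To pin down when uploads stop (the 6-12 hour window reported in this issue), a sketch like the following measures the time from the first log line to the last successful upload, and the silence since then. It assumes the S3 output logs a line containing "Successfully uploaded object" for each completed upload; check the exact wording in your own info-level logs. The function name is hypothetical.

```python
import re
from datetime import datetime
from typing import Iterable, Optional, Tuple

TS_RE = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\]")
# Assumption: the S3 output logs this phrase after each completed upload;
# adjust it to match the wording in your own logs.
UPLOAD_MSG = "Successfully uploaded object"

Window = Tuple[Optional[datetime], Optional[datetime], Optional[datetime]]


def upload_window(lines: Iterable[str]) -> Window:
    """Return (first log timestamp, last upload timestamp, last log timestamp)."""
    first = last_upload = last_seen = None
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue  # skip lines without a Fluent Bit timestamp prefix
        ts = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S")
        if first is None:
            first = ts
        last_seen = ts
        if UPLOAD_MSG in line:
            last_upload = ts
    return first, last_upload, last_seen


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        first, last_upload, last_seen = upload_window(fh)
    if last_upload is None:
        print("no successful uploads found")
    else:
        print("uploads stopped", last_upload - first, "after the first log line")
        print("silent for", last_seen - last_upload, "since the last upload")
```

Comparing that interval across several restarts shows whether the stall tracks uptime or some other trigger.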