fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

Doesn't remove .flb files after a problem with the receiver #1994

Closed. xobot42 closed this issue 2 years ago.

xobot42 commented 4 years ago

Bug Report

Hi. I have a problem with Fluent Bit and the tail input plugin. We run Kubernetes and use Fluent Bit as the log shipper, sending logs to Kafka. Sometimes, when Kafka has problems, Fluent Bit can't send logs there and starts buffering them in /var/log/fluentbit/flb-storage/tail. When Kafka starts working again, Fluent Bit still can't send the buffered logs to it. After a few hours the pod goes into CrashLoopBackOff and can't restart, because that storage directory has grown huge. We have to remove all files in /var/log/fluentbit/flb-storage/tail.* and delete the pods; after that the pods work normally.
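One way to notice the backlog before it fills the node's disk is to watch the size of the buffered chunk files. The sketch below assumes chunks are stored as `.flb` files in per-input subdirectories (such as `tail.0`, as seen in the logs later in this thread) under the storage path; `flb_backlog` is a hypothetical helper, not part of Fluent Bit.

```python
import os
from collections import defaultdict


def flb_backlog(storage_root):
    """Hypothetical helper: count .flb chunk files and their total size
    per subdirectory (e.g. 'tail.0') under a Fluent Bit storage path.

    Returns {subdir: (chunk_count, total_bytes)}. A missing root yields {}
    because os.walk silently skips paths it cannot read.
    """
    usage = defaultdict(lambda: [0, 0])  # subdir -> [chunk_count, total_bytes]
    for dirpath, _dirnames, filenames in os.walk(storage_root):
        subdir = os.path.relpath(dirpath, storage_root)
        for name in filenames:
            if not name.endswith(".flb"):
                continue  # ignore anything that is not a chunk file
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except FileNotFoundError:
                # A chunk can disappear between listing and stat,
                # e.g. if it is deleted after a successful flush.
                continue
            usage[subdir][0] += 1
            usage[subdir][1] += size
    return {k: tuple(v) for k, v in usage.items()}
```

For example, a cron job or sidecar could call `flb_backlog("/var/log/fluentbit/flb-storage")` periodically and alert when a subdirectory's total keeps growing after Kafka has recovered; the alert threshold is left as an assumption.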

Expected behavior

After Kafka comes back, Fluent Bit should flush all of the collected files to it.

My Environment

edsiper commented 4 years ago

Hi,

Some storage fixes have landed in recent versions of Fluent Bit. Please upgrade your image to the latest, fluent/fluent-bit:1.3.9, and send us some feedback.

xobot42 commented 4 years ago

Thanks. I tried fluent/fluent-bit:1.3.9, and now my pod starts without crashing. But the files are still not sent to Kafka and are still not removed from the worker node. I found some similar issues on GitHub, but nothing helped. In the log I have:

[2020/03/03 15:39:01] [debug] [storage] tail.0:1-1583249941.481383859.flb mapped OK
[2020/03/03 15:39:01] [debug] [storage] [cio file] synced at: tail.0/1-1583249941.481383859.flb
[2020/03/03 15:39:01] [debug] [storage] tail.0:1-1583249941.481383859.flb mapped OK
[2020/03/03 15:39:01] [debug] [storage] [cio file] synced at: tail.0/1-1583249941.481383859.flb
[2020/03/03 15:39:01] [debug] [in_tail] file=/var/log/containers/container-name8658d6565b-wvn8w_container-name_container-name-static-4bc24c88e82e6de183e537443aed522039d68e539242b6c250903d06ff0affd4.log read=790 lines=2
[2020/03/03 15:39:08] [debug] [in_tail] file=/var/log/containers/container-name-8658d6565b-wvn8w_container-name_container-name-4bc24c88e82e6de183e537443aed522039d68e539242b6c250903d06ff0affd4.log event
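The chunk names in these debug lines appear to carry a creation timestamp: in `1-1583249941.481383859.flb`, `1583249941` is the Unix time 2020-03-03 15:39:01 UTC, which matches the log line's time. Assuming the `<prefix>-<seconds>.<nanoseconds>.flb` pattern inferred from these lines holds (it is not taken from Fluent Bit documentation), a small parser can estimate how old each buffered chunk is, which helps tell whether old chunks are being retried or just accumulating. The function names are hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumed chunk-name pattern, inferred from the log lines above:
# <prefix>-<unix seconds>.<nanoseconds>.flb, e.g. 1-1583249941.481383859.flb
CHUNK_RE = re.compile(r"(?P<prefix>\d+)-(?P<sec>\d+)\.(?P<nsec>\d+)\.flb")


def chunk_created_at(text):
    """Return the UTC creation time encoded in the first chunk name found
    in `text` (a file name, path, or log line), or None if there is none."""
    m = CHUNK_RE.search(text)
    if m is None:
        return None
    return datetime.fromtimestamp(int(m.group("sec")), tz=timezone.utc)


def chunk_age_seconds(text, now):
    """Age of the chunk named in `text` relative to `now` (an aware
    datetime), or None if no chunk name is found."""
    created = chunk_created_at(text)
    return None if created is None else (now - created).total_seconds()
```

Applied to the file names in the storage directory, chunks whose age keeps increasing long after Kafka is reachable again would point to chunks that are never flushed.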

And sometimes:

[error] [out_kafka] fluent-bit#producer-6: [thrd:ssl://log-kafka1.ru:10091/bootstrap]: ssl://log-kafka1.ru:10091/31: Receive failed: SSL transport error: Connection reset by peer (after 299987ms in state UP)

edsiper commented 4 years ago

For some reason your Kafka server is dropping the connections; take a look at the Kafka server logs.

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions[bot] commented 2 years ago

This issue was closed because it has been stalled for 5 days with no activity.