Open dashe-ops opened 1 year ago
This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 7 days
Hi @dashe-ops,
I assume the issue is caused by your Fluent Bit configuration. You can set the Require_ack_response option in the forward output (https://docs.fluentbit.io/manual/pipeline/outputs/forward) for improved reliability. In addition, you should make sure that Fluent Bit doesn't drop log messages because the maximum number of retries was reached. Configuring retries is described here: https://docs.fluentbit.io/manual/administration/scheduling-and-retries#configuring-retries.
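As a rough illustration of the two settings mentioned above, a forward output section in Fluent Bit's classic configuration format might look like the sketch below. The Host, Port, and Match values are placeholders, not taken from the reporter's setup, and disabling the retry limit entirely is only one possible choice; a finite Retry_Limit large enough to cover the expected outage would also fit.

```text
[OUTPUT]
    Name                  forward
    Match                 *
    # Placeholder address: replace with your Fluentd endpoint
    Host                  fluentd.example.internal
    Port                  24224
    # Wait for Fluentd to acknowledge each chunk before treating it as delivered
    Require_ack_response  True
    # Keep retrying instead of discarding a chunk once the retry limit is reached
    Retry_Limit           False
```

With unlimited retries, chunks that cannot be delivered stay buffered until Fluentd is reachable again, so buffer limits on the Fluent Bit side still apply.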
Describe the bug
Hi,
As a log shipping solution, we are using Fluent Bit on client VMs and sending logs to a 3-node Fluentd cluster.
If I stop all 3 nodes in the Fluentd cluster at the same time, for example for 2 minutes, and then restart Fluentd on all 3 nodes, the shipped logs are missing about 60 seconds of the 2-minute offline period.
To Reproduce
Write a simple log to test, printing the date every second:
while sleep 1; do date; done > /tmp/test.log
On the Fluentd cluster, stop the cluster (stop all 3 nodes at the same time).
Untar the test log file and note the timestamp of the last log line.
Wait 2 minutes and restart the Fluentd cluster.
Wait for a new log from the client to appear, then untar it and read the first few lines.
If buffers worked as we expect, there should be no lost data.
Every time, there is lost data.
tail -5 ie1-abc01b-nxt.nxt.test-test_20230323_02a.log
Thu Mar 23 15:56:51 UTC 2023
Thu Mar 23 15:56:52 UTC 2023
Thu Mar 23 15:56:53 UTC 2023
Thu Mar 23 15:56:54 UTC 2023
Thu Mar 23 15:56:55 UTC 2023
head ie1-abc01b-nxt.nxt.test-test_20230323_03a.log
Thu Mar 23 15:57:57 UTC 2023
Thu Mar 23 15:57:58 UTC 2023
Thu Mar 23 15:57:59 UTC 2023
Thu Mar 23 15:58:00 UTC 2023
Thu Mar 23 15:58:01 UTC 2023
In the above example we've lost about 1 minute of data (15:56:56 through 15:57:56).
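To measure the loss rather than eyeballing head and tail output, the received lines can be scanned for jumps of more than one second. The following is a minimal sketch, assuming GNU date (for the -d option) and one timestamp per line in the format produced by the test loop; the function name find_gaps is made up for this example.

```shell
# Read timestamp lines on stdin and report any jump larger than one second.
# Assumes GNU date and lines in the default `date` output format.
find_gaps() {
  prev=""
  while IFS= read -r line; do
    # Skip blank lines and anything GNU date cannot parse.
    [ -n "$line" ] || continue
    cur=$(date -u -d "$line" +%s 2>/dev/null) || continue
    if [ -n "$prev" ] && [ $((cur - prev)) -gt 1 ]; then
      echo "gap: $((cur - prev - 1)) missing second(s) before: $line"
    fi
    prev=$cur
  done
}
```

For example, concatenating the two received files in order and piping them through the function (cat ie1-abc01b-nxt.nxt.test-test_20230323_02a.log ie1-abc01b-nxt.nxt.test-test_20230323_03a.log | find_gaps) prints one line per gap. Because sleep 1 can drift slightly, an occasional one-second gap may be reported even when nothing was lost.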
Expected behavior
If buffers worked as we expect, there should be no lost data.
Your Environment
Your Configuration
Your Error Log
Additional context
No response