Closed: hoperays closed this issue 2 years ago
The same issue after we redeployed fluentd in the k8s cluster.
I found an interesting situation. If I use kubectl delete pod fluentd-pod, fluent-bit sometimes gets stuck and loses its connection with fluentd, even after fluentd resumes. But if I use kubectl rollout restart deploy fluentd, the problem does not happen.
Hey, I am facing the same problem, and I am dealing with some critical data. Has anyone found a workaround for this situation?
Hi, I got the same problem when one of the load balancer's hosts is temporarily unavailable. I'm trying to adjust retry_limit to see if it resolves the issue.
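For anyone trying the same thing, Retry_Limit is set per [OUTPUT] section. A minimal sketch, assuming a forward output like the ones in this thread (the host, port, and limit value are placeholders, not recommendations):

[OUTPUT]
    Name        forward
    Match       *
    Host        fluentd.logging.svc.cluster.local
    Port        24224
    # Give up on a chunk after 5 failed flush attempts; a value of
    # False (as in the config at the bottom of this issue) retries forever.
    Retry_Limit 5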
Does your Fluentd have a working service listening on port 24224?
I ran into the same problem. Fluent-bit and Fluentd are running on EC2 instances. Fluent-bit couldn't recover after Fluentd became temporarily unavailable.
The same problem on fluent-bit 1.7.2. Fluent-bit and fluentd are deployed on Kubernetes, and fluent-bit uses a headless service to forward logs to fluentd (a failover sketch follows the logs below). This problem occurs very frequently.
[2021/04/02 08:50:29] [error] [net] TCP connection failed: cluster-fluentd-5.cluster-fluentd-headless.logging.svc.cluster.local:24240 (Connection timed out)
[2021/04/02 08:50:29] [error] [net] cannot connect to cluster-fluentd-5.cluster-fluentd-headless.logging.svc.cluster.local:24240
[2021/04/02 08:50:29] [error] [output:forward:forward.0] no upstream connections available
[2021/04/02 08:50:29] [ warn] [engine] failed to flush chunk '1-1617353299.120165002.flb', retry in 8 seconds: task_id=0, input=tail.0 > output=forward.0 (out_id=0)
[2021/04/02 08:51:04] [error] [net] TCP connection failed: cluster-fluentd-5.cluster-fluentd-headless.logging.svc.cluster.local:24240 (Connection timed out)
[2021/04/02 08:51:04] [error] [net] cannot connect to cluster-fluentd-5.cluster-fluentd-headless.logging.svc.cluster.local:24240
[2021/04/02 08:51:04] [error] [output:forward:forward.0] no upstream connections available
[2021/04/02 08:51:04] [ warn] [engine] failed to flush chunk '1-1617353334.496987689.flb', retry in 6 seconds: task_id=1, input=tail.0 > output=forward.0 (out_id=0)
[2021/04/02 08:51:44] [error] [net] TCP connection failed: cluster-fluentd-5.cluster-fluentd-headless.logging.svc.cluster.local:24240 (Connection timed out)
[2021/04/02 08:51:44] [error] [net] cannot connect to cluster-fluentd-5.cluster-fluentd-headless.logging.svc.cluster.local:24240
[2021/04/02 08:51:44] [error] [output:forward:forward.0] no upstream connections available
[2021/04/02 08:51:44] [ warn] [engine] failed to flush chunk '1-1617353374.267709871.flb', retry in 8 seconds: task_id=2, input=tail.0 > output=forward.0 (out_id=0)
[2021/04/02 08:52:44] [error] [net] TCP connection failed: cluster-fluentd-5.cluster-fluentd-headless.logging.svc.cluster.local:24240 (Connection timed out)
[2021/04/02 08:52:44] [error] [net] cannot connect to cluster-fluentd-5.cluster-fluentd-headless.logging.svc.cluster.local:24240
[2021/04/02 08:52:44] [error] [output:forward:forward.0] no upstream connections available
[2021/04/02 08:52:44] [ warn] [engine] failed to flush chunk '1-1617353434.367870814.flb', retry in 10 seconds: task_id=3, input=tail.0 > output=forward.0 (out_id=0)
[2021/04/02 08:53:15] [error] [net] TCP connection failed: cluster-fluentd-5.cluster-fluentd-headless.logging.svc.cluster.local:24240 (Connection timed out)
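Not a fix for the reconnect bug itself, but when forwarding to several fluentd replicas behind a headless service like the one in the logs above, the forward output can also point at an upstream servers file so fluent-bit fails over between nodes on its own. A minimal sketch, assuming two replicas of the StatefulSet from the logs (node names and the second host are illustrative):

# upstream.conf, referenced from the [OUTPUT] section with: Upstream upstream.conf
[UPSTREAM]
    Name forward-nodes

[NODE]
    Name fluentd-0
    Host cluster-fluentd-0.cluster-fluentd-headless.logging.svc.cluster.local
    Port 24240

[NODE]
    Name fluentd-1
    Host cluster-fluentd-1.cluster-fluentd-headless.logging.svc.cluster.local
    Port 24240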
The same happens to me; the only way to solve it for now is to restart fluent-bit.
Same issue.
Same issue
If anyone here has a reliable reproduction and is able to perform some tests with me, contact me in the Fluent Slack and we'll find out the root of the issue.
This issue is stale because it has been open 90 days with no activity. Remove the stale label or comment, or this will be closed in 5 days. Maintainers can add the exempt-stale label.
Hi everyone, we've released a couple of fixes that handle connection-loss and timeout scenarios in 1.8.15 and 1.9.1. I'm closing this issue now, but if you still see the problem, feel free to reopen it or open a new one. We'll gladly assist you further once you provide a repro scenario.
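For readers stuck on versions older than those fixes, the generic per-output networking options may help fluent-bit abandon dead connections instead of reusing them; a hedged sketch (the timeout values are arbitrary examples):

[OUTPUT]
    Name                        forward
    Match                       *
    Host                        ${FLUENTD_HOST}
    Port                        ${FLUENTD_PORT}
    # Fail connection attempts that hang instead of blocking indefinitely.
    net.connect_timeout         10
    # Keep connections alive, but recycle idle ones so that a dead peer
    # is not reused forever.
    net.keepalive               on
    net.keepalive_idle_timeout  30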
Bug Report
Describe the bug: Fluent-bit got stuck when it lost its connection with fluentd, and still did not respond after fluentd resumed.
To Reproduce: No method of reproduction at the moment.
Expected behavior: Fluent-bit does not get stuck.
Screenshots: The fluent-bit log stopped at 2019/01/05 21:57:41.
The stack status of the fluent-bit process is as follows:
Your Environment
[INPUT]
    Name              tail
    Tag               ${HOSTNAME}.kube.*
    Path              /var/log/containers/*.log
    Parser            docker
    DB                /var/log/flb_kube.db
    DB.Sync           OFF
    Buffer_Max_Size   2MB
    Mem_Buf_Limit     5MB

[FILTER]
    Name              kubernetes
    Match             ${HOSTNAME}.kube.*
    Merge_JSON_Log    true

[FILTER]
    Name              record_modifier
    Match             *
    Record            hostname ${HOSTNAME}

[OUTPUT]
    Name              forward
    Match             *
    Host              ${FLUENTD_HOST}
    Port              ${FLUENTD_PORT}
    Retry_Limit       False