Fluent Bit is not reliable when used with Upstream servers. If one of the nodes goes down, Fluent Bit does not retry the affected chunk against the other nodes that are still live.
The documentation says that Upstream works in round-robin fashion, but chunks are retried against the same dead host. This caused delivery problems for us, and we had to put a load balancer in front of the nodes and use the forward plugin to mitigate the error.
[INPUT]
Name tail
path /log/access_log.json
tag acesss-log
Key message
Read_from_Head true
Path_Key log.file.path
DB /var/db/offset
Mem_Buf_Limit 5MB
storage.type filesystem
Buffer_Max_Size 128k
....
[OUTPUT]
Name forward
Match *
Upstream upstream.conf
Retry_Limit False
- Upstream Configuration
[UPSTREAM]
name forward-balancing
[NODE]
name node-1
host node1
port 5043
tls on
tls.verify off
tls.ca_file /etc/td-agent-bit/certs_dev/root-ca.pem
tls.crt_file /etc/td-agent-bit/certs_dev/fluent-bit.crt
tls.key_file /etc/td-agent-bit/certs_dev/fluent-bit.key
Retry_Limit False
storage.total_limit_size 1G
[NODE]
name node-2
host node2
port 5043
tls on
tls.verify off
tls.ca_file /etc/td-agent-bit/certs_dev/root-ca.pem
tls.crt_file /etc/td-agent-bit/certs_dev/fluent-bit.crt
tls.key_file /etc/td-agent-bit/certs_dev/fluent-bit.key
Retry_Limit False
storage.total_limit_size 1G
[NODE]
name node-3
host node3
port 5043
tls on
tls.verify off
tls.ca_file /etc/td-agent-bit/certs_dev/root-ca.pem
tls.crt_file /etc/td-agent-bit/certs_dev/fluent-bit.crt
tls.key_file /etc/td-agent-bit/certs_dev/fluent-bit.key
Retry_Limit False
storage.total_limit_size 1G
- Steps to reproduce the problem:
Use the above config, generate some logs with a script into /log/access_log.json, and check whether the data is available in the UI.
In our use case we send data to Fluentd hosted in Docker, with ports opened on the hosts, and Fluentd routes it to OpenSearch.
Route - Fluent Bit (Linux machine) -> Fluentd (Docker) -> OpenSearch (Docker)
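The script used to generate the logs is not included in this report. A minimal sketch of such a generator, assuming one JSON object per line, could look like the following. The function name `write_access_logs` and the record fields (`time`, `seq`, `method`, `path`, `status`) are hypothetical; only the file path comes from the config above.

```python
import json
import time
from datetime import datetime, timezone


def write_access_logs(path="/log/access_log.json", count=100, delay=0.0):
    """Append `count` JSON lines to `path` and return how many were written.

    The record layout is a made-up example, not the real access-log schema.
    """
    with open(path, "a", encoding="utf-8") as fh:
        for i in range(count):
            record = {
                "time": datetime.now(timezone.utc).isoformat(),
                "seq": i,
                "method": "GET",
                "path": f"/demo/{i}",
                "status": 200,
            }
            fh.write(json.dumps(record) + "\n")
            fh.flush()  # push each line to the file so the tail input can pick it up
            if delay:
                time.sleep(delay)
    return count
```

Calling `write_access_logs(delay=0.1)` while one of the upstream nodes is stopped should produce a steady stream of chunks to observe the retry behaviour with.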
**Expected behavior**
The behaviour is described in the docs - [Fluent Bit Upstream server](https://docs.fluentbit.io/manual/administration/configuring-fluent-bit/classic-mode/upstream-servers).
The data is sent in round-robin fashion.
If a node is down, Fluent Bit should retry when Retry_Limit is set to False or no_limits. The retries do happen, but they go to the server that is down and not to the servers that are up.
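To make the expectation concrete, here is a small sketch of round-robin delivery where a failed attempt moves on to the next node. It only models the behaviour we expected; it is not how Fluent Bit implements upstream balancing, and the names `deliver` and `is_up` are hypothetical.

```python
from itertools import cycle


def deliver(chunks, nodes, is_up):
    """Assign chunks to nodes round-robin, skipping to the next node on failure.

    Returns a dict mapping each chunk to the node that accepted it, or None
    when every node was down for that chunk.
    """
    ring = cycle(nodes)
    delivered = {}
    for chunk in chunks:
        for _ in range(len(nodes)):  # try each node at most once per chunk
            node = next(ring)
            if is_up(node):
                delivered[chunk] = node
                break
        else:
            delivered[chunk] = None  # all nodes down; the chunk stays queued
    return delivered


nodes = ["node1", "node2", "node3"]
result = deliver(["c1", "c2", "c3", "c4"], nodes, lambda n: n != "node2")
```

With node2 down, `result` maps c1 and c3 to node1 and c2 and c4 to node3. What we observe instead is that a chunk first assigned to the dead node keeps being retried against that same node.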
**Screenshots**
<img width="1235" alt="image" src="https://github.com/fluent/fluent-bit/assets/29899440/bcda1117-bc46-4d60-8e08-d54c574c590d">
The above image shows td-agent-bit going into a retry loop without retrying against another node.
[fluent-bit.log](https://github.com/fluent/fluent-bit/files/15388493/fluent-bit.log)
Fluent Bit Log File
**Your Environment**
* Version used: 3.0.3
* Configuration: Mentioned above
* Environment name and version (e.g. Kubernetes? What version?): OEL 7.9
* Server type and version: Oracle Linux 7.9
* Operating System and version: Oracle Linux
* Filters and plugins: Basic Input File plugin and Forward Output Plugin mentioned above
**Additional context**
- This issue prevented us from sending data.
- The only way we could handle it was with an external network load balancer.