Example 1
fluent-bit is scheduled on a new node and tries to connect to kubelet before the CNI is ready. Over 13 seconds, the 'kubelet upstream connection error' and '[tls] error: unexpected EOF' messages are logged about 7K times.
Example 2
A new node is starting and fluent-bit tries to connect to kubelet before the kubelet serving certificate has been issued.
For each connection attempt, kubelet logs the error 'no serving certificate available for the kubelet':
Jun 4 04:01:41 ip-A-B-C-D.eu-central-1.compute.internal kernel: process '/fluent-bit/bin/fluent-bit' started with executable stack
Jun 4 04:01:41 ip-A-B-C-D.eu-central-1.compute.internal kubelet: I0604 04:01:41.752427 3438 log.go:194] http: TLS handshake error from 127.0.0.1:53014: no serving certificate available for the kubelet
[...]
Jun 4 04:01:42 ip-A-B-C-D.eu-central-1.compute.internal kubelet: I0604 04:01:42.983070 3438 csr.go:261] certificate signing request csr-b7bnj is approved, waiting to be issued
[...]
Jun 4 04:01:43 ip-A-B-C-D.eu-central-1.compute.internal kubelet: I0604 04:01:43.041195 3438 log.go:194] http: TLS handshake error from 127.0.0.1:58166: no serving certificate available for the kubelet
Jun 4 04:01:43 ip-A-B-C-D.eu-central-1.compute.internal kubelet: I0604 04:01:43.096856 3438 csr.go:257] certificate signing request csr-b7bnj is issued
According to the kubelet logs above, it takes about 2 seconds to approve the CSR and issue the kubelet certificate. In that window, 577 'kubelet upstream connection error' logs were generated.
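To put the two examples on the same scale, the reported counts can be converted to error lines per second. This is only a rough sketch: it treats "~7K" as exactly 7000 and takes the durations (13 and 2 seconds) at face value, and the helper name error_rate is made up for illustration.

```python
def error_rate(count: int, seconds: float) -> float:
    """Average number of error log lines per second over the given window."""
    return count / seconds


# Figures taken from the two examples in this report.
# Assumption: "~7K" is rounded to 7000 for the calculation.
REPORTED = {
    "Example 1 (CNI not ready)": (7000, 13.0),
    "Example 2 (certificate not issued)": (577, 2.0),
}

for label, (count, seconds) in REPORTED.items():
    print(f"{label}: {error_rate(count, seconds):.1f} error lines per second")
```

Both cases work out to several hundred error lines per second from a single node.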
Expected behavior
Fluent Bit should not retry the kubelet connection so aggressively or generate so many error logs. It should wait 1 second after an unsuccessful attempt, to give kubelet and the CNI time to become ready.
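The requested behavior can be sketched as a retry loop with a fixed delay. This is not Fluent Bit code (Fluent Bit is written in C), just a minimal Python illustration of the idea; connect_with_delay, FakeClock, and simulate are hypothetical names, and the simulation assumes kubelet becomes reachable after a fixed number of seconds.

```python
import time
from typing import Callable, Optional


def connect_with_delay(
    connect: Callable[[], bool],
    delay_seconds: float = 1.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
    log: Callable[[str], None] = print,
) -> int:
    """Call `connect` until it succeeds, pausing after each failed attempt.

    Returns the number of attempts made. Raises ConnectionError if
    `max_attempts` is set and reached without success.
    """
    attempts = 0
    while True:
        attempts += 1
        if connect():
            return attempts
        # Exactly one error line per failed attempt.
        log(f"kubelet upstream connection error (attempt {attempts})")
        if max_attempts is not None and attempts >= max_attempts:
            raise ConnectionError(f"gave up after {attempts} attempts")
        # Give kubelet and the CNI time to become ready before retrying.
        sleep(delay_seconds)


class FakeClock:
    """Simulated clock so the example runs instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def sleep(self, seconds: float) -> None:
        self.now += seconds


def simulate(ready_after_seconds: float, delay_seconds: float = 1.0):
    """Return (attempts, error_lines) for a kubelet that becomes ready later."""
    clock = FakeClock()
    errors = []
    attempts = connect_with_delay(
        connect=lambda: clock.now >= ready_after_seconds,
        delay_seconds=delay_seconds,
        sleep=clock.sleep,
        log=errors.append,
    )
    return attempts, len(errors)


if __name__ == "__main__":
    for window in (13.0, 2.0):  # durations from Example 1 and Example 2
        attempts, error_lines = simulate(window)
        print(f"ready after {window:.0f}s: {attempts} attempts, {error_lines} error lines")
```

With a 1-second delay, the number of error lines is bounded by the length of the startup window in seconds rather than by how fast the client can retry.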
Your Environment
Version used: v3.0.1, v2.2.2
Configuration: Fluent-bit is configured to use kubelet to get metadata
Environment name and version: EKS v1.26
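For readers unfamiliar with this setup, a kubernetes filter that reads metadata from kubelet typically looks something like the fragment below. This is an assumed, minimal example and not the configuration from the affected cluster; the Match pattern and port value are illustrative.

```
[FILTER]
    # Assumed example, not the reporter's actual configuration
    Name            kubernetes
    Match           kube.*
    # Query the local kubelet for pod metadata instead of the API server
    Use_Kubelet     On
    Kubelet_Port    10250
```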
Additional context
These error logs are forwarded to the logging server and take up a lot of space in large, dynamic clusters.
I get this error with Fluent Bit 3.0.7, but not with Fluent Bit 3.0.4. My config is identical for both versions.
Bug Report
Fluent-bit is configured to use kubelet to get metadata
When a new node is started and kubelet is for some reason not yet ready to communicate, fluent-bit repeatedly logs 'kubelet upstream connection error' and '[tls] error: unexpected EOF' messages.