fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

Frequent 'kubelet upstream connection errors' during startup #8913

Open rtalipov opened 1 month ago

rtalipov commented 1 month ago

Bug Report

Fluent Bit is configured to fetch pod metadata from kubelet.

When a new node starts and kubelet is not yet ready to accept connections, Fluent Bit repeatedly logs the following errors:

[error] [tls] error: unexpected EOF
[error] [filter:kubernetes:kubernetes.1] kubelet upstream connection error

To Reproduce

Example 1: Fluent Bit is scheduled on a new node and tries to connect to kubelet before the CNI is ready. Over 13 seconds, the 'kubelet upstream connection error' and '[tls] error: unexpected EOF' messages are logged about 7,000 times.

[2024/05/24 07:34:21] [error] [tls] error: unexpected EOF
[2024/05/24 07:34:21] [error] [filter:kubernetes:kubernetes.1] kubelet upstream connection error
[...]
[2024/05/24 07:34:33] [error] [filter:kubernetes:kubernetes.1] kubelet upstream connection error
[2024/05/24 07:34:34] [error] [tls] error: unexpected EOF 
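
To quantify a burst like this one, the error lines can be bucketed by their timestamp. The following is a minimal sketch, assuming the log lines use the `[YYYY/MM/DD HH:MM:SS] [error] ...` layout shown above; the function name `errors_per_second` is hypothetical and not part of Fluent Bit.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above:
# "[2024/05/24 07:34:21] [error] [tls] error: unexpected EOF"
LINE_RE = re.compile(
    r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\] \[error\] (.*)$"
)


def errors_per_second(lines):
    """Count [error] lines per one-second timestamp (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


# The four lines visible in the excerpt; the elided lines are not included.
sample = [
    "[2024/05/24 07:34:21] [error] [tls] error: unexpected EOF",
    "[2024/05/24 07:34:21] [error] [filter:kubernetes:kubernetes.1] kubelet upstream connection error",
    "[2024/05/24 07:34:33] [error] [filter:kubernetes:kubernetes.1] kubelet upstream connection error",
    "[2024/05/24 07:34:34] [error] [tls] error: unexpected EOF ",
]

counts = errors_per_second(sample)
for second, count in sorted(counts.items()):
    print(second, count)
```

Run against the full log file rather than this short sample, the per-second counts would show how tight the retry loop is.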

Example 2: A new node is starting, and Fluent Bit tries to connect to kubelet before kubelet's serving certificate has been issued. For each connection attempt, kubelet logs the error 'no serving certificate available for the kubelet'.

Jun  4 04:01:41 ip-A-B-C-D.eu-central-1.compute.internal kernel: process '/fluent-bit/bin/fluent-bit' started with executable stack
Jun  4 04:01:41 ip-A-B-C-D.eu-central-1.compute.internal kubelet: I0604 04:01:41.752427    3438 log.go:194] http: TLS handshake error from 127.0.0.1:53014: no serving certificate available for the kubelet
[...]
Jun  4 04:01:42 ip-A-B-C-D.eu-central-1.compute.internal kubelet: I0604 04:01:42.983070    3438 csr.go:261] certificate signing request csr-b7bnj is approved, waiting to be issued
[...]
Jun  4 04:01:43 ip-A-B-C-D.eu-central-1.compute.internal kubelet: I0604 04:01:43.041195    3438 log.go:194] http: TLS handshake error from 127.0.0.1:58166: no serving certificate available for the kubelet
Jun  4 04:01:43 ip-A-B-C-D.eu-central-1.compute.internal kubelet: I0604 04:01:43.096856    3438 csr.go:257] certificate signing request csr-b7bnj is issued

According to the kubelet logs above, it takes about 2 seconds to approve the CSR and issue the kubelet certificate. In that time, 577 'kubelet upstream connection error' messages were generated.

Expected behavior

Fluent Bit should not retry the kubelet connection so aggressively or generate so many error logs. After a failed attempt, it should wait 1 second before retrying, to give kubelet and the CNI time to become ready.
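
The requested behavior can be illustrated with a small retry loop that pauses between attempts and logs once per failure. This is only a sketch of the idea in Python, not Fluent Bit's actual code (which is written in C): `connect_with_delay`, its parameters, and the `connect` callable are all hypothetical. The optional `factor` for growing the delay is an added assumption; the default of `1.0` gives the fixed 1-second delay requested above.

```python
import time


def connect_with_delay(connect, max_attempts=5, delay=1.0, factor=1.0,
                       max_delay=30.0, sleep=time.sleep):
    """Call connect() until it succeeds, pausing between failed attempts.

    Hypothetical helper: `connect` is any callable that raises OSError
    (which covers ConnectionError and ssl.SSLError) on failure.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except OSError as exc:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            # One log line per failed attempt instead of a tight loop.
            print(f"kubelet upstream connection error "
                  f"(attempt {attempt}): {exc}; retrying in {delay:g}s")
            sleep(delay)
            # factor=1.0 keeps the delay fixed; factor>1 backs off.
            delay = min(delay * factor, max_delay)
```

With a fixed 1-second delay, the 13-second window from Example 1 would produce roughly a dozen attempts instead of thousands of error lines.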

Your Environment

Additional context

These error logs are forwarded to the logging server and consume a lot of storage in large, dynamic clusters.

pallasathena92 commented 3 weeks ago

I get this error with Fluent Bit 3.0.7, but not with Fluent Bit 3.0.4. My configuration is identical for both versions.