fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

Kubernetes Metadata is not working with Fluent Bit 3.0.0 #8642

Closed · dan-leanix closed 5 months ago

dan-leanix commented 5 months ago

Bug Report

**Describe the bug**

Today we rolled out Fluent Bit 3.0.0 on our test environment; the previous version was 2.2.2. Since then, the Kubernetes metadata is no longer added to the respective log records. Details below.

With Fluent Bit 3.0.0 and the changes introduced to the Kubernetes filter in https://github.com/fluent/fluent-bit/pull/8279, the Kubernetes filter plugin no longer seems to call the Kubelet for pod information.

In the log output from Fluent Bit 2.2.2 you can see that the filter first calls the Kubelet for its own pod and then again, when the first log line arrives, to fetch the Kubernetes metadata for that pod. This second call is missing in Fluent Bit 3.0.0, as shown in the log output below.

**To Reproduce**

Use Fluent Bit version 3.0.0 with the following configuration to collect container logs from Kubernetes enriched with Kubernetes metadata.

    [INPUT]
        Name              tail
        Alias             logs_input
        Tag               kubernetes.logs.*
        Path              ${FLUENT_BIT_LOG_PATH}
        Parser            cri_kubernetes_logs
        DB                /var/log/flb_kubernetes_log.db
        DB.sync           normal
        Read_from_Head    true
        Mem_Buf_Limit     10mb
        storage.type      filesystem
        Buffer_Chunk_Size 2mb
        Buffer_Max_Size   2mb
        Rotate_Wait       20
        Refresh_Interval  30
        Skip_Long_Lines   On
        Ignore_Older      5m
        Exclude_Path      ${FLUENT_BIT_EXCLUDE_LOG_PATH}
        Log_Level         warning

    [FILTER]
        Name                kubernetes
        Alias               logs_filter_1
        Match               kubernetes.logs.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix     kubernetes.logs.var.log.containers.
        Merge_Log           On
        Merge_Log_Trim      On
        K8S-Logging.Parser  Off
        K8S-Logging.Exclude On
        Annotations         Off
        Labels              Off
        Use_Kubelet         On
        kubelet_host        ${NODE_IP}
        Kubelet_Port        10250
        tls.verify          Off
        tls.debug           4
        Buffer_Size         0
        log_level           debug

    [OUTPUT]
        Name           stdout
        Alias          logs_output
        Match          kubernetes.logs.*
        json_date_format    iso8601
        Format              json_lines


[2024/03/27 11:37:39] [ info] [fluent bit] version=3.0.0, commit=f499a4fbe1, pid=1
[2024/03/27 11:37:39] [ info] [storage] ver=1.5.1, type=memory+filesystem, sync=normal, checksum=off, max_chunks_up=128
[2024/03/27 11:37:39] [ info] [storage] backlog input plugin: storage_backlog.3
[2024/03/27 11:37:39] [ info] [cmetrics] version=0.7.0
[2024/03/27 11:37:39] [ info] [ctraces ] version=0.4.0
[2024/03/27 11:37:39] [ info] [input:storage_backlog:storage_backlog.3] initializing
[2024/03/27 11:37:39] [ info] [input:storage_backlog:storage_backlog.3] storage_strategy='memory' (memory only)
[2024/03/27 11:37:39] [ info] [input:storage_backlog:storage_backlog.3] queue memory limit: 47.7M
[2024/03/27 11:37:39] [ info] [filter:kubernetes:logs_filter_1] https=1 host=kubernetes.default.svc port=443
[2024/03/27 11:37:39] [ info] [filter:kubernetes:logs_filter_1] token updated
[2024/03/27 11:37:39] [ info] [filter:kubernetes:logs_filter_1] local POD info OK
[2024/03/27 11:37:44] [ info] [filter:kubernetes:logs_filter_1] testing connectivity with Kubelet...
[2024/03/27 11:37:44] [debug] [filter:kubernetes:logs_filter_1] Send out request to Kubelet for pods information.
[2024/03/27 11:37:44] [debug] [filter:kubernetes:logs_filter_1] Request (ns=logging, pod=fluent-bit-jk28w) http_do=0, HTTP Status: 200
[2024/03/27 11:37:44] [ info] [filter:kubernetes:logs_filter_1] connectivity OK
[2024/03/27 11:37:44] [ info] [filter:kubernetes:events_filter_1] https=1 host=kubernetes.default.svc port=443
[2024/03/27 11:37:44] [ info] [filter:kubernetes:events_filter_1] token updated
[2024/03/27 11:37:44] [ info] [filter:kubernetes:events_filter_1] local POD info OK
[2024/03/27 11:37:44] [ info] [filter:kubernetes:events_filter_1] testing connectivity with Kubelet...
[2024/03/27 11:37:44] [ info] [filter:kubernetes:events_filter_1] connectivity OK
[2024/03/27 11:37:44] [ info] [output:stdout:logs_output] worker #0 started
[2024/03/27 11:37:44] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2024/03/27 11:37:44] [ info] [sp] stream processor started
{"date":"2024-03-27T11:37:39.620740Z","time":"2024-03-27T11:37:39.620740586Z","stream":"stdout","logtag":"F","message":"{\"request_id\":\"\",\"protocol\":\"HTTP/1.1\",\"downstream_remote_address\":\":38112\",\"route_name\":\"pathfinder-web\",\"method\":\"GET\",\"bytes_sent\":\"26\",\"authority\":\"staging.leanix.net\",\"duration\":\"23\",\"upstream_transport_failure_reason\":\"-\",\"bytes_received\":\"0\",\"instana_span_id\":\"\",\"x_envoy_external_address\":\"\",\"requested_server_name\":\"istio-westeurope-test.leanix.net\",\"response_flags\":\"-\",\"start_time\":\"2024-03-27T11:37:38.869Z\",\"response_code\":\"200\",\"upstream_host\":\"10.9.2.253:8080\",\"user_agent\":\"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; AppInsights)\",\"path\":\"/version.json\",\"server_timing\":\"-\",\"upstream_cluster\":\"outbound|8080||app.froxy.svc.cluster.local\",\"upstream_local_address\":\"10.9.18.155:54442\",\"strict_transport_security\":\"-\",\"downstream_local_address\":\"10.9.18.155:8443\",\"request_header_bytes\":\"5197\",\"x_forwarded_for\":\"\",\"response_duration\":\"23\",\"x_xss_protection\":\"-\",\"instana_trace_id\":\"\",\"istio_policy_status\":\"-\",\"upstream_service_time\":\"23\"}"}


**Expected behavior**

The Kubernetes filter should behave as in Fluent Bit 2.2.2 and enrich each log record with the Kubernetes metadata of its pod. In the 2.2.2 log output below you can see the filter calling the Kubelet again once the first log line arrives, in order to fetch that pod's metadata; exactly this call is missing in the 3.0.0 output above.

- Example log message if applicable:

Fluent Bit v2.2.2



[2024/03/27 11:40:11] [ info] [fluent bit] version=2.2.2, commit=eeea396e88, pid=1
[2024/03/27 11:40:11] [ info] [storage] ver=1.5.1, type=memory+filesystem, sync=normal, checksum=off, max_chunks_up=128
[2024/03/27 11:40:11] [ info] [storage] backlog input plugin: storage_backlog.3
[2024/03/27 11:40:11] [ info] [cmetrics] version=0.6.6
[2024/03/27 11:40:11] [ info] [ctraces ] version=0.4.0
[2024/03/27 11:40:11] [ info] [input:storage_backlog:storage_backlog.3] initializing
[2024/03/27 11:40:11] [ info] [input:storage_backlog:storage_backlog.3] storage_strategy='memory' (memory only)
[2024/03/27 11:40:11] [ info] [input:storage_backlog:storage_backlog.3] queue memory limit: 47.7M
[2024/03/27 11:40:11] [ info] [filter:kubernetes:logs_filter_1] https=1 host=10.9.16.197 port=10250
[2024/03/27 11:40:11] [ info] [filter:kubernetes:logs_filter_1] token updated
[2024/03/27 11:40:11] [ info] [filter:kubernetes:logs_filter_1] local POD info OK
[2024/03/27 11:40:11] [ info] [filter:kubernetes:logs_filter_1] testing connectivity with Kubelet...
[2024/03/27 11:40:11] [debug] [filter:kubernetes:logs_filter_1] Send out request to Kubelet for pods information.
[2024/03/27 11:40:12] [debug] [filter:kubernetes:logs_filter_1] Request (ns=logging, pod=fluent-bit-bprxm) http_do=0, HTTP Status: 200
[2024/03/27 11:40:12] [ info] [filter:kubernetes:logs_filter_1] connectivity OK
[2024/03/27 11:40:12] [ info] [filter:kubernetes:events_filter_1] https=1 host=10.9.16.197 port=10250
[2024/03/27 11:40:12] [ info] [filter:kubernetes:events_filter_1] token updated
[2024/03/27 11:40:12] [ info] [filter:kubernetes:events_filter_1] local POD info OK
[2024/03/27 11:40:12] [ info] [filter:kubernetes:events_filter_1] testing connectivity with Kubelet...
[2024/03/27 11:40:12] [ info] [filter:kubernetes:events_filter_1] connectivity OK
[2024/03/27 11:40:12] [ info] [output:stdout:logs_output] worker #0 started
[2024/03/27 11:40:12] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2024/03/27 11:40:12] [ info] [sp] stream processor started
[2024/03/27 11:40:12] [debug] [filter:kubernetes:logs_filter_1] Send out request to Kubelet for pods information.
[2024/03/27 11:40:12] [debug] [filter:kubernetes:logs_filter_1] Request (ns=istio-system, pod=istio-ingressgateway-76996f8fd-9rgcv) http_do=0, HTTP Status: 200
[2024/03/27 11:40:12] [debug] [filter:kubernetes:logs_filter_1] kubelet find pod: istio-ingressgateway-76996f8fd-9rgcv and ns: istio-system match
{"date":"2024-03-27T11:40:12.232931Z","time":"2024-03-27T11:40:12.232931021Z","stream":"stdout","logtag":"F","message":"{\"upstream_host\":\"10.9.15.34:8200\",\"downstream_remote_address\":\"10.9.20.133:43072\",\"istio_policy_status\":\"-\",\"authority\":\"hcvault-eu-test.leanix.net:443\",\"upstream_transport_failure_reason\":\"-\",\"upstream_service_time\":\"5\",\"instana_span_id\":\"-\",\"downstream_local_address\":\"10.9.18.155:8443\",\"start_time\":\"2024-03-27T11:40:11.781Z\",\"bytes_received\":\"0\",\"server_timing\":\"-\",\"bytes_sent\":\"135869\",\"x_envoy_external_address\":\"-\",\"requested_server_name\":\"hcvault-eu-test.leanix.net\",\"method\":\"GET\",\"user_agent\":\"GrafanaAgent/v0.35.4\",\"response_flags\":\"-\",\"instana_trace_id\":\"-\",\"duration\":\"6\",\"route_name\":\"vault\",\"strict_transport_security\":\"max-age=31536000; includeSubDomains\",\"path\":\"/v1/sys/metrics?format=prometheus\",\"request_header_bytes\":\"3019\",\"response_duration\":\"5\",\"upstream_local_address\":\"10.9.18.155:47884\",\"x_forwarded_for\":\"10.9.20.133\",\"response_code\":\"200\",\"protocol\":\"HTTP/2\",\"upstream_cluster\":\"outbound|8200||vault-active.vault.svc.cluster.local\",\"request_id\":\"<REDACTED>\",\"x_xss_protection\":\"-\"}","kubernetes":{"pod_name":"istio-ingressgateway-76996f8fd-9rgcv","namespace_name":"istio-system","pod_id":"5d7d3604-7957-4fb0-97c0-32e00137e210","host":"aks-defaultblue-34851620-vmss0000x2","container_name":"istio-proxy","docker_id":"ffd35aee6dab40508119ef1e8516518212c34519338893bc6b6e745c8c4baac1","container_hash":"<REDACTED>/istio/proxyv2@sha256:7b71864c9c3520c6ebe901f1c3adac71171af723fcd560279241fdbf787fea08","container_image":"<REDACTED>/istio/proxyv2:1.20.3-distroless"}}




**Your Environment**
* Version used: 3.0.0
* Configuration: See above for the Kubernetes filter part
* Environment name and version (e.g. Kubernetes? What version?): Kubernetes (Azure Kubernetes Service) 1.28.5
* Server type and version:
* Operating System and version:
* Filters and plugins: Tail, Kubernetes, stdout

dan-leanix commented 5 months ago

I think I found the behavior change: once Labels is set to On, the metadata collection works again. That is a breaking change in version 3.0.0 compared to the behavior of version 2.2.2.

    [FILTER]
        Name                kubernetes
        Alias               logs_filter_1
        Match               kubernetes.logs.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix     kubernetes.logs.var.log.containers.
        Merge_Log           On
        Merge_Log_Trim      On
        K8S-Logging.Parser  Off
        K8S-Logging.Exclude On
        Annotations         Off
        Labels              Off <--------------- Must be set to On with version 3.0.0 to gather Kubernetes Metadata.
        Use_Kubelet         On
        kubelet_host        ${NODE_IP}
        Kubelet_Port        10250
        tls.verify          Off
        tls.debug           4
        Buffer_Size         0
        log_level           debug

With Labels set to On, Fluent Bit talks to the Kubelet again.
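
In other words, the only change relative to the filter section above is this single switch (everything else stays the same):

    [FILTER]
        Name                kubernetes
        Alias               logs_filter_1
        Match               kubernetes.logs.*
        # all other options exactly as in the section above
        Labels              On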

Fluent Bit v3.0.0
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

___________.__                        __    __________.__  __          ________
\_   _____/|  |  __ __   ____   _____/  |_  \______   \__|/  |_  ___  _\_____  \
 |    __)  |  | |  |  \_/ __ \ /    \   __\  |    |  _/  \   __\ \  \/ / _(__  <
 |     \   |  |_|  |  /\  ___/|   |  \  |    |    |   \  ||  |    \   / /       \
 \___  /   |____/____/  \___  >___|  /__|    |______  /__||__|     \_/ /______  /
     \/                     \/     \/               \/                        \/

[2024/03/28 07:48:05] [ info] [fluent bit] version=3.0.0, commit=f499a4fbe1, pid=1
[2024/03/28 07:48:05] [ info] [storage] ver=1.5.1, type=memory+filesystem, sync=normal, checksum=off, max_chunks_up=128
[2024/03/28 07:48:05] [ info] [storage] backlog input plugin: storage_backlog.3
[2024/03/28 07:48:05] [ info] [cmetrics] version=0.7.0
[2024/03/28 07:48:05] [ info] [ctraces ] version=0.4.0
[2024/03/28 07:48:05] [ info] [input:storage_backlog:storage_backlog.3] initializing
[2024/03/28 07:48:05] [ info] [input:storage_backlog:storage_backlog.3] storage_strategy='memory' (memory only)
[2024/03/28 07:48:05] [ info] [input:storage_backlog:storage_backlog.3] queue memory limit: 47.7M
[2024/03/28 07:48:05] [ info] [filter:kubernetes:logs_filter_1] https=1 host=kubernetes.default.svc port=443
[2024/03/28 07:48:05] [ info] [filter:kubernetes:logs_filter_1]  token updated
[2024/03/28 07:48:05] [ info] [filter:kubernetes:logs_filter_1] local POD info OK
[2024/03/28 07:48:10] [ info] [filter:kubernetes:logs_filter_1] testing connectivity with Kubelet...
[2024/03/28 07:48:10] [debug] [filter:kubernetes:logs_filter_1] Send out request to Kubelet for pods information.
[2024/03/28 07:48:10] [debug] [filter:kubernetes:logs_filter_1] Request (ns=logging, pod=fluent-bit-h55xh) http_do=0, HTTP Status: 200
[2024/03/28 07:48:10] [ info] [filter:kubernetes:logs_filter_1] connectivity OK
[2024/03/28 07:48:10] [ info] [filter:kubernetes:events_filter_1] https=1 host=kubernetes.default.svc port=443
[2024/03/28 07:48:10] [ info] [filter:kubernetes:events_filter_1]  token updated
[2024/03/28 07:48:10] [ info] [filter:kubernetes:events_filter_1] local POD info OK
[2024/03/28 07:48:10] [ info] [filter:kubernetes:events_filter_1] testing connectivity with Kubelet...
[2024/03/28 07:48:10] [ info] [filter:kubernetes:events_filter_1] connectivity OK
[2024/03/28 07:48:10] [ info] [output:stdout:logs_output] worker #0 started
[2024/03/28 07:48:10] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2024/03/28 07:48:10] [ info] [sp] stream processor started
[2024/03/28 07:48:42] [debug] [filter:kubernetes:logs_filter_1] Send out request to Kubelet for pods information.
[2024/03/28 07:48:42] [debug] [filter:kubernetes:logs_filter_1] Request (ns=istio-system, pod=istio-ingressgateway-76996f8fd-9rgcv) http_do=0, HTTP Status: 200
[2024/03/28 07:48:42] [debug] [filter:kubernetes:logs_filter_1] kubelet find pod: istio-ingressgateway-76996f8fd-9rgcv and ns: istio-system match
{"date":"2024-03-28T07:48:42.113257Z","time":"2024-03-28T07:48:42.113257389Z","stream":"stdout","logtag":"F","message":"{\"response_code\":\"200\",\"response_flags\":\"-\",\"duration\":\"11\",\"x_xss_protection\":\"-\",\"downstream_remote_address\":\"<REDACTED>:14526\",\"path\":\"/services/mtm/v1/instances/<REDACTED>/domains\",\"route_name\":\"mtm-dev\",\"istio_policy_status\":\"-\",\"x_envoy_external_address\":\"<REDACTED>\",\"server_timing\":\"-\",\"instana_span_id\":\"<REDACTED>\",\"bytes_received\":\"0\",\"request_id\":\"<REDACTED>\",\"upstream_cluster\":\"outbound|9510||mtm.mtm-dev.svc.cluster.local\",\"user_agent\":\"Swagger-Codegen/1.0.0/java\",\"downstream_local_address\":\"10.9.18.155:8443\",\"upstream_host\":\"10.9.1.0:9510\",\"response_duration\":\"11\",\"strict_transport_security\":\"-\",\"authority\":\"istio-westeurope-test-mtm-dev.leanix.net\",\"start_time\":\"2024-03-28T07:48:42.095Z\",\"request_header_bytes\":\"4177\",\"requested_server_name\":\"istio-westeurope-test-mtm-dev.leanix.net\",\"protocol\":\"HTTP/1.1\",\"upstream_transport_failure_reason\":\"-\",\"method\":\"GET\",\"bytes_sent\":\"521\",\"upstream_service_time\":\"11\",\"upstream_local_address\":\"10.9.18.155:58396\",\"instana_trace_id\":\"<REDACTED>\",\"x_forwarded_for\":\"<REDACTED>\"}","kubernetes":{"pod_name":"istio-ingressgateway-76996f8fd-9rgcv","namespace_name":"istio-system","pod_id":"5d7d3604-7957-4fb0-97c0-32e00137e210","labels":{"app":"istio-ingressgateway","chart":"gateways","heritage":"Tiller","install.operator.istio.io/owning-resource":"unknown","istio":"ingressgateway","istio.io/rev":"default","operator.istio.io/component":"IngressGateways","pod-template-hash":"76996f8fd","release":"istio","service.istio.io/canonical-name":"istio-ingressgateway","service.istio.io/canonical-revision":"latest","sidecar.istio.io/inject":"false"},"host":"aks-defaultblue-34851620-vmss0000x2","container_name":"istio-proxy","docker_id":"ffd35aee6dab40508119ef1e8516518212c34519338893bc6b6e745c8c4baac1","container_hash":"<REDACTED>/istio/proxyv2@sha256:7b71864c9c3520c6ebe901f1c3adac71171af723fcd560279241fdbf787fea08","container_image":"<REDACTED>/istio/proxyv2:1.20.3-distroless"}}
ScarletTanager commented 5 months ago

@dan-leanix did you have Labels explicitly set to Off before? Because On is the default value, I believe.

dan-leanix commented 5 months ago

@ScarletTanager Yes, we set Labels and Annotations explicitly to Off in our configuration.

edsiper commented 5 months ago

@ryanohnemus can you please take a look at this issue? It seems it's a regression.

ryanohnemus commented 5 months ago

@edsiper @dan-leanix - this was an intentional change I made when adding the namespace filters: I was expecting cases where someone only wants to fetch namespace metadata and should not be required to fetch pod information. I am using this in setups that pass a kubernetes_events input through the filter and retrieve namespace labels, even though certain k8s events have no matching pod.
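
For anyone curious, here is a minimal sketch of that namespace-only setup; the kubernetes_events input name and the Namespace_Labels option are written from memory of PR #8279, so treat the exact option names as assumptions rather than a verified configuration:

    [INPUT]
        Name              kubernetes_events
        Tag               kubernetes.events
        # events are tagged so the kubernetes filter below matches them

    [FILTER]
        Name              kubernetes
        Match             kubernetes.events
        # no pod metadata wanted: many events have no matching pod
        Labels            Off
        Annotations       Off
        # namespace metadata option introduced with PR #8279 (name assumed)
        Namespace_Labels  On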

To fix this breaking change, this if statement here would need to be replaced with a new option that skips fetching pod metadata only when explicitly enabled. I wasn't careful enough originally and assumed that if labels and annotations were both disabled, no pod metadata was being requested.
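
From the configuration side this could end up looking something like the sketch below; the option name Skip_Pod_Metadata is purely hypothetical and only illustrates the intended shape of the change (pod metadata fetched by default, with an explicit opt-out for namespace-only pipelines):

    [FILTER]
        Name                kubernetes
        Match               kubernetes.logs.*
        Labels              Off
        Annotations         Off
        # hypothetical option, not in any release: pod metadata would still be
        # fetched unless this is explicitly set to On
        Skip_Pod_Metadata   Off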

I can start working on the fix but it might be a few days before I get the PR out there.

dan-leanix commented 5 months ago

Thanks @ryanohnemus for the explanation. 🙂

patrick-stephens commented 5 months ago

We probably need better CI testing for this as well to prevent future regressions. I'm all for implementing a fix, but preventing future failures is key too.