fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

Kubernetes Events Input Segfault #9543

Open Evesy opened 3 weeks ago

Evesy commented 3 weeks ago

Bug Report

Describe the bug: Running the kubernetes_events input eventually results in a segfault.

To Reproduce: We cannot yet reproduce this reliably, but we see segfaults every few hours with the config below:

[SERVICE]
    Flush                      1
    Grace                      5
    Log_Level                  debug
    Daemon                     off

    HTTP_Server                On
    HTTP_Listen                0.0.0.0
    HTTP_Port                  2020

[FILTER]
    Name   record_modifier
    Alias  add_cloud_metadata
    Match  *
    Record cloud_project_id <redacted>

[FILTER]
    Name           nest
    Operation      nest
    Alias          nest.cloud_data
    Match          kube.*
    Wildcard       cloud_*
    Remove_Prefix  cloud_
    Nest_Under     cloud

[FILTER]
    Name           nest
    Operation      nest
    Alias          nest.meta_data
    Match          kube.*
    Wildcard       cloud
    Nest_Under     meta

[INPUT]
    name            kubernetes_events
    tag             k8s_events
    kube_url        http://app.kubernetes:80

[OUTPUT]
    Name               es
    Match              k8s_events
    Alias              es.k8s_events
    Retry_Limit        5

    Host               ${FLUENT_ELASTICSEARCH_HOST}
    Port               ${FLUENT_ELASTICSEARCH_PORT}
    Compress           gzip

    Logstash_Format    On
    Logstash_Prefix    fluent-kubernetes
    Write_Operation    create
    Buffer_Size        False
    Trace_Error        On
    Generate_ID        On
    Suppress_Type_Name On
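
When narrowing this down, it may help to check which filters actually see the kubernetes_events records. Fluent Bit's Match option treats * as a wildcard against the record tag. The sketch below approximates that rule in Python and applies it to the tag and Match values from the config above. The helper names tag_matches and filters_for and the FILTERS table are illustrative only and are not part of Fluent Bit.

```python
import re

def tag_matches(pattern, tag):
    """Approximate Fluent Bit's Match rule: '*' matches any run of characters."""
    # Escape every literal piece and join the pieces with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, tag) is not None

# Alias -> Match values copied from the [FILTER] sections above.
FILTERS = {
    "add_cloud_metadata": "*",
    "nest.cloud_data": "kube.*",
    "nest.meta_data": "kube.*",
}

def filters_for(tag):
    """Return the aliases of the filters whose Match pattern accepts the tag."""
    return [alias for alias, pattern in FILTERS.items() if tag_matches(pattern, tag)]

if __name__ == "__main__":
    print(filters_for("k8s_events"))
```

If this approximation holds, only the record_modifier filter applies to the k8s_events tag, so the two nest filters are probably not on the path that crashes.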

Expected behavior: Fluent Bit should not crash.

Output

[2024/10/30 09:54:15] [ info] [input:kubernetes_events:kubernetes_events.0] kubernetes stream closed by api server. Reconnect will happen on next interval.
[2024/10/30 09:54:15] [ info] [input:kubernetes_events:kubernetes_events.0] kubernetes stream disconnected, ret=1
[2024/10/30 09:54:15] [ info] [input:kubernetes_events:kubernetes_events.0] Requesting /api/v1/events?watch=1&resourceVersion=62152862
[2024/10/30 10:47:35] [ info] [input:kubernetes_events:kubernetes_events.0] kubernetes stream closed by api server. Reconnect will happen on next interval.
[2024/10/30 10:47:35] [ info] [input:kubernetes_events:kubernetes_events.0] kubernetes stream disconnected, ret=1
[2024/10/30 10:47:35] [ info] [input:kubernetes_events:kubernetes_events.0] Requesting /api/v1/events?watch=1&resourceVersion=62156442
[2024/10/30 11:31:05] [ info] [input:kubernetes_events:kubernetes_events.0] kubernetes stream closed by api server. Reconnect will happen on next interval.
[2024/10/30 11:31:05] [ info] [input:kubernetes_events:kubernetes_events.0] kubernetes stream disconnected, ret=1
[2024/10/30 11:31:05] [ info] [input:kubernetes_events:kubernetes_events.0] Requesting /api/v1/events?watch=1&resourceVersion=62158888
[2024/10/30 12:25:19] [ info] [input:kubernetes_events:kubernetes_events.0] kubernetes stream closed by api server. Reconnect will happen on next interval.
[2024/10/30 12:25:19] [ info] [input:kubernetes_events:kubernetes_events.0] kubernetes stream disconnected, ret=1
[2024/10/30 12:25:19] [ info] [input:kubernetes_events:kubernetes_events.0] Requesting /api/v1/events?watch=1&resourceVersion=62160843
[2024/10/30 13:00:47] [ info] [input:kubernetes_events:kubernetes_events.0] kubernetes stream closed by api server. Reconnect will happen on next interval.
[2024/10/30 13:00:47] [ info] [input:kubernetes_events:kubernetes_events.0] kubernetes stream disconnected, ret=1
[2024/10/30 13:00:47] [ info] [input:kubernetes_events:kubernetes_events.0] Requesting /api/v1/events?watch=1&resourceVersion=62163615
[2024/10/30 13:45:42] [ info] [input:kubernetes_events:kubernetes_events.0] kubernetes stream closed by api server. Reconnect will happen on next interval.
[2024/10/30 13:45:42] [ info] [input:kubernetes_events:kubernetes_events.0] kubernetes stream disconnected, ret=1
[2024/10/30 13:45:43] [ info] [input:kubernetes_events:kubernetes_events.0] Requesting /api/v1/events?watch=1&resourceVersion=62165276
[2024/10/30 14:00:00] [engine] caught signal (SIGSEGV)
#0  0x55f89bd8ee34      in  ???() at ???:0
#1  0x55f89c342326      in  ???() at ???:0
#2  0xffffffffffffffff  in  ???() at ???:0
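
To see whether the crash follows a reconnect at a consistent delay, the log can be reduced to reconnect and crash timestamps. This is a minimal Python sketch that assumes the timestamp and message formats shown in the output above. The function names and the SAMPLE excerpt are illustrative, and it is a diagnostic aid, not a statement about the root cause.

```python
import re
from datetime import datetime

# Matches the leading "[YYYY/MM/DD HH:MM:SS]" timestamp of a Fluent Bit log line.
TS_RE = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\]")

def parse_events(log_text):
    """Return (timestamp, kind) pairs for watch requests and SIGSEGV lines."""
    events = []
    for line in log_text.splitlines():
        m = TS_RE.match(line.strip())
        if m is None:
            continue  # stack frames and other lines without a timestamp
        ts = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S")
        if "Requesting /api/v1/events" in line:
            events.append((ts, "reconnect"))
        elif "caught signal (SIGSEGV)" in line:
            events.append((ts, "crash"))
    return events

def seconds_from_reconnect_to_crash(events):
    """For each crash, seconds elapsed since the most recent watch request."""
    gaps = []
    last_reconnect = None
    for ts, kind in events:
        if kind == "reconnect":
            last_reconnect = ts
        elif kind == "crash" and last_reconnect is not None:
            gaps.append(int((ts - last_reconnect).total_seconds()))
    return gaps

# Excerpt copied from the output above.
SAMPLE = """\
[2024/10/30 13:45:42] [ info] [input:kubernetes_events:kubernetes_events.0] kubernetes stream disconnected, ret=1
[2024/10/30 13:45:43] [ info] [input:kubernetes_events:kubernetes_events.0] Requesting /api/v1/events?watch=1&resourceVersion=62165276
[2024/10/30 14:00:00] [engine] caught signal (SIGSEGV)
#0  0x55f89bd8ee34      in  ???() at ???:0
"""

if __name__ == "__main__":
    print(seconds_from_reconnect_to_crash(parse_events(SAMPLE)))
```

For the excerpt above this gives 857 seconds (about 14 minutes) between the last watch request and the SIGSEGV. Running it over logs from several crashes would show whether that gap is consistent.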

Your Environment

Additional context: We have other Fluent Bit instances with an identical configuration, apart from using other inputs instead of kubernetes_events, and we have yet to see any segfaults on those.

patrick-stephens commented 3 weeks ago

I notice you're using a Bitnami image - does it happen with the actual image we produce here for OSS?

Evesy commented 3 weeks ago

I will switch to fluent/fluent-bit:3.1.9 and see if it also happens with that image. I will close this if I don't see any recurrence.

HaveFun83 commented 3 weeks ago

We saw the same error, but we are also on Bitnami images.

Evesy commented 3 weeks ago

I've been running fluent/fluent-bit:3.1.9 over the weekend and can see it has segfaulted ~5 times. Happy to try to grab more information, whatever would be useful.