fluent / helm-charts

Helm Charts for Fluentd and Fluent Bit
Apache License 2.0

Aggregator Not Sending Logs to Outputs After Running for a Few Hours #425

Open: Rmaabari opened this issue 1 year ago

Rmaabari commented 1 year ago

Issue Description:

Problem: After the Fluent Bit Aggregator Helm Chart has been deployed and running for a few hours, the aggregator stops sending logs to Elasticsearch and Syslog, which are the intended log-forwarding destinations.

Expected Behavior: The Fluent Bit Aggregator should consistently and reliably forward logs to the specified Elasticsearch and Syslog destinations as configured in the Helm Chart.

Steps to Reproduce:

1. Deploy the Fluent Bit Aggregator using the provided Helm Chart.
2. Monitor log forwarding for a few hours.
3. Observe that log forwarding to Elasticsearch and Syslog stops after some time.

Actual Results: After an initial period of successful log forwarding, the Fluent Bit Aggregator stops sending logs to Elasticsearch and Syslog without any apparent errors or warnings.
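For the monitoring step above, a minimal Python sketch that compares two snapshots of the aggregator's built-in metrics. It assumes the HTTP server enabled in the aggregator's [SERVICE] section (`http_server true`, port 2020) is reachable and that `/api/v1/metrics` returns per-output counters named `proc_records` and `retries`, as described in the Fluent Bit monitoring documentation. The host name comes from the agents' forward output; `METRICS_URL`, `fetch_metrics` and `stalled_outputs` are hypothetical names.

```python
import json
import urllib.request

# Assumption: the aggregator Service exposes the Fluent Bit HTTP server on 2020.
METRICS_URL = "http://fluent-bit-aggregator:2020/api/v1/metrics"  # hypothetical


def fetch_metrics(url=METRICS_URL):
    """Fetch the JSON metrics document from the Fluent Bit HTTP server."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def stalled_outputs(before, after):
    """Return outputs whose processed-record counter did not increase between
    two snapshots while their retry counter did (assumed field names)."""
    stalled = []
    for name, now in after.get("output", {}).items():
        prev = before.get("output", {}).get(name, {})
        no_progress = now.get("proc_records", 0) <= prev.get("proc_records", 0)
        retrying = now.get("retries", 0) > prev.get("retries", 0)
        if no_progress and retrying:
            stalled.append(name)
    return sorted(stalled)
```

For example, call `fetch_metrics()` twice with a `time.sleep(300)` in between and pass both results to `stalled_outputs`; an output such as `es.1` showing up in the result would indicate chunks that keep retrying without being delivered.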

Environment Details:

- Kubernetes cluster version: 1.26
- Fluent Bit agent version: 2.1.8
- Fluent Bit aggregator version: 2.1.9
- Elasticsearch version: 8.9

Aggregator config:

[SERVICE]
    daemon false
    http_Port 2020
    http_listen 0.0.0.0
    http_server true
    log_level debug
    parsers_file /fluent-bit/etc/parsers.conf
    storage.metrics true
    storage.path /fluent-bit/data

[INPUT]
    name forward
    listen 0.0.0.0
    port 24224

[FILTER]
    Name rewrite_tag
    Match kube.*
    Rule $syslog ^(true)$ syslog.* true
    Emitter_Name re_emitted

[OUTPUT]
    Name syslog
    Match syslog.*
    Host $HOST
    Port 514
    Retry_Limit false
    Mode tcp
    Syslog_Format rfc5424
    Syslog_MaxSize 65536
    Syslog_Hostname_Key hostname
    Syslog_Appname_Key appname
    Syslog_Procid_Key procid
    Syslog_Msgid_Key msgid
    Syslog_SD_Key uls@0
    Syslog_Message_Key msg

[OUTPUT]
    Name es
    Match kube.*
    HTTP_User $USER
    HTTP_Passwd $PASS
    tls Off
    tls.verify Off
    Host elastic-elasticsearch
    Port 9200
    Retry_Limit False
    Trace_Error On
    Trace_Output Off
    Suppress_Type_Name On
    Replace_Dots On
    Buffer_Size False
    Logstash_Prefix logstash
    Logstash_Format On
    Index logstash
    Generate_ID     On
    Write_Operation upsert

[OUTPUT]
    Name es
    Match host.*
    HTTP_User $USER
    HTTP_Passwd $PASS
    tls Off
    tls.verify Off
    Host elastic-elasticsearch
    Port 9200
    Retry_Limit False
    Trace_Error On
    Trace_Output Off
    Suppress_Type_Name On
    Replace_Dots On
    Buffer_Size False
    Logstash_Prefix logstash
    Logstash_Format On
    Index logstash
    Write_Operation upsert
    Generate_ID     On
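The aggregator routes records purely by tag. A rough Python sketch of that routing for the three outputs above, under stated assumptions: `Match` patterns behave like shell wildcards (approximated with `fnmatch.fnmatchcase`), the `syslog` field holds the string `"true"`, the rewrite_tag new tag `syslog.*` is used literally, and output instances are numbered in declaration order across all outputs (matching the `es.1` naming in the aggregator logs). This is an illustration, not Fluent Bit's implementation.

```python
import re
from fnmatch import fnmatchcase

# Outputs from the aggregator config, numbered in declaration order (assumption).
OUTPUTS = [("syslog.0", "syslog.*"), ("es.1", "kube.*"), ("es.2", "host.*")]

# rewrite_tag: Match kube.* / Rule $syslog ^(true)$ syslog.* true
RULE_KEY = "syslog"
RULE_REGEX = re.compile(r"^(true)$")
NEW_TAG = "syslog.*"  # assumed to be used literally as the new tag
KEEP = True           # last Rule field: keep the original record


def rewrite_tag(tag, record):
    """Approximate the rewrite_tag filter; returns (tag, record) pairs."""
    if not fnmatchcase(tag, "kube.*"):
        return [(tag, record)]
    value = record.get(RULE_KEY)
    if isinstance(value, str) and RULE_REGEX.search(value):
        kept = [(tag, record)] if KEEP else []
        return kept + [(NEW_TAG, record)]
    return [(tag, record)]


def route(tag, record):
    """List the outputs that would receive the record (and any re-emitted copy)."""
    destinations = []
    for new_tag, _rec in rewrite_tag(tag, record):
        destinations += [name for name, match in OUTPUTS if fnmatchcase(new_tag, match)]
    return destinations
```

Under these assumptions, a `kube.*` record with `syslog: "true"` reaches both `es.1` and `syslog.0`, and `es.1` would be the `Match kube.*` Elasticsearch output.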

Fluent Bit agent config:

custom_parsers.conf:
----
[PARSER]
    Name docker_no_time
    Format json
    Time_Keep Off
    Time_Key time
    Time_Format %Y-%m-%dT%H:%M:%S.%L

[FILTER]
    Name    grep
    Match   *
    Exclude log liveness

[FILTER]
    Name    grep
    Match   *
    Exclude log readiness

[SERVICE]
    Daemon Off
    Flush 5
    Log_Level debug
    Parsers_File /fluent-bit/etc/parsers.conf
    HTTP_Server On
    HTTP_Listen 0.0.0.0
    HTTP_Port 2020
    Health_Check On

[INPUT]
    Name tail
    Path /var/log/containers/*.log
    Exclude_Path      /var/log/containers/*_monitoring_*.log
    multiline.parser docker, cri
    Tag kube.*
    Mem_Buf_Limit 50MB
    Buffer_Max_Size 1MB
    Skip_Long_Lines Off

[INPUT]
    Name systemd
    Tag host.*
    Systemd_Filter _SYSTEMD_UNIT=kubelet.service
    Read_From_Tail On

[FILTER]
    Name kubernetes
    Match kube.*
    Merge_Log On
    Keep_Log Off
    K8S-Logging.Parser On
    K8S-Logging.Exclude On

[OUTPUT]
    Name    forward
    Match   *
    Host    fluent-bit-aggregator
    Port    24224
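The two `grep` filters in the agent config drop records whose `log` field matches `liveness` or `readiness`. A minimal Python sketch of that exclusion logic, assuming each `Exclude` value is an unanchored regular expression applied to a string `log` field and that records without the field are kept; `EXCLUDES` and `keep_record` are hypothetical names.

```python
import re

# One (key, regex) pair per grep filter: Exclude log liveness / Exclude log readiness
EXCLUDES = [("log", re.compile("liveness")), ("log", re.compile("readiness"))]


def keep_record(record):
    """Return False if any Exclude rule's regex is found in the record's field."""
    for key, pattern in EXCLUDES:
        value = record.get(key)
        if isinstance(value, str) and pattern.search(value):
            return False
    return True
```

With unanchored search semantics, any line that merely contains one of these words, not only probe output, would be dropped as well.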

Fluent Bit aggregator StatefulSet logs while no logs are being sent to Elasticsearch:

[2023/09/17 12:07:58] [debug] [out flush] cb_destroy coro_id=7942
[2023/09/17 12:07:58] [debug] [retry] re-using retry for task_id=1959 attempts=19
[2023/09/17 12:07:58] [ warn] [engine] failed to flush chunk '1-1694939682.183824748.flb', retry in 1069 seconds: task_id=1959, input=forward.0 > output=es.1 (out_id=1)
[2023/09/17 12:07:59] [debug] [output:es:es.1] task_id=1354 assigned to thread #1
[2023/09/17 12:07:59] [debug] [output:es:es.1] task_id=1642 assigned to thread #0
[2023/09/17 12:07:59] [debug] [output:es:es.1] task_id=685 assigned to thread #1
[2023/09/17 12:07:59] [debug] [upstream] KA connection #96 to elastic-elasticsearch:9200 has been assigned (recycled)
[2023/09/17 12:07:59] [debug] [upstream] KA connection #91 to elastic-elasticsearch:9200 has been assigned (recycled)
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [http_client] not using http_proxy for header
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [http_client] not using http_proxy for header
[2023/09/17 12:07:59] [debug] [upstream] KA connection #89 to elastic-elasticsearch:9200 has been assigned (recycled)
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [http_client] not using http_proxy for header
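The `retry in 1069 seconds` warning for `attempts=19` can be read against the backoff formula in the Fluent Bit "Scheduling and Retries" documentation, Wait = random(base, min(cap, base * 2^N)), with documented defaults `scheduler.base` 5 and `scheduler.cap` 2000 seconds. A small Python sketch of that formula (function names are hypothetical, and the engine's internals may differ between versions):

```python
import random

SCHEDULER_BASE = 5    # documented default for scheduler.base (seconds)
SCHEDULER_CAP = 2000  # documented default for scheduler.cap (seconds)


def backoff_ceiling(attempt, base=SCHEDULER_BASE, cap=SCHEDULER_CAP):
    """Upper bound of the retry wait for a given attempt number."""
    return min(cap, base * 2 ** attempt)


def retry_wait(attempt, base=SCHEDULER_BASE, cap=SCHEDULER_CAP, rng=random):
    """Draw a wait in [base, ceiling], following the documented formula."""
    return rng.randint(base, backoff_ceiling(attempt, base, cap))
```

At attempt 19 the ceiling is already capped at 2000 seconds, so 1069 seconds falls within the expected range. With `Retry_Limit False`, a chunk that keeps failing is retried indefinitely and can wait up to about 33 minutes between attempts; `scheduler.base` and `scheduler.cap` can be set in the [SERVICE] section to change this (verify against the 2.1 documentation).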

[Image: Kibana view of the logs]


Rmaabari commented 1 year ago

Attaching a link to the related issue raised in @stevehipwell's Helm chart repository: https://github.com/stevehipwell/helm-charts/issues/789