fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

Fluentbit with elasticsearch plugin periodically OOM in the Kubernetes environment #2671

Closed rumanzo closed 3 years ago

rumanzo commented 3 years ago

We have a Kubernetes installation running Fluent Bit with the Elasticsearch plugin. DaemonSet limits:

...
        resources:
          limits:
            memory: "5Gi"
...

input-kubernetes.conf

[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    Parser            docker
    DB                /var/log/flb_kube.db
    DB.Sync           Off
    Mem_Buf_Limit     512M
    Skip_Long_Lines   On
    Refresh_Interval  10
    Ignore_Older      6h

output-elasticsearch.conf

[OUTPUT]
    Name            es
    Match           kube.*
    Host            ${FLUENT_ELASTICSEARCH_HOST}
    Port            ${FLUENT_ELASTICSEARCH_PORT}
    Index           logstash
    Type            fluentd
    Logstash_Format On
    Replace_Dots    On
    Retry_Limit     5

filter-kubernetes.conf

[FILTER]
    Name                kubernetes
    Match               kube.*
    Kube_URL            https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}
    Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
    Kube_Tag_Prefix     kube.var.log.containers.
    Merge_Log           Off
    K8S-Logging.Parser  On
    K8S-Logging.Exclude On

1-5 times per day the Fluent Bit pod on some Kubernetes nodes is killed by OOM. In the previous container's log I can see only

[2020/10/08 14:59:13] [ warn] [msgpack2json] unknown msgpack type 1647469090

in all of the 10000 rows (log_level debug), and nothing else. The documentation (https://docs.fluentbit.io/manual/administration/memory-management) says that my memory consumption should be about 512 MB * 3 * 1.2 = 1843 MB.
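Not part of the original report, but for reference: one common way to bound memory in this kind of setup is filesystem buffering (see the Fluent Bit buffering & storage docs). A minimal sketch, assuming an illustrative storage.path and an abridged tail input; only storage.type is new compared to input-kubernetes.conf above:

[SERVICE]
    # offload chunks to disk instead of keeping everything in memory
    storage.path      /var/log/flb-storage/
    storage.sync      normal
    storage.checksum  off

[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    Parser            docker
    DB                /var/log/flb_kube.db
    Mem_Buf_Limit     512M
    # buffer this input through the filesystem
    storage.type      filesystem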

The Fluent Bit containers are based on CentOS Linux release 8.2.2004. Fluent Bit RPM spec:

Name:           fluent-bit
Version:        1.5.7
...
cmake3 \
-DFLB_JEMALLOC=Yes \
-DFLB_TLS=Yes \
-DFLB_STREAM_PROCESSOR=No \
-DFLB_EXAMPLES=No \
-DFLB_SIGNV4=No \
-DFLB_PROXY_GO=No \
-DFLB_RECORD_ACCESSOR=Off \
-DFLB_AWS=No \
-DFLB_DEV=No \
-DFLB_METRICS=Yes \
-DFLB_HTTP_SERVER=Yes \
-DFLB_INOTIFY=Yes \
-DFLB_IN_COLLECTD=No \
-DFLB_IN_CPU=No \
-DFLB_IN_DISK=No \
-DFLB_IN_DOCKER=No \
-DFLB_IN_DOCKER_EVENTS=No \
-DFLB_IN_DUMMY=No \
-DFLB_IN_EMITTER=No \
-DFLB_IN_EXEC=No \
-DFLB_IN_HEAD=No \
-DFLB_IN_HEALTH=Yes \
-DFLB_IN_HTTP=No \
-DFLB_IN_KMSG=No \
-DFLB_IN_LIB=No \
-DFLB_IN_MQTT=No \
-DFLB_IN_NETIF=No \
-DFLB_IN_PROC=No \
-DFLB_IN_RANDOM=No \
-DFLB_IN_SERIAL=No \
-DFLB_IN_STATSD=No \
-DFLB_IN_STDIN=No \
-DFLB_IN_STORAGE_BACKLOG=No \
-DFLB_IN_SYSTEMD=No \
-DFLB_IN_TAIL=Yes \
-DFLB_IN_TCP=No \
-DFLB_IN_THERMAL=No \
-DFLB_IN_WINLOG=No \
-DFLB_OUT_ES=Yes \
-DFLB_OUT_FORWARD=Yes \
-DFLB_OUT_AZURE=No \
-DFLB_OUT_BIGQUERY=No \
-DFLB_OUT_COUNTER=No \
-DFLB_OUT_DATADOG=No \
-DFLB_OUT_EXIT=No \
-DFLB_OUT_GELF=No \
-DFLB_OUT_HTTP=No \
-DFLB_OUT_INFLUXDB=No \
-DFLB_OUT_NATS=No \
-DFLB_OUT_NRLOGS=No \
-DFLB_OUT_TCP=No \
-DFLB_OUT_PLOT=No \
-DFLB_OUT_FILE=No \
-DFLB_OUT_TD=No \
-DFLB_OUT_RETRY=No \
-DFLB_OUT_PGSQL=No \
-DFLB_OUT_SLACK=No \
-DFLB_OUT_SPLUNK=No \
-DFLB_OUT_STACKDRIVER=No \
-DFLB_OUT_STDOUT=No \
-DFLB_OUT_SYSLOG=No \
-DFLB_OUT_LIB=No \
-DFLB_OUT_NULL=No \
-DFLB_OUT_FLOWCOUNTER=No \
-DFLB_OUT_LOGDNA=No \
-DFLB_OUT_KAFKA=No \
-DFLB_OUT_KAFKA_REST=No \
-DFLB_OUT_CLOUDWATCH_LOGS=No \
-DFLB_FILTER_KUBERNETES=Yes \
-DFLB_FILTER_ALTER_SIZE=No \
-DFLB_FILTER_AWS=No \
-DFLB_FILTER_EXPECT=No \
-DFLB_FILTER_GREP=No \
-DFLB_FILTER_MODIFY=No \
-DFLB_FILTER_STDOUT=No \
-DFLB_FILTER_PARSER=No \
-DFLB_FILTER_REWRITE_TAG=No \
-DFLB_FILTER_THROTTLE=No \
-DFLB_FILTER_THROTTLE_SIZE=No \
-DFLB_FILTER_NEST=No \
-DFLB_FILTER_LUA=No \
-DFLB_FILTER_RECORD_MODIFIER=No \
-DCMAKE_INSTALL_PREFIX=/ ../
rumanzo commented 3 years ago

With Fluent Bit 1.6.0 the problem still persists.

edsiper commented 3 years ago

would you please check the behavior with Mem_Buf_Limit 256M ?
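For clarity, that suggestion applied to the tail input from input-kubernetes.conf above, with only Mem_Buf_Limit changed:

[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    Parser            docker
    DB                /var/log/flb_kube.db
    DB.Sync           Off
    # reduced from 512M per the suggestion above
    Mem_Buf_Limit     256M
    Skip_Long_Lines   On
    Refresh_Interval  10
    Ignore_Older      6h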

rumanzo commented 3 years ago

would you please check the behavior with Mem_Buf_Limit 256M ?

It takes time for me to observe; it behaves a little unpredictably.

rumanzo commented 3 years ago

Yesterday one pod was restarted due to OOM with Mem_Buf_Limit 256M.

cobb-tx commented 3 years ago

I have the same problem: memory increases slowly and then OOMs.

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

hampsterx commented 3 years ago

cough curse you stale bot, this is a valid/serious issue! The worst kind of bug, the random one that appears after a long period of time :(

Any thoughts? Can someone supply a graph of increasing memory usage at least, to make it real, lol
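One way to get that data: the RPM spec above builds with -DFLB_HTTP_SERVER=Yes, so the built-in monitoring endpoint can be turned on in the [SERVICE] section. A minimal sketch (listen address and port are the usual defaults, adjust as needed):

[SERVICE]
    # expose the built-in monitoring API
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_Port    2020

Plugin metrics are then available at /api/v1/metrics (or /api/v1/metrics/prometheus for scraping), while the pod's actual memory usage over time still has to come from kubectl top pod or the kubelet/cAdvisor metrics.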

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] commented 3 years ago

This issue was closed because it has been stalled for 5 days with no activity.