amchech opened 4 days ago
To help others, here is the full Fluent Bit configuration we are running:
[SERVICE]
Daemon Off
Flush 1
Log_Level info
Parsers_File /fluent-bit/etc/parsers.conf
Parsers_File /fluent-bit/etc/conf/custom_parsers.conf
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_Port 2020
Health_Check On
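# HTTP server exposes metrics and the /api/v1/health endpoint used by probes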
[INPUT]
Name tail
Path /var/log/containers/*.log
# Exclude fluent-bit logs, certain error conditions can cause loops
# that can effectively DoS outputs with very high logging rates
# (see https://github.com/fluent/fluent-bit/issues/3829)
Exclude_Path /var/log/containers/fluent-bit-*_kube-system_*.log
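# Handle both Docker (JSON) and CRI log formats, including partial lines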
multiline.parser docker, cri
Tag kube.<namespace_name>.<pod_name>.<container_name>-<container_id>
Mem_Buf_Limit 5MB
Skip_Long_Lines On
DB /var/log/flb_pods_tail.db
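# Fill the <namespace_name>/<pod_name>/... placeholders in the Tag from the file name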
Tag_Regex (?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-(?<container_id>[a-z0-9]{64})\.log$
[INPUT]
Name tail
Path /usr/share/reactshost/*/ReactsLogs/Metrics/*/*.json
Tag reacts-metrics
Parser reacts-metrics-parser
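# Store each record's source file path under 'filename' for the parser filter below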
Path_Key filename
DB /usr/share/reactshost/fluentbit/logs.db
[FILTER]
Name kubernetes
Match kube.*
Merge_Log On
Keep_Log Off
K8S-Logging.Parser On
K8S-Logging.Exclude On
Kube_Tag_Prefix kube.
Regex_Parser kubePodCustom
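# Re-emit a copy of records whose pod_id contains a '4' under a cw.* tag;
# the trailing 'true' (keep) retains the original kube.* record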
[FILTER]
Name rewrite_tag
Match kube.*
Rule $kubernetes['pod_id'] ^.*4.*$ cw.$TAG true
Emitter_Name cw_re_emitted
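# Drop re-emitted cw.* records from pods that opt out via the
# logging.cloudwatch.aws/enabled=false label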
[FILTER]
Name grep
Match cw.*
Exclude $kubernetes['labels']['logging.cloudwatch.aws/enabled'] false
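# Drop all logs from the loki-system namespace before shipping to Loki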
[FILTER]
Name grep
Match kube.*
Exclude $kubernetes['namespace_name'] loki-system
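# Rename keys so they can be referenced as Loki labels in the output below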
[FILTER]
Name modify
Match kube.*
Rename level level_label
Rename instance instance_label
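# Parse the captured file path (Path_Key 'filename') into fields for the metrics output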
[FILTER]
Name parser
Match reacts-metrics
Key_Name filename
Parser filename-parser
Reserve_Data On
[OUTPUT]
Name loki
Match kube.*
Host loki-gateway.loki-system
Port 80
Labels job=fluentbit, type=logs, namespace=$kubernetes['namespace_name'], component=$kubernetes['container_name'], level=$level_label, instance=$instance_label
[OUTPUT]
Name loki
Match reacts-metrics
Host loki-gateway.loki-system
Port 80
Labels job=fluentbit, component=$component, instance=$instance, type=metrics
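For reference, the reacts-metrics-parser and filename-parser used above are defined in the attached custom_parser.txt. As a rough sketch only (the regex below is an illustrative assumption inferred from the Path pattern in the second INPUT, not the actual attachment), a filename parser that feeds the $component and $instance labels on the reacts-metrics output could look like this in the parsers file:

[PARSER]
# Hypothetical: extract instance and component from the tailed file path,
# assuming the layout /usr/share/reactshost/<instance>/ReactsLogs/Metrics/<component>/<file>.json
Name filename-parser
Format regex
Regex ^/usr/share/reactshost/(?<instance>[^/]+)/ReactsLogs/Metrics/(?<component>[^/]+)/[^/]+\.json$

With Reserve_Data On in the parser filter, the original record fields are kept alongside the extracted keys.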
Bug Report
Description
We are experiencing occasional restarts of the Fluent Bit pods running as a DaemonSet in our EKS cluster. The pods restart with exit code 139 (segmentation fault). According to our Prometheus metrics, the issue is caused neither by running out of memory nor by excessive CPU usage.
Logs
See the attached fleuntbitlog.txt.
Environment
Fluent Bit Version: version=3.0.6, commit=9af65e2c36. Note: we have already updated to version=3.1.9, commit=431fa79ae2, and we see the same issue.
Kubernetes Version: v1.29.0
EKS Version: v1.29.0-eks-680e576
Node Operating System: Bottlerocket OS 1.21.1 (aws-k8s-1.29), kernel 6.1.102
Container Runtime: containerd://1.7.20+bottlerocket
Node Configuration: CPU: 4 vCPU, Memory: 8 GB, Instance Type: c6a.xlarge
Deployment in EKS
Fluent Bit is deployed as a DaemonSet in an EKS cluster. Resource limits and requests are set for memory and CPU.
Additional context
Attached you will find the log files and Fluent Bit configs: fleuntbitlog.txt, custom_parser.txt, fluent-bit.txt