aws / aws-for-fluent-bit

The source of the amazon/aws-for-fluent-bit container image
Apache License 2.0

Consistent SIGSEGV #398

Open chaker-sidhom opened 1 year ago

chaker-sidhom commented 1 year ago
### Describe the question/issue

Hi all,

We're running the latest `aws-for-fluent-bit` version `2.26.0` on AWS EKS and we're getting consistent segfaults. The issue is very similar to https://github.com/aws/aws-for-fluent-bit/issues/383.

### Configuration

- Deployment mode: DaemonSet on an EKS cluster. The Docker image is `amazon/aws-for-fluent-bit:2.26.0`.
- Log format: JSON

The ConfigMap looks as follows:

```
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: amazon-cloudwatch
  labels:
    k8s-app: fluent-bit
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush                     10
        Log_Level                 info
        Daemon                    off
        Parsers_File              parsers.conf
        HTTP_Server               ${HTTP_SERVER}
        HTTP_Listen               0.0.0.0
        HTTP_Port                 ${HTTP_PORT}
        storage.path              /var/fluent-bit/state/flb-storage/
        storage.sync              normal
        storage.checksum          off
        storage.backlog.mem_limit 5M

    @INCLUDE application-inputs.conf
    @INCLUDE application-filters.conf
    @INCLUDE application-outputs.conf
    @INCLUDE dataplane-log.conf
    @INCLUDE host-log.conf

  application-inputs.conf: |
    [INPUT]
        Name                tail
        Tag                 application.central.*
        Exclude_Path        /var/log/containers/*istio-*
        Path                /var/log/containers/*_central_*.log
        Docker_Mode         On
        Docker_Mode_Flush   5
        Docker_Mode_Parser  container_firstline
        Parser              docker
        DB                  /var/fluent-bit/state/flb_container_central.db
        Mem_Buf_Limit       25MB
        Skip_Long_Lines     On
        Refresh_Interval    10
        Rotate_Wait         30
        storage.type        filesystem
        Read_from_Head      ${READ_FROM_HEAD}

    [INPUT]
        Name                tail
        Tag                 application.istio.*
        Exclude_Path        /var/log/containers/*istio-init*
        Path                /var/log/containers/*istio*.log
        Docker_Mode         On
        Docker_Mode_Flush   5
        Docker_Mode_Parser  container_firstline
        Parser              docker
        DB                  /var/fluent-bit/state/flb_container_istio.db
        Mem_Buf_Limit       25MB
        Skip_Long_Lines     On
        Refresh_Interval    10
        Rotate_Wait         30
        storage.type        filesystem
        Read_from_Head      ${READ_FROM_HEAD}

  application-filters.conf: |
    [FILTER]
        Name                kubernetes
        Match               application.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_Tag_Prefix     application.central.var.log.containers.
        Merge_Log           On
        Merge_Log_Key       log_processed
        Keep_Log            Off
        K8S-Logging.Parser  On
        K8S-Logging.Exclude Off
        Labels              Off
        Annotations         Off
        Buffer_Size         0

    [FILTER]
        Name                kubernetes
        Match               application.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_Tag_Prefix     application.istio.var.log.containers.
        Merge_Log           On
        Merge_Log_Key       log_processed
        Keep_Log            Off
        K8S-Logging.Parser  On
        K8S-Logging.Exclude Off
        Labels              Off
        Annotations         Off
        Buffer_Size         0

    [FILTER]
        Name         nest
        Match        application.*
        Operation    lift
        Nested_under kubernetes
        Add_prefix   Kube.

    [FILTER]
        Name   modify
        Match  application.*
        Remove Kube.docker_id
        Remove Kube.container_hash
        Remove stream

    [FILTER]
        Name          nest
        Match         application.*
        Operation     nest
        Wildcard      Kube.*
        Nested_under  k
        Remove_prefix Kube.

  application-outputs.conf: |
    [OUTPUT]
        Name              cloudwatch_logs
        Match             application.central.*
        region            ${AWS_REGION}
        log_group_name    /aws/containerinsights/${CLUSTER_NAME}/central
        log_stream_prefix ${HOST_NAME}-
        auto_create_group true
        extra_user_agent  container-insights

    [OUTPUT]
        Name              cloudwatch_logs
        Match             application.istio.*
        region            ${AWS_REGION}
        log_group_name    /aws/containerinsights/${CLUSTER_NAME}/istio
        log_stream_prefix ${HOST_NAME}-
        auto_create_group true
        extra_user_agent  container-insights

  parsers.conf: |
    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%LZ

    [PARSER]
        Name   syslog
        Format regex
        Regex  ^(?
```
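For anyone trying to reproduce this outside the cluster, the config can be syntax-checked locally with Fluent Bit's `--dry-run` flag before deploying. This is just a sketch: the host-side file name, the mount target, and the in-image binary path are assumptions for illustration.

```shell
# Illustrative sketch: validate fluent-bit.conf with the same image
# version from this issue, without actually starting the pipeline.
# Paths are assumptions; adjust to where your extracted config lives.
docker run --rm \
  -v "$PWD/fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf:ro" \
  amazon/aws-for-fluent-bit:2.26.0 \
  /fluent-bit/bin/fluent-bit -c /fluent-bit/etc/fluent-bit.conf --dry-run
```

Note that a clean dry run only proves the configuration parses; it will not catch a crash that depends on runtime log traffic.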
PettitWesley commented 1 year ago

If you can, try this so we can get a proper stack trace or core dump: https://github.com/aws/aws-for-fluent-bit/blob/mainline/troubleshooting/debugging.md#segfaults-and-crashes-sigsegv
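As a prerequisite for the core-dump route in that guide, the process that runs fluent-bit has to be allowed to write core files at all. A minimal sketch (the Docker flag shown in the comment is illustrative, not a value from the linked guide):

```shell
# Illustrative only: in the shell that will launch fluent-bit, allow
# core files of unlimited size before reproducing the crash.
ulimit -c unlimited
ulimit -c            # shows the current soft core-file limit

# When launching via Docker, the equivalent limit can be set from the
# outside instead (image tag taken from this issue):
#   docker run --ulimit core=-1 amazon/aws-for-fluent-bit:2.26.0
```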

qdupuy commented 1 year ago

Hello,

Same issue with `amazon/aws-for-fluent-bit:2.24.0`.

However, if I go back to version `2.23.0`, it works.

@PettitWesley

gpetrovgeorgi commented 1 year ago

Hello guys,

The last "stable" version for us was `2.28.4`. It looks like the `stable` image tag has since been moved to a newer version, and our AWS ECS containers started breaking with a SIGSEGV error:

AWS for Fluent Bit Container Image Version 2.28.4

We run on AWS ECS Fargate and are facing the same problem: our containers are crashing with `caught signal (SIGSEGV)`. Our config uses two plugins in the OUTPUT directives: S3 and CloudWatch.

I would be happy to see a root cause and a fix here, @PettitWesley.

Some messages from the AWS ECS console after I enabled debugging according to this page:

(screenshot: debug output from the AWS ECS console)
PettitWesley commented 1 year ago

@gpetrovgeorgi there have been many issues fixed since that version: https://github.com/aws/aws-for-fluent-bit/issues/542

Also, we now have pre-built debug images that can output stacktraces and upload cores to S3, if you face this issue in a new version.

https://github.com/aws/aws-for-fluent-bit/blob/mainline/troubleshooting/debugging.md#firelens-crash-report-runbook
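Once a core file has been retrieved (for example, downloaded from the S3 bucket the debug image uploads to), a backtrace can be pulled out of it with gdb. A hedged sketch; the core file name and path are assumptions, and the `fluent-bit` binary must come from the same image build that produced the core:

```shell
# Illustrative only: print a backtrace for every thread from a
# captured core file, non-interactively. Paths are assumptions.
gdb /fluent-bit/bin/fluent-bit /cores/core \
    -batch -ex "thread apply all bt"
```

Attaching the resulting backtrace to the issue is usually far more actionable for maintainers than the bare `caught signal (SIGSEGV)` line.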