aws / aws-for-fluent-bit

The source of the amazon/aws-for-fluent-bit container image
Apache License 2.0

caught signal (SIGSEGV) #837

Open June3Ningxu opened 2 weeks ago

June3Ningxu commented 2 weeks ago

Describe the question/issue

Fluent Bit version: amazon/aws-for-fluent-bit:debug-2.32.2.20240516
Deployment mode: AWS ECS Fargate sidecar
Programming language: N/A
Log format: JSON

After running for some time, Fluent Bit exits with the error "[engine] caught signal (SIGSEGV)", so we switched to the debug image to capture the dump logs.

Configuration

"logConfiguration": {
                "logDriver": "awsfirelens",
                "options": {
                    "remove_keys": "ecs_task_arn",
                    "label_keys": "$container_name,$ecs_task_definition,$source,$ecs_cluster,$container_id",
                    "port": "******",
                    "Host": "***********",
                    "line_format": "key_value",
                    "Name": "loki",
                    "labels": "job=firelens"
                }
            },
{
            "name": "log_router",
            "image": "public.ecr.aws/aws-observability/aws-for-fluent-bit:debug-2.32.2.20240516",
            "cpu": 0,
            "memoryReservation": 500,
            "portMappings": [],
            "essential": false,
            "environment": [
                {
                    "name": "S3_BUCKET",
                    "value": "****************"
                },
                {
                    "name": "FLB_LOG_LEVEL",
                    "value": "debug"
                }
            ],
            "mountPoints": [],
            "volumesFrom": [],
            "user": "0",
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/ecs-aws-firelens-sidecar-container",
                    "awslogs-create-group": "true",
                    "awslogs-region": "cn-north-1",
                    "awslogs-stream-prefix": "firelens"
                }
            },
            "systemControls": [],
            "firelensConfiguration": {
                "type": "fluentbit",
                "options": {
                    "enable-ecs-log-metadata": "true"
                }
            }
        }
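For readers less familiar with FireLens: the `options` map under the `awsfirelens` log driver is turned into an `[OUTPUT]` section of the generated Fluent Bit configuration. The sketch below only illustrates that mapping. The helper `render_output_section` and the match pattern `app-firelens*` are hypothetical (the application container name is not shown in this issue), and the exact text FireLens generates may differ.

```python
# Illustrative only: approximates how awsfirelens "options" map onto a
# Fluent Bit classic-mode [OUTPUT] section. Not the actual FireLens generator.

def render_output_section(options, match):
    """Render a dict of plugin options as an [OUTPUT] section string."""
    lines = ["[OUTPUT]"]
    # Emit the plugin name first, then the routing rule, then the other keys.
    name = next((v for k, v in options.items() if k.lower() == "name"), None)
    if name is not None:
        lines.append(f"    Name {name}")
    lines.append(f"    Match {match}")
    for key, value in options.items():
        if key.lower() != "name":
            lines.append(f"    {key} {value}")
    return "\n".join(lines)


# Options copied from the logConfiguration above; masked values stay masked.
firelens_options = {
    "remove_keys": "ecs_task_arn",
    "label_keys": "$container_name,$ecs_task_definition,$source,$ecs_cluster,$container_id",
    "port": "******",
    "Host": "***********",
    "line_format": "key_value",
    "Name": "loki",
    "labels": "job=firelens",
}

# "app-firelens*" is a hypothetical match pattern for a container named "app".
print(render_output_section(firelens_options, "app-firelens*"))
```

Seeing the options in this form makes it easier to compare them against the loki output plugin settings when looking for the crash trigger.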

Fluent bit dump log

issue_2024-06-21T000043_host-ip-192-168-65-99.cn-north-1.compute.internal_7619239412270.all.zip
issue_2024-06-20T190044_host-ip-192-168-194-12.cn-north-1.compute.internal_101311429628425.all.zip
issue_2024-06-20T190044_host-ip-192-168-194-12.cn-north-1.compute.internal_101311429628425.core.zip

Fluent Bit Log Output

debug log


2024-06-21T08:00:41.393+08:00 | [2024/06/21 00:00:41] [debug] [input chunk] update output instances with new chunk size diff=598, records=1, input=forward.1
  | 2024-06-21T08:00:42.374+08:00 | [2024/06/21 00:00:42] [debug] [task] created task=0x7f139e244b60 id=0 OK
  | 2024-06-21T08:00:42.374+08:00 | [2024/06/21 00:00:42] [debug] [upstream] KA connection #53 to 192.168.133.167:3100 has been assigned (recycled)
  | 2024-06-21T08:00:42.374+08:00 | [2024/06/21 00:00:42] [debug] [http_client] not using http_proxy for header
  | 2024-06-21T08:00:42.378+08:00 | [2024/06/21 00:00:42] [debug] [output:loki:loki.1] 192.168.133.167:3100, HTTP status=204
  | 2024-06-21T08:00:42.378+08:00 | [2024/06/21 00:00:42] [debug] [upstream] KA connection #53 to 192.168.133.167:3100 is now available
  | 2024-06-21T08:00:42.378+08:00 | [2024/06/21 00:00:42] [debug] [out flush] cb_destroy coro_id=9393
  | 2024-06-21T08:00:42.378+08:00 | [2024/06/21 00:00:42] [debug] [task] destroy task=0x7f139e244b60 (task_id=0)
  | 2024-06-21T08:00:42.386+08:00 | [2024/06/21 00:00:42] [debug] [input chunk] update output instances with new chunk size diff=558, records=1, input=forward.1
  | 2024-06-21T08:00:42.386+08:00 | [2024/06/21 00:00:42] [debug] [input chunk] update output instances with new chunk size diff=440, records=1, input=forward.1
  | 2024-06-21T08:00:42.386+08:00 | [2024/06/21 00:00:42] [debug] [input chunk] update output instances with new chunk size diff=426, records=1, input=forward.1
  | 2024-06-21T08:00:42.386+08:00 | [2024/06/21 00:00:42] [debug] [input chunk] update output instances with new chunk size diff=405, records=1, input=forward.1
  | 2024-06-21T08:00:42.386+08:00 | [2024/06/21 00:00:42] [debug] [input chunk] update output instances with new chunk size diff=333, records=1, input=forward.1
  | 2024-06-21T08:00:42.386+08:00 | [2024/06/21 00:00:42] [debug] [input chunk] update output instances with new chunk size diff=596, records=1, input=forward.1
  | 2024-06-21T08:00:42.386+08:00 | [2024/06/21 00:00:42] [debug] [input chunk] update output instances with new chunk size diff=414, records=1, input=forward.1
  | 2024-06-21T08:00:42.386+08:00 | [2024/06/21 00:00:42] [debug] [input chunk] update output instances with new chunk size diff=494, records=1, input=forward.1
  | 2024-06-21T08:00:42.386+08:00 | [2024/06/21 00:00:42] [debug] [input chunk] update output instances with new chunk size diff=1837, records=1, input=forward.1
  | 2024-06-21T08:00:42.825+08:00 | [2024/06/21 00:00:42] [debug] [input chunk] update output instances with new chunk size diff=483, records=1, input=forward.1
  | 2024-06-21T08:00:42.825+08:00 | [2024/06/21 00:00:42] [debug] [input chunk] update output instances with new chunk size diff=598, records=1, input=forward.1
  | 2024-06-21T08:00:43.295+08:00 | [2024/06/21 00:00:43] [debug] [input chunk] update output instances with new chunk size diff=959, records=1, input=forward.1
  | 2024-06-21T08:00:43.374+08:00 | [2024/06/21 00:00:43] [debug] [task] created task=0x7f139e244b60 id=0 OK
  | 2024-06-21T08:00:43.374+08:00 | [2024/06/21 00:00:43] [engine] caught signal (SIGSEGV)
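In the excerpt above, a task is created immediately before the signal and is never reported as destroyed. A minimal sketch for pulling that out of a longer log is shown below. The function name `tasks_alive_at_crash` and the regular expressions are assumptions based only on the line shapes visible in this excerpt, not on any Fluent Bit tooling.

```python
import re

# Matches the Fluent Bit part of a line: "[date time] [tag] rest".
LINE_RE = re.compile(r"\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\] \[(\w+)\] (.*)")
# Matches task lifecycle messages as they appear in the excerpt.
TASK_RE = re.compile(r"\[task\] (created|destroy) task=(0x[0-9a-f]+)")


def tasks_alive_at_crash(lines):
    """Return task addresses created but not destroyed when the signal appears.

    Returns None if no "caught signal" line is found.
    """
    alive = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        _, tag, rest = m.groups()
        if tag == "engine" and "caught signal" in rest:
            return alive
        t = TASK_RE.search(rest)
        if t:
            action, addr = t.groups()
            if action == "created":
                alive.append(addr)
            elif addr in alive:
                alive.remove(addr)
    return None


# A few lines copied from the log excerpt above.
SAMPLE = [
    "[2024/06/21 00:00:42] [debug] [task] created task=0x7f139e244b60 id=0 OK",
    "[2024/06/21 00:00:42] [debug] [out flush] cb_destroy coro_id=9393",
    "[2024/06/21 00:00:42] [debug] [task] destroy task=0x7f139e244b60 (task_id=0)",
    "[2024/06/21 00:00:43] [debug] [task] created task=0x7f139e244b60 id=0 OK",
    "[2024/06/21 00:00:43] [engine] caught signal (SIGSEGV)",
]

print(tasks_alive_at_crash(SAMPLE))
```

This only narrows down which flush was in progress when the process died; the attached core dump is still needed to identify the faulting code.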

Fluent Bit Version Info

debug-2.32.2.20240516

Cluster Details

AWS ECS Fargate, fluent bit as a sidecar

Application Details

About 300 log lines per minute; log line size: about 1 KB
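For scale, the reported load works out to a small sustained throughput. The arithmetic below assumes that "1k" means roughly 1 KiB per line, which the report does not state explicitly.

```python
LINES_PER_MINUTE = 300
BYTES_PER_LINE = 1024  # assumption: "1k" in the report means about 1 KiB

lines_per_second = LINES_PER_MINUTE / 60
bytes_per_second = LINES_PER_MINUTE * BYTES_PER_LINE / 60

print(f"{lines_per_second:.0f} lines/s, {bytes_per_second / 1024:.1f} KiB/s")
```

At about 5 lines and 5 KiB per second, the crash does not look like a simple overload case under these assumptions.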

Steps to reproduce issue

Run the ECS Fargate task for some time; Fluent Bit eventually crashes.