aws / aws-for-fluent-bit

The source of the amazon/aws-for-fluent-bit container image
Apache License 2.0

Missing logs when main container exits immediately #856

Open proof-nicholas opened 1 month ago

proof-nicholas commented 1 month ago

We are using aws-fluent-bit to route the logs from our main container to Datadog. I was recently troubleshooting an issue where the ECS Fargate task was exiting because one of its essential containers exited, but I couldn't find any logs in Datadog indicating a failure. I then disabled fluent-bit logging so that the task logs go to CloudWatch. Then, in the ECS Fargate console, when the task exited, I was able to see the application log messages indicating the errors (required environment variables missing). I suspect the main container exited so fast that either fluent-bit did not receive the logs or did not ship them to Datadog before the task was terminated. How can I prevent this from happening?

Configuration

I am using AWS ECS Copilot Logging:

logging:
  image: 351603118025.dkr.ecr.us-east-2.amazonaws.com/aws-fluent-bit
  destination:
    Name: "datadog"
    Host: "http-intake.logs.datadoghq.com"
    compress: "gzip"
    dd_service: "my-service"
    dd_source: "my-service"
    dd_tags: "env:staging"
    TLS: "on"
    provider: "ecs"
  secretOptions:
    apiKey: XXXXXXX
  configFilePath: "/fluent-bit/configs/parse-json.conf"

Fluent Bit Version Info

7.57.2

Cluster Details

ECS Fargate with Fluent Bit deployed as a Sidecar with awsvpc networking.

swapneils commented 1 month ago

Could you try the changes in Wesley's PR, which modifies the grace period behavior during shutdown (https://github.com/aws/aws-for-fluent-bit/pull/829)? The current shutdown behavior might be what kept the logs from getting ingested.
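
For context, here is a minimal sketch of the grace period setting in a Fluent Bit classic-mode config. It is not taken from the PR above. It assumes a custom config file is supplied in place of the built-in parse-json.conf and that a [SERVICE] section in that file is honored. The values are illustrative only.

```
[SERVICE]
    # Flush buffered records to the outputs every second.
    Flush 1
    # Seconds Fluent Bit waits on shutdown so pending records can be flushed.
    # 30 is an assumed, illustrative value, not a recommendation from this thread.
    Grace 30
```

A longer grace period only helps if it fits within the time ECS gives the container to stop after SIGTERM (the container's stopTimeout), so the two values would need to be chosen together.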