Describe the bug
When running the image amazon/aws-for-fluent-bit:latest, or any previous stable version, the log router cannot reach the Elastic cluster once the task boots.
Expected behavior
The logs should go to Elastic.
Your Environment
Version used: latest and 2.32.4
Configuration: ECS Fargate on Linux ARM64
Filters and plugins: None
Additional context
The goal is to run this tool as a sidecar in each of the 200 tasks so that all of their logs are sent to Elastic. Running a separate agent instead is not manageable.
Bug Report
To Reproduce
Follow the tutorial from Elastic: https://www.elastic.co/blog/elastic-cloud-with-aws-firelens-accelerate-time-to-insight-with-agentless-data-ingestion
For ECS, the task will look like this:
{
  "family": "firelens-fargate-elastic",
  "taskRoleArn": "**redacted**",
  "executionRoleArn": "**redacted**",
  "networkMode": "awsvpc",
  "cpu": "512",
  "memory": "1024",
  "requiresCompatibilities": ["FARGATE"],
  "containerDefinitions": [
    {
      "essential": true,
      "image": "public.ecr.aws/aws-observability/aws-for-fluent-bit:2.32.4",
      "name": "log_router",
      "firelensConfiguration": {
        "type": "fluentbit",
        "options": {
          "enable-ecs-log-metadata": "true"
        }
      },
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "firelens-container",
          "awslogs-region": "eu-west-1",
          "awslogs-create-group": "true",
          "awslogs-stream-prefix": "firelens"
        }
      },
      "memoryReservation": 50
    },
    {
      "essential": true,
      "image": "nginx",
      "name": "app",
      "logConfiguration": {
        "logDriver": "awsfirelens",
        "secretOptions": [
          {
            "valueFrom": "**redacted**:CLOUD_ID::",
            "name": "Cloud_ID"
          },
          {
            "valueFrom": "**redacted**:CLOUD_AUTH::",
            "name": "Cloud_Auth"
          }
        ],
        "options": {
          "Name": "es",
          "Port": "9243",
          "Tag_Key tags": "tags",
          "Include_Tag_Key": "true",
          "Index": "elastic_firelens",
          "tls": "On",
          "tls.verify": "Off"
        }
      },
      "memoryReservation": 100
    }
  ]
}
Deploy the task with a public IP and make sure that requests to it reach the NGINX container. The log router then logs:
24 November 2024 at 00:06 (UTC)
[2024/11/24 00:06:49] [ warn] [net] getaddrinfo(host='**redacted**.eu-west-1.aws.found.io:443', err=4): Domain name not found
[2024/11/24 00:06:49] [ warn] [engine] failed to flush chunk '1-1732406808.778340835.flb', retry in 7 seconds: task_id=0, input=forward.1 > output=es.1 (out_id=1)
Note that redacted.eu-west-1.aws.found.io:443 is reachable from a browser at the time the error occurs. If Cloud_ID is edited to remove the port, the logs change and suggest that an invalid argument was provided.
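For context, here is a minimal Python sketch of how a Cloud ID can be decoded, assuming it follows the usual layout `<label>:<base64 payload>` where the payload decodes to `<domain>[:<port>]$<es_uuid>$<kibana_uuid>`. It only illustrates how a `:443` suffix could end up in the hostname passed to the resolver, as seen in the log above. It is not Fluent Bit's actual parsing code. The Cloud ID, the UUIDs `abc123` and `def456`, and the helper `decode_cloud_id` are hypothetical.

```python
import base64


def decode_cloud_id(cloud_id: str) -> dict:
    """Decode a Cloud ID of the assumed form '<label>:<base64 payload>'."""
    # Everything before the first ':' is only a display label.
    _, _, payload = cloud_id.partition(":")
    decoded = base64.b64decode(payload).decode("utf-8")

    # Assumed payload layout: '<domain>[:<port>]$<es_uuid>$<kibana_uuid>'
    parts = decoded.split("$")
    domain, es_uuid = parts[0], parts[1]

    # Split off the optional port so only a bare name is used for DNS.
    host, sep, port = domain.partition(":")
    return {
        # Endpoint built without stripping the port (not resolvable as a name).
        "raw_endpoint": f"{es_uuid}.{domain}",
        # Endpoint with the port separated from the hostname.
        "hostname": f"{es_uuid}.{host}",
        "port": int(port) if sep else None,
    }


# Hypothetical Cloud ID, built here for illustration only.
payload = base64.b64encode(b"eu-west-1.aws.found.io:443$abc123$def456").decode("ascii")
example = decode_cloud_id(f"my-deployment:{payload}")

print(example["raw_endpoint"])
print(example["hostname"], example["port"])
```

If the decoded domain is used as-is, the string handed to `getaddrinfo` contains the port, which would match the `host='...found.io:443'` value in the warning. Running the same decoding on the real Cloud ID (locally, without sharing it) would show whether that is the case here.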