Open ayozemr opened 3 years ago
Hey @ayozemr,
To help me fully understand your problem, can you share the relevant cdk synth
output you'd like to see changed, and can you share the code which creates your container? Thanks!
Sure, there we go. I redacted some possible sensitive values.
The thing is the "log-router" container auto created has an image linked to a param out of my control, that checking ECS taskdefinition json, translates to "image": "906394416424.dkr.ecr.us-east-1.amazonaws.com/aws-for-fluent-bit:latest"
So the idea is if there is any possibility that "log-router" image is any other tag instead of "latest" until they fix the bug, so I can make that "log-router" container uses "906394416424.dkr.ecr.us-east-1.amazonaws.com/aws-for-fluent-bit:2.19.0"
instead of latest.
Relevant code:
CDK:
const taskdefFamily = `${buildConfig.App}-${buildConfig.Environment}-dbmigrations`;
const migrationTaskDef = new ecs.TaskDefinition(this, 'MigrationsTaskDef', {
family: taskdefFamily,
compatibility: ecs.Compatibility.FARGATE,
cpu: '256',
memoryMiB: '512',
networkMode: ecs.NetworkMode.AWS_VPC,
taskRole: taskdefRole,
executionRole: taskExecutionRole,
});
migrationTaskDef.addContainer('FlywayContainer', {
image: ecs.RepositoryImage.fromEcrRepository(ecrRepository, 'latest'),
memoryLimitMiB: 512,
environment: {
ENV: buildConfig.Environment,
FLYWAY_URL: `jdbc:postgresql://${dbcredentials
.secretValueFromJson('host')
.toString()}:${dbcredentials
.secretValueFromJson('port')
.toString()}/db`,
},
logging: ecs.LogDrivers.firelens({
options: {
Name: 'datadog',
provider: 'ecs',
TLS: 'on',
dd_message_key: 'log',
dd_service: `${buildConfig.App}-Migrations`,
dd_source: 'java',
dd_tags: `env:${buildConfig.Environment}`,
},
secretOptions: {
apikey: ecs.Secret.fromSsmParameter(
ssm.StringParameter.fromStringParameterName(
this,
'DatadogApiKey',
'/datadog/APIKEY'
)
),
},
}),
});
// ----------------------------- Synth cloudformation:
MigrationsTaskDef1F4C395A:
Type: AWS::ECS::TaskDefinition
Properties:
ContainerDefinitions:
- Environment:
- Name: ENV
Value: stage
- Name: FLYWAY_URL
Value: REDACTED
Essential: true
Image:
Fn::Join:
- ""
- - Fn::Select:
- 4
- Fn::Split:
- ":"
- Fn::GetAtt:
- MigrationsServiceDECCB9A7
- Arn
- .dkr.ecr.
- Fn::Select:
- 3
- Fn::Split:
- ":"
- Fn::GetAtt:
- MigrationsServiceDECCB9A7
- Arn
- "."
- Ref: AWS::URLSuffix
- /
- Ref: MigrationsServiceDECCB9A7
- :latest
LogConfiguration:
LogDriver: awsfirelens
Options:
Name: datadog
provider: ecs
TLS: "on"
dd_message_key: log
dd_service: svc-Migrations
dd_source: java
dd_tags: env:stage
SecretOptions:
- Name: apikey
ValueFrom: REDACTED-PARAM
Memory: 512
Name: app
- Essential: true
FirelensConfiguration:
Type: fluentbit
Image:
Ref: SsmParameterValueawsserviceawsforfluentbitlatestC96584B6F00A464EAD1953AFF4B05118Parameter
LogConfiguration:
LogDriver: awslogs
Options:
awslogs-group:
Ref: MigrationsTaskDeflogrouterLogGroupAFD29B37
awslogs-stream-prefix: firelens
awslogs-region: us-east-1
MemoryReservation: 50
Name: log-router
Cpu: "256"
ExecutionRoleArn:
Fn::GetAtt:
- TaskExecRoleA2BBB60C
- Arn
Family: REDACTED
Memory: "512"
NetworkMode: awsvpc
RequiresCompatibilities:
- FARGATE
Tags:
- Key: app
Value: appname
- Key: environment
Value: stage
- Key: stack
Value: stackname
TaskRoleArn:
Fn::GetAtt:
- TaskRole30FC0FBB
- Arn
Metadata:
aws:cdk:path: stackname/MigrationsTaskDef/Resource
This is where the image is set: https://github.com/aws/aws-cdk/blob/026cb8f1aa6b915969e22050968b0719c83aa2a4/packages/%40aws-cdk/aws-ecs/lib/base/task-definition.ts#L679-L696
The obtainDefaultFluentBitECRImage
function used for the image here will return this image: https://github.com/aws/aws-cdk/blob/026cb8f1aa6b915969e22050968b0719c83aa2a4/packages/%40aws-cdk/aws-ecs/lib/firelens-log-router.ts#L179-L184
fluentBitImageTag
will always be latest
in this case, because imageTag
isn't passed in as an argument to obtainDefaultFluentBitECRImage(...)
. So there's no direct way to modify this by the user before the CDK creates the CloudFormation output with the latest image.
Unfortunately, the use of escape hatches won't work here because CfnTaskDefinition.containerDefinitions
is a lazy value. I'll have to research to see if there's a workaround for this.
Hey @ayozemr ,
I think you can use the public addFirelensLogRouter()
method of TaskDefinition
to supply the FireLens router, using any image you want. Something like:
migrationTaskDef.addFirelensLogRouter('log-router', {
image: your-image-here,
firelensConfig: {
type: FirelensLogRouterType.FLUENTBIT,
},
logging: new AwsLogDriver({ streamPrefix: 'firelens' }),
memoryReservationMiB: 50,
});
Thanks, Adam
BTW, I think this should morph into a feature request to be able to more easily customize the image for the firelens()
method.
Thanks @peterwoodworth for your time.
@skinny85 I have tried that but ended having problem when pulling the non latest image 🤷 . Had to use fromRepositoryArn because fromRepositoryName was adding my account id by default, ending in an incorrect image uri.
Its weird but:
Reading official repo seems there is a public image, but looking at its URI I dont know how I can build its ARN to be able to use:
https://github.com/aws/aws-for-fluent-bit -> public.ecr.aws/aws-observability/aws-for-fluent-bit:
So your solution works to build using another image, but I had no luck with the pull process in fargate...
Code used:
const awsFluentRepo = ecr.Repository.fromRepositoryArn(
this,
'AwsFluentRepo',
'arn:aws:ecr:us-east-1:906394416424:/aws-for-fluent-bit'
// '906394416424.dkr.ecr.us-east-1.amazonaws.com/aws-for-fluent-bit:2.19.0'
);
migrationTaskDef.addFirelensLogRouter('log-router', {
image: ecs.ContainerImage.fromEcrRepository(awsFluentRepo, '2.19.0'),
firelensConfig: {
type: ecs.FirelensLogRouterType.FLUENTBIT,
},
logging: new ecs.AwsLogDriver({ streamPrefix: 'firelens' }),
memoryReservationMiB: 50,
});
Something to override firelens image would be useful, especially in cases like this related to a bug. Maybe also something to use a container image using its URI instead of arn, would let me use that public ecr image
Thanks for your time all!!
@ayozemr I think what you're looking for here is the ContainerImage.fromRegistry()
method.
So something like:
migrationTaskDef.addFirelensLogRouter('log-router', {
image: ecs.ContainerImage.fromRegistry('public.ecr.aws/aws-observability/aws-for-fluent-bit:2.19.0'),
// ...
Oh yes! That did it 👏 Much appreciated!! Thanks for your time!
About changing the issue to feature request, I don't know how to do it. I can close this one and create another referencing this if its ok for you
I just ran into this problem and opened https://github.com/aws/aws-cdk/pull/16439 to default to the stable
image instead of latest
.
Great, I've converted this into a feature request. Changing the image to default to stable seems like a fine solution to me
Closing this issue, since the upstream issue with fluentbit has been resolved. And we are no longer planning to implement this change in the default.
Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.
I know this is closed as wontfix, but I figured I'd post anyway since the AWS for Fluent Bit image broke again: https://github.com/aws/aws-for-fluent-bit/issues/491
This caused an outage for us that would have been avoided by using stable
.
Is using bleeding-edge Fluent Bit, at increased risk of instability, really worth it? stable
only tracks behind by 14 days, is this really a breaking change that needs to be avoided? It would probably help a lot of people like me who had to learn way more than they ever wanted about the internals of Fluent Bit images in order to address this issue today.
Per the comment visibility warning, cc @madeline-k and @peterwoodworth RE: @paulrosania's comment above.
My PR was closed because changing from latest
to stable
would technically be a breaking change. However, since it broke again, I think it's worth making the breaking change, even behind a feature flag to make people's lives easier. Having your ECS tasks fail to start up twice in two years because of problems with latest
seems unacceptable IMO.
We can possibly implement this behind a feature flag to avoid the breakage. Should be a fairly simple contribution if someone wants to take a crack at it.
:question: General Issue
The Question
I am affected by a bug in Fluentbit that causes DNS problems when using firelens log driver, almost with datadog. (https://github.com/aws/aws-for-fluent-bit/issues/233). As of now, working solution is to not use latest image and use 2.19.0, which I have tested working.
My problem is that using CDK I don't see how I can set to use a fixed fluentbit version in the "log-router" container, as that container definition is created by cloudformation when we specify the logrouter in CDK/CFN in the app container taskdef, the system adds that log-router sidecar container with the version set to latest automaticly.
Its possible to change that sidecar container image from latest to 2.19.0?
Environment
Other information
Thanks for your time!!