bottlerocket-os / bottlerocket

An operating system designed for hosting containers
https://bottlerocket.dev

awsvpcTrunking with arm64 instances seems to be unavailable #3153

Closed. f0o closed this issue 1 year ago.

f0o commented 1 year ago

Image I'm using: /aws/service/bottlerocket/aws-ecs-1/arm64/latest/image_id

What I expected to happen: instance type m6g.medium runs 4 tasks with awsvpcTrunking, as per https://docs.aws.amazon.com/AmazonECS/latest/developerguide/container-instance-eni.html#eni-trunking-supported-instance-types

aws ecs list-attributes --target-type container-instance --attribute-name ecs.awsvpc-trunk-id --cluster test-cluster
{
    "attributes": []
}
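When trunking is active, each container instance should carry an `ecs.awsvpc-trunk-id` attribute, so the empty `attributes` list above means no instance in the cluster has a trunk interface. Here is a minimal Python sketch that reads the JSON printed by that command and returns the instances that do have one. It assumes the CLI's default JSON output, where each attribute entry has `name` and `targetId` fields. The ARN and ENI ID in the sample are made up.

```python
import json


def trunked_instances(list_attributes_output: str) -> set[str]:
    """Return container instance ARNs that report an ecs.awsvpc-trunk-id attribute."""
    data = json.loads(list_attributes_output)
    return {
        attr.get("targetId", "")
        for attr in data.get("attributes", [])
        if attr.get("name") == "ecs.awsvpc-trunk-id"
    }


# Output as reported in this issue: no instance has a trunk ENI yet.
print(trunked_instances('{"attributes": []}'))  # set()

# Hypothetical output once trunking has taken effect on one instance.
sample = json.dumps({"attributes": [{
    "name": "ecs.awsvpc-trunk-id",
    "value": "eni-0123456789abcdef0",
    "targetType": "container-instance",
    "targetId": "arn:aws:ecs:us-east-1:111122223333:container-instance/test-cluster/abc",
}]})
print(trunked_instances(sample))
```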

However, the awsvpcTrunking setting has been enabled in the ECS account settings.
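One way to double-check the account-level setting is `aws ecs list-account-settings --effective-settings --name awsvpcTrunking`, which prints a `settings` list of name/value entries. The sketch below assumes that output shape and returns whether trunking is effectively enabled for the caller. The ARN in the sample is made up.

```python
import json


def trunking_enabled(list_account_settings_output: str) -> bool:
    """True if the effective awsvpcTrunking account setting is 'enabled'."""
    data = json.loads(list_account_settings_output)
    return any(
        s.get("name") == "awsvpcTrunking" and s.get("value") == "enabled"
        for s in data.get("settings", [])
    )


sample = ('{"settings": [{"name": "awsvpcTrunking", "value": "enabled", '
          '"principalArn": "arn:aws:iam::111122223333:root"}]}')
print(trunking_enabled(sample))  # True
```

As this thread shows, a `True` here is not sufficient on its own. The `ecs.awsvpc-trunk-id` attribute check is what confirms the setting has reached the instances.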

All EC2 instances have been re-created as per the documentation (multiple times, even changing instance types).

What actually happened: tasks failed with the error "RESOURCE:ENI", and only 1 task could be placed per host (which matches the table in the link above for m6g.medium without trunking enabled).
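The arithmetic behind the failure is a ceiling division of desired tasks by per-host capacity. This sketch uses only the figures from this report: an m6g.medium fits 1 awsvpc task without trunking and 4 with it, per the linked table. The function and dictionary names are hypothetical.

```python
# Per-host awsvpc task limits for m6g.medium as quoted in this issue
# (from the AWS table linked above); other instance types differ.
M6G_MEDIUM_TASKS = {"without_trunking": 1, "with_trunking": 4}


def hosts_needed(tasks: int, tasks_per_host: int) -> int:
    """Minimum number of hosts needed to place `tasks` awsvpc tasks."""
    if tasks_per_host < 1:
        raise ValueError("tasks_per_host must be at least 1")
    # Integer ceiling division: round up to a whole host.
    return (tasks + tasks_per_host - 1) // tasks_per_host


for mode, per_host in M6G_MEDIUM_TASKS.items():
    print(mode, hosts_needed(8, per_host))
# without_trunking 8
# with_trunking 2
```

Tasks beyond a host's capacity cannot be placed there, which is consistent with the "RESOURCE:ENI" error above.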

How to reproduce the problem:

  1. Create an ECS cluster.
  2. Enable awsvpcTrunking.
  3. Spawn m6g.medium ECS hosts (with resource-name-dns-a-record disabled, as per the docs).
  4. Create more than 1 task per host and watch it fail.
f0o commented 1 year ago

Could it be that I have to create the ECS cluster after enabling awsvpcTrunking?

Is there a way to migrate existing ECS clusters to awsvpcTrunking?

yeazelm commented 1 year ago

Hello @f0o! Thanks for cutting this issue. From the docs, it looks like you need to launch new instances after enabling the setting: "Only new Amazon EC2 instances launched after opting in to awsvpcTrunking receive the increased ENI limits and the trunk network interface. Previously launched instances do not receive these features regardless of the actions taken." However, there is no mention of needing to recreate the clusters.
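The documented rule reduces to a timestamp comparison. This minimal sketch assumes you have each instance's EC2 launch time (for example, the `LaunchTime` field from `aws ec2 describe-instances`) and the time you opted in, which you would have to record yourself. The instance IDs and dates are made up, and ties are treated as "before" to stay conservative.

```python
from datetime import datetime, timezone


def instances_to_replace(launch_times: dict[str, datetime],
                         opted_in_at: datetime) -> list[str]:
    """Return IDs of instances launched at or before the opt-in time.

    Per the quoted docs, such instances never receive the trunk interface,
    so they have to be replaced by newly launched ones.
    """
    return sorted(
        instance_id
        for instance_id, launched in launch_times.items()
        if launched <= opted_in_at
    )


opt_in = datetime(2023, 5, 1, 12, 0, tzinfo=timezone.utc)  # hypothetical
fleet = {
    "i-0aaa": datetime(2023, 5, 1, 9, 0, tzinfo=timezone.utc),
    "i-0bbb": datetime(2023, 5, 1, 15, 0, tzinfo=timezone.utc),
}
print(instances_to_replace(fleet, opt_in))  # ['i-0aaa']
```

As the resolution later in this thread shows, launching after opt-in was necessary but not sufficient here.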

We'll look into this issue to see if there is something more going on, but it might be worth trying to create a new cluster, just in case there is some lingering setting tied to the existing one.

f0o commented 1 year ago

I can verify that creating a new cluster (after enabling awsvpcTrunking in the account defaults) does make it work.

The docs don't say anything about this, and I've rotated all EC2 instances in my old cluster a few times (even switching between instance types, including c5.large, which https://github.com/bottlerocket-os/bottlerocket/pull/1246 mentions as supporting awsvpcTrunking in Bottlerocket).

I'm unsure who the culprit is here; it might just as well be bad AWS docs. However, that leaves me wondering what a migration to awsvpcTrunking would look like, since creating a whole new cluster and shifting traffic flows can be very tedious (in my case, outright impossible without a major rework of the traffic flows).

yeazelm commented 1 year ago

Thanks for the update @f0o. It sounds like Bottlerocket is working with the new cluster, but something at the cluster level is keeping the setting from taking effect on the existing one. If you can, it might be worth reaching out to ECS support to confirm whether there is anything you can do on the cluster side to migrate the existing cluster to this setting.

yeazelm commented 1 year ago

@f0o did you end up figuring out a way to change this for an existing cluster? Is there anything else we should cover in this issue, or is it fine to close?

f0o commented 1 year ago

Hi @yeazelm

After talking to AWS: the solution is to wait.

Apparently, setting awsvpcTrunking does not take effect immediately for existing clusters; it takes an undefined amount of time to propagate, even if the EC2 instances are cycled.

We followed up the next day and, magically, the old cluster had awsvpcTrunking enabled; the instances had cycled multiple times throughout the previous day because of ASG actions.
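Since the fix turned out to be waiting for propagation, a bounded poll lets you re-check periodically instead of guessing. This minimal Python sketch assumes a caller-supplied `check` function (hypothetical). One example would be a function that runs the `list-attributes` command from the top of this issue and returns True once `attributes` is non-empty. AWS gave no propagation time, so the timeout and five-minute default interval are arbitrary.

```python
import time
from typing import Callable


def wait_for_trunking(check: Callable[[], bool],
                      timeout_s: float,
                      interval_s: float = 300.0,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll `check` until it returns True or `timeout_s` seconds elapse."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Demo with a fake check that succeeds on the third attempt.
    attempts = iter([False, False, True])
    print(wait_for_trunking(lambda: next(attempts), timeout_s=3600,
                            interval_s=0.0))  # True
```

`sleep` and `clock` are injectable so the loop can be exercised without real waiting. Given the thread's experience, a timeout on the order of a day may be needed.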

I suggested that AWS update their documentation, since it made it sound like cycling the EC2 instances was all that was needed.

Considering this issue closed. Thanks for your help!