aws / amazon-ecs-agent

Amazon Elastic Container Service Agent
http://aws.amazon.com/ecs/
Apache License 2.0

VMNetwork adapter 'vEthernet (nat)*' not found #4445

Open RomaricKanyamibwa opened 20 hours ago

Summary

Much like issue #2416, there seems to be a problem with the Windows_Server-2022-English-Full-ECS_Optimized AMIs: the ECS agent sometimes fails to connect to the ECS cluster because a virtual network adapter (the VMNetwork adapter) cannot be found. As in that issue, the failure appears random and occurs sporadically on our Windows image.

Description

Using Packer, we build our own AMIs from the Windows_Server-2022-English-Full-ECS_Optimized AMIs. On the AMI we install SSH, pull our Windows Docker images, and finish by installing EC2Launch v2. Once the AMI is ready, we use it in our ECS cluster with the following user data:

# configure ecs cluster
[Environment]::SetEnvironmentVariable("ECS_CLUSTER", "cluster-x86_64-windows","Machine")
[Environment]::SetEnvironmentVariable("ECS_IMAGE_PULL_BEHAVIOR","prefer-cached","Machine")
[Environment]::SetEnvironmentVariable("ECS_AWSVPC_BLOCK_IMDS","true ","Machine")
[Environment]::SetEnvironmentVariable("ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE","true","Machine")
# init ecs agent
Import-Module ECSTools
Initialize-ECSAgent -EnableTaskIAMRole -EnableTaskENI -LoggingDrivers "['json-file','awslogs']"

Periodically, one of the instances in the ASG fails to attach to the ECS cluster, with the following errors:

2024-11-25T10:07:52Z - [INFO]:ScheduledTask Initialize-ECSHostReboot created.
2024-11-25T10:07:52Z - [INFO]:Configuring ECS Host for Task IAM Roles...
2024-11-25T10:07:52Z - [INFO]:Server Edition: Microsoft Windows Server 2022 Datacenter
2024-11-25T10:07:55Z - [INFO]:Attempt#: 10, Adapters:

2024-11-25T10:07:55Z - [INFO]:VMNetwork adapter 'vEthernet (nat)*' not found
2024-11-25T10:07:55Z - [INFO]:Retrying after sleeping 1sec
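
One mitigation that might be worth trying is to wait for the NAT adapter in user data before calling Initialize-ECSAgent, with a larger retry budget than the one visible in the log. The sketch below is untested against this failure. Wait-Until and $natAdapterPresent are hypothetical names, not part of ECSTools, and the sketch assumes the adapter is visible to Get-NetAdapter under the name pattern shown in the log, which may not match how ECSTools looks it up.

```powershell
# Hypothetical helper (not part of ECSTools): polls a condition until it
# returns a truthy value or the attempt budget is exhausted.
function Wait-Until {
    param(
        [Parameter(Mandatory)] [scriptblock] $Condition,
        [int] $MaxAttempts = 30,
        [int] $DelaySeconds = 2
    )
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        if (& $Condition) { return $true }
        # Sleep between attempts, but not after the last one.
        if ($attempt -lt $MaxAttempts) { Start-Sleep -Seconds $DelaySeconds }
    }
    return $false
}

# Assumption: the adapter named in the log can be queried with Get-NetAdapter.
# The script block is only evaluated when Wait-Until invokes it.
$natAdapterPresent = {
    [bool](Get-NetAdapter -Name 'vEthernet (nat)*' -ErrorAction SilentlyContinue)
}
```

In user data this would be called as `Wait-Until -Condition $natAdapterPresent` just before Initialize-ECSAgent, and the returned boolean logged. If the adapter never appears at all, waiting longer will not help and the result only confirms the failure earlier.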

This error makes the instance unusable to the cluster, so the ASG launches a new one while the old one is left dangling unused.
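
To avoid the dangling instance, a possible workaround is to have the instance report itself as unhealthy to the ASG when the bootstrap step fails, so that it is terminated and replaced. This is only a sketch under assumptions: Invoke-WithFailureHandler and $markUnhealthy are hypothetical names, AWS Tools for PowerShell is assumed to be available on the instance, and the instance role is assumed to allow autoscaling:SetInstanceHealth. The handler only fires if the step throws, so a bootstrap step that hangs or logs without throwing would not trigger it.

```powershell
# Hypothetical wrapper (not part of ECSTools): runs a bootstrap step and, if it
# throws, invokes a failure handler instead of leaving the instance idle.
function Invoke-WithFailureHandler {
    param(
        [Parameter(Mandatory)] [scriptblock] $Step,
        [Parameter(Mandatory)] [scriptblock] $OnFailure
    )
    try {
        & $Step | Out-Null
        return $true
    }
    catch {
        # Pass the error record to the handler; discard any handler output.
        & $OnFailure $_ | Out-Null
        return $false
    }
}

# Failure handler: report this instance as unhealthy to its Auto Scaling group.
# Assumes AWS Tools for PowerShell and the autoscaling:SetInstanceHealth permission.
$markUnhealthy = {
    param($err)
    $instanceId = Get-EC2InstanceMetadata -Category InstanceId
    Set-ASInstanceHealth -InstanceId $instanceId -HealthStatus Unhealthy
}
```

A step passed to the wrapper could, for example, throw when `Get-NetAdapter -Name 'vEthernet (nat)*' -ErrorAction SilentlyContinue` returns nothing, with `$markUnhealthy` supplied as the `-OnFailure` argument.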

Expected Behavior

The ECS-Agent reliably connects to the ECS cluster without errors.

Observed Behavior

The ECS-Agent sometimes fails to initialize. When it does, the instance is never attached to the ECS cluster but keeps running.

Supporting Log Snippets

UserScript.ps1.log output.log err.log