Closed mfortin closed 4 years ago
My current workaround:
Get-NetAdapter -Name "vEthernet (nat)" | Where-Object Status -eq Disconnected | Rename-NetAdapter -NewName "vEthernet (nat) - Disconnected"
Get-NetAdapter -Name "vEthernet (nat)*" | Where-Object Status -eq Up | Rename-NetAdapter -NewName "vEthernet (nat)"
My current workaround:
Get-NetAdapter -Name "vEthernet (nat)" | Where-Object Status -eq Disconnected | Rename-NetAdapter -NewName "vEthernet (nat) - Disconnected" Get-NetAdapter -Name "vEthernet (nat)*" | Where-Object Status -eq Up | Rename-NetAdapter -NewName "vEthernet (nat)"
Thanks for reporting the issue. We'll fix this.
2019, sorry for not being specific.
A fix for this issue will be included in the January ECS-Optimized Windows AMI release.
New ECS-Optimized Windows AMI has been released and contains a fix for this. Please upgrade to the latest version, and let us know if you still have issue. Thanks!
@fenxiong the version bundled on the newer Windows AMI is worst.
> Initialize-ECSHost
2020-01-27T17:05:53Z - [INFO]:Configuring ECS Host...
You cannot call a method on a null-valued expression.
At C:\Program Files\WindowsPowerShell\Modules\ECSTools\ECSTools.psm1:1247 char:13
+ $defaultNatAdapterList = $defaultNatAdapterList | Where-O ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (:) [], RuntimeException
+ FullyQualifiedErrorId : InvokeMethodOnNull
2020-01-27T17:05:53Z - [ERROR]:Multiple adapters found matching the default name 'vEthernet (nat)*', and more than one has a status of 'Up'. This is an unsupported configuration. Unable to select adapter.
Multiple adapters found matching the default name 'vEthernet (nat)*', and more than one has a status of 'Up'. This is an unsupported configuration. Unable to select adapter.
At C:\Program Files\WindowsPowerShell\Modules\ECSTools\ECSTools.psm1:68 char:5
+ Throw $Message
+ ~~~~~~~~~~~~~~
+ CategoryInfo : OperationStopped: (Multiple adapte...select adapter.:String) [], RuntimeException
+ FullyQualifiedErrorId : Multiple adapters found matching the default name 'vEthernet (nat)*', and more than one has a status of 'Up'. This is an unsupported configuration. Unable to select adapter.
> Get-NetAdapter
Name InterfaceDescription ifIndex Status MacAddress LinkSpeed
---- -------------------- ------- ------ ---------- ---------
vEthernet (nat) Hyper-V Virtual Ethernet Adapter 15 Disconnected 00-15-5D-09-F3-1F 10 Gbps
vEthernet (nat) 5 Hyper-V Virtual Ethernet Adapter #5 14 Disconnected 00-15-5D-7D-78-A8 10 Gbps
vEthernet (nat) 7 Hyper-V Virtual Ethernet Adapter #7 20 Up 00-15-5D-0A-C4-41 10 Gbps
vEthernet (nat) 4 Hyper-V Virtual Ethernet Adapter #4 12 Disconnected 00-15-5D-BF-95-C8 10 Gbps
vEthernet (nat) 3 Hyper-V Virtual Ethernet Adapter #3 10 Disconnected 00-15-5D-01-EF-C6 10 Gbps
vEthernet (nat) 2 Hyper-V Virtual Ethernet Adapter #2 4 Disconnected 00-15-5D-1A-91-A2 10 Gbps
vEthernet (nat) 6 Hyper-V Virtual Ethernet Adapter #6 3 Disconnected 00-15-5D-D9-53-9B 10 Gbps
Here is a diff to make it work:
$ diff ECSTools-from-ami.psm1 ECSTools.psm1
1245c1245
< if ($defaultNatAdapterList.Count -gt 1) {
---
> if ( @($defaultNatAdapterList).Count -gt 1) {
1247,1248c1247,1248
< $defaultNatAdapterList = $defaultNatAdapterList | Where-Object ($_.Status.ToLower() -eq 'up')
< if ($defaultNatAdapterList.Count -eq 1) { # use that adapter
---
> $defaultNatAdapterList = $defaultNatAdapterList | Where-Object Status -eq 'up'
> if ( @($defaultNatAdapterList).Count -eq 1) { # use that adapter
1250c1250
< } elseif ($defaultNatAdapterList.Count -eq 0) { # We can't know which adapter to wait for, fail fast since there should only be one
---
> } elseif ( @($defaultNatAdapterList).Count -eq 0) { # We can't know which adapter to wait for, fail fast since there should only be one
@mfortin apologies for the issue. I will reopen this and we will look into it.
@mfortin Sorry to hear that the new version introduced issues to you . We are working on fixing this issue.
@sandeepindraganti any updates on this issue?
Sorry for the inconvenience caused . We are planning to include the fix as part of our next ECS AMI release which is planned during March.
Hello @sandeepindraganti,
We are currently using the _Windows_Server-2019-English-Full-ECSOptimized-2020.03.18 AMI and are facing some sporadic issues with it. Is it the AMI release you mentionned in your comment?
I was redirected here when trying to find information about my issue, but I can't quite tell if I should open a new issue.
We are using this AMI along with the Initialize-ECSAgent
user data script, but it often fails to register as an ECS-Agent. I am including the user data execution log I was able to get from one of my snapshots.
Let me know if this is unrelated, and I will open an new issue.
Thank you!
Hi Mabiro, the 03.18 AMIs do have the fix for this thread included. However, I haven't been able to reproduce your issue, and it seems to be different from the issue described in this thread. Some more information would be useful.
Is the EnableTaskIAMRoles flag enabled? How often are you seeing this issue? If you connect to a non-connected instance and run "Get-NetAdapter" does it include an adapter with the name "vEthernet (nat)"? Have you made any modifications to the AMI?
Thanks!
Hi Louie,
Thank you for the quick feedback.
Here's the user data script that we're using inside our launch configuration. We use an auto scaling group that we set to 1 or 0 depending on when we need to have an actual windows ecs-agent
<powershell>
[Environment]::SetEnvironmentVariable("ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE", $TRUE, "Machine")
Initialize-ECSAgent -Cluster application-cluster -EnableTaskIAMRole -LoggingDrivers '[\"json-file\",\"awslogs\"]'
</powershell>
We have multiple AWS accounts (one per client) and this issue happens sporadically on different accounts. Out of around 10 accounts, it happens at least twice a week. There doesn't seem to be an actual pattern. I tried reproducing it by terminating and restarting an instance multiple times, but to no avail. It just happens randomly on random accounts. So far, the fix has been to simply terminate the hanging machine and wait for the new one to boot and it usually connects right away, without a need to retry once more.
As for the AMI, we are using a terraform script which updates the launch configuration with the latest AMI (as found under /aws/service/ami-windows-latest/Windows_Server-2019-English-Full-ECS_Optimized in the parameter store). We aren't modifying it.
I currently do not have access to a machine that wasn't able to connect, but I will try to run Get-NetAdapter once this error happens again. I do have snapshots of volumes, maybe that is enough to do the test?
Thanks!
EDIT: For what it's worth, we had the issue before the 03.18 AMIS. So it doesn't seem like an issue that arose from this new version.
That user data script looks fine, and thank you for noting that it happens on other AMIs as well, that's really useful. That said this does seem unrelated, so please open up a separate issue to track this.
The separate issue is being tracked here: https://github.com/aws/amazon-ecs-agent/issues/2416
Summary
We use the Windows ECS Optimized AMI as a starting AMI, on which we run our automation to install different security scanning tools and other scripts. Once completed, we run sysprep and create a new AMI. That AMI is then used to start the EC2 hosts that are part of our ECS cluster.
The problem is that, when invoking
Initialize-ECSAgent -EnableTaskIAMRole
, it fails as there are multiple adapters found using theGet-NetAdapter
Description
Here are the logs from the user-data where
Initialize-ECSAgent
is executed:In
C:\Program Files\WindowsPowerShell\Modules\ECSTools\ECSTools.psm1
, the functionInitialize-ECSHost
has$( $vmNetAdapter = Get-NetAdapter 'vEthernet (nat)' -ErrorAction:Ignore ) 2>$null
where it should be$( $vmNetAdapter = Get-NetAdapter 'vEthernet (nat)*' -ErrorAction:Ignore | Where-Object Status -eq 'Up') 2>$null
Expected Behavior
Expected behavior is that the adapter being available is used to setup the port proxy.
Observed Behavior
A disconnected adapter is used and port proxy failed to be configured.
Environment Details
Supporting Log Snippets