aws / amazon-ssm-agent

An agent to enable remote management of your EC2 instances, on-premises servers, or virtual machines (VMs).
https://aws.amazon.com/systems-manager/
Apache License 2.0

ssm agent fails to start on windows t3 ec2 instances #348

Closed Zazcallabah closed 3 years ago

Zazcallabah commented 3 years ago

Hi, I've started seeing errors very similar to https://github.com/aws/amazon-ssm-agent/issues/48 when CodeDeploy creates new instances for deployment.

This happens specifically when I create new t3 instances; t2 instances seem to work fine.

The AMI was created from a t2 snapshot taken in eu-west-1c in December, and it has worked fine for creating t2 instances in eu-west-1a. Is creating t3 instances from t2 snapshots not supported?

amazon-ssm-agent.log says:

2021-01-29 09:29:40 ERROR error fetching the instanceID, Failed to fetch instance ID. Data from vault is empty. RequestError: send request failed
caused by: Get http://169.254.169.254/latest/meta-data/instance-id: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2021-01-29 09:29:40 ERROR Failed to start agent. error fetching the instanceID, Failed to fetch instance ID. Data from vault is empty. RequestError: send request failed
caused by: Get http://169.254.169.254/latest/meta-data/instance-id: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2021-01-29 09:30:10 INFO Getting IE proxy configuration for current user: The operation completed successfully.
2021-01-29 09:30:10 INFO Getting WinHTTP proxy default configuration: The operation completed successfully.
2021-01-29 09:30:10 INFO Proxy environment variables:
2021-01-29 09:30:10 INFO http_proxy: 
2021-01-29 09:30:10 INFO https_proxy: 
2021-01-29 09:30:10 INFO no_proxy: 
2021-01-29 09:33:36 ERROR error fetching the instanceID, Failed to fetch instance ID. Data from vault is empty. RequestError: send request failed
caused by: Get http://169.254.169.254/latest/meta-data/instance-id: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2021-01-29 09:33:36 ERROR Failed to start agent. error fetching the instanceID, Failed to fetch instance ID. Data from vault is empty. RequestError: send request failed
caused by: Get http://169.254.169.254/latest/meta-data/instance-id: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2021-01-29 09:34:06 INFO Getting IE proxy configuration for current user: The operation completed successfully.
2021-01-29 09:34:06 INFO Getting WinHTTP proxy default configuration: The operation completed successfully.
2021-01-29 09:34:06 INFO Proxy environment variables:
2021-01-29 09:34:06 INFO http_proxy: 
2021-01-29 09:34:06 INFO https_proxy: 
2021-01-29 09:34:06 INFO no_proxy: 
2021-01-29 09:37:32 ERROR error fetching the instanceID, Failed to fetch instance ID. Data from vault is empty. RequestError: send request failed
caused by: Get http://169.254.169.254/latest/meta-data/instance-id: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2021-01-29 09:37:32 ERROR Failed to start agent. error fetching the instanceID, Failed to fetch instance ID. Data from vault is empty. RequestError: send request failed
caused by: Get http://169.254.169.254/latest/meta-data/instance-id: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2021-01-29 09:38:02 INFO Getting IE proxy configuration for current user: The operation completed successfully.
2021-01-29 09:38:02 INFO Getting WinHTTP proxy default configuration: The operation completed successfully.
2021-01-29 09:38:02 INFO Proxy environment variables:
2021-01-29 09:38:02 INFO http_proxy: 
2021-01-29 09:38:02 INFO https_proxy: 
2021-01-29 09:38:02 INFO no_proxy: 

... and so on

The route table is

PS C:\Users\Administrator> route print
===========================================================================
Interface List
  7...02 39 6b da a1 1f ......Amazon Elastic Network Adapter
  1...........................Software Loopback Interface 1
===========================================================================

IPv4 Route Table
===========================================================================
Active Routes:
Network Destination        Netmask          Gateway       Interface  Metric
          0.0.0.0          0.0.0.0       172.31.0.1      172.31.9.40     15
        127.0.0.0        255.0.0.0         On-link         127.0.0.1    331
        127.0.0.1  255.255.255.255         On-link         127.0.0.1    331
  127.255.255.255  255.255.255.255         On-link         127.0.0.1    331
  169.254.169.123  255.255.255.255      172.31.32.1      172.31.9.40     30
  169.254.169.249  255.255.255.255      172.31.32.1      172.31.9.40     30
  169.254.169.250  255.255.255.255      172.31.32.1      172.31.9.40     30
  169.254.169.251  255.255.255.255      172.31.32.1      172.31.9.40     30
  169.254.169.253  255.255.255.255      172.31.32.1      172.31.9.40     30
  169.254.169.254  255.255.255.255      172.31.32.1      172.31.9.40     30
       172.31.0.0    255.255.240.0         On-link       172.31.9.40    271
      172.31.9.40  255.255.255.255         On-link       172.31.9.40    271
    172.31.15.255  255.255.255.255         On-link       172.31.9.40    271
        224.0.0.0        240.0.0.0         On-link         127.0.0.1    331
        224.0.0.0        240.0.0.0         On-link       172.31.9.40    271
  255.255.255.255  255.255.255.255         On-link         127.0.0.1    331
  255.255.255.255  255.255.255.255         On-link       172.31.9.40    271
===========================================================================
Persistent Routes:
  Network Address          Netmask  Gateway Address  Metric
  169.254.169.254  255.255.255.255      172.31.32.1      15
  169.254.169.250  255.255.255.255      172.31.32.1      15
  169.254.169.251  255.255.255.255      172.31.32.1      15
  169.254.169.249  255.255.255.255      172.31.32.1      15
  169.254.169.123  255.255.255.255      172.31.32.1      15
  169.254.169.253  255.255.255.255      172.31.32.1      15
===========================================================================

IPv6 Route Table
===========================================================================
Active Routes:
 If Metric Network Destination      Gateway
  1    331 ::1/128                  On-link
  7    271 fe80::/64                On-link
  7    271 fe80::d5bc:87b9:f973:a489/128
                                    On-link
  1    331 ff00::/8                 On-link
  7    271 ff00::/8                 On-link
===========================================================================
Persistent Routes:

mattmapadmi commented 3 years ago

I'm seeing this too, but with an AMI built on a t3a.medium and launched on a c5a.xlarge instance. I can't rule out that simply rebuilding the image fixed it, but when I started again and built the image on a c5a.xlarge, it ran just fine on a c5a.xlarge instance.

Thor-Bjorgvinsson commented 3 years ago

Thanks for reporting this issue. We are working on reproducing it locally. Can you confirm that the metadata service is available on the EC2 instance where the agent is showing these errors?

Invoke-WebRequest -Uri http://169.254.169.254/latest/meta-data/instance-id -UseBasicParsing

Thor-Bjorgvinsson commented 3 years ago

I was able to reproduce the issue with the following steps:

  1. Start a Windows EC2 instance in a particular subnet.
  2. Create a snapshot of that instance.
  3. Create an AMI from the snapshot.
  4. Create a new EC2 instance from the AMI in another subnet.

This results in the EC2 metadata service not being available.

PS C:\Users\Administrator> Invoke-WebRequest -Uri http://169.254.169.254/latest/meta-data/instance-id -UseBasicParsing
Invoke-WebRequest : Unable to connect to the remote server
At line:1 char:1
+ Invoke-WebRequest -Uri http://169.254.169.254/latest/meta-data/instan ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (System.Net.HttpWebRequest:HttpWebRequest) [Invoke-WebRequest], WebExc
   eption
    + FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShell.Commands.InvokeWebRequestCommand
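
The same reachability check can be scripted so that it fails fast instead of hanging. Below is a minimal Python sketch, assuming a plain GET against the same URL as above is enough (instances that require session tokens for metadata requests would need an extra step). The helper name `fetch_instance_id` and the 2-second default timeout are made up for illustration and are not part of the agent. Proxies are bypassed on purpose, since the metadata address is link-local.

```python
import urllib.error
import urllib.request
from typing import Optional

# Same endpoint the agent log and the Invoke-WebRequest check use.
METADATA_URL = "http://169.254.169.254/latest/meta-data/instance-id"

# An opener with an empty ProxyHandler ignores http_proxy / https_proxy,
# so the request always goes directly to the target address.
_DIRECT_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_instance_id(url: str = METADATA_URL, timeout: float = 2.0) -> Optional[str]:
    """Return the response body as text, or None if the request fails."""
    try:
        with _DIRECT_OPENER.open(url, timeout=timeout) as response:
            return response.read().decode("utf-8").strip()
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts and HTTP error statuses.
        return None
```

On an instance showing the errors above, `fetch_instance_id()` would be expected to return `None`; on a healthy instance it should return the instance ID string.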

I've cut a ticket in our internal ticketing system to the relevant teams.

Thor-Bjorgvinsson commented 3 years ago

I've received feedback that this is expected behavior and that the instance needs to be reconfigured. Please take a look at these docs for how to resolve your issue:

https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/ec2launch.html#ec2launch-config
https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/Creating_EBSbacked_WinAMI.html#update-metadata-KMS
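
The route table in the original report looks consistent with this. The interface 172.31.9.40 sits in 172.31.0.0 with netmask 255.255.240.0 and uses 172.31.0.1 as its default gateway, while the persistent route for 169.254.169.254 points at 172.31.32.1, which is outside that subnet. It is an assumption, not something confirmed in this thread, that 172.31.32.1 is the gateway of the subnet the image was originally built in. A short Python sketch of the check, with values copied from that `route print` output (the helper name `gateway_is_on_link` is made up for illustration):

```python
import ipaddress


def gateway_is_on_link(interface_ip: str, netmask: str, gateway: str) -> bool:
    """Return True if the gateway address lies inside the interface's own subnet."""
    # strict=False lets us pass a host address and derive its network from the mask.
    subnet = ipaddress.ip_network(f"{interface_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(gateway) in subnet


# Values copied from the route table in the original report.
INTERFACE_IP = "172.31.9.40"
NETMASK = "255.255.240.0"                # from the on-link route for 172.31.0.0
DEFAULT_GATEWAY = "172.31.0.1"           # gateway of the 0.0.0.0 route
METADATA_ROUTE_GATEWAY = "172.31.32.1"   # gateway of the 169.254.169.254 route

print(gateway_is_on_link(INTERFACE_IP, NETMASK, DEFAULT_GATEWAY))         # True
print(gateway_is_on_link(INTERFACE_IP, NETMASK, METADATA_ROUTE_GATEWAY))  # False
```

A gateway outside the interface's subnet cannot be reached directly, which would explain why requests to the metadata address time out until the routes are re-created for the new subnet.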

foxrj21 commented 1 year ago

Import-Module C:\ProgramData\Amazon\EC2-Windows\Launch\Module\Ec2Launch.psm1; Add-Route