75lb opened this issue 6 years ago.
Hello, I faced the same errors in the logs.
Please try deleting the ssm directory at /var/lib/amazon/ssm and then restarting the SSM agent.
In my case, SSM had stored the fingerprints of the previous version in this directory. I had to delete it and restart the agent; after that it registered the instance as a managed instance and Session Manager became visible.
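A minimal Python sketch of that reset, assuming a systemd host where the agent runs as the `amazon-ssm-agent` unit (the .deb/.rpm packages; snap installs use a different unit name) and that the state lives in /var/lib/amazon/ssm as described above. The function name and the injectable `run` parameter are hypothetical, added so the steps can be exercised without root. Deleting this directory discards whatever the agent has cached there (including the fingerprint mentioned above), so treat it as a last resort.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, Sequence

# Assumptions: systemd host, agent unit named "amazon-ssm-agent",
# agent state kept under /var/lib/amazon/ssm.
SSM_STATE_DIR = Path("/var/lib/amazon/ssm")
SSM_SERVICE = "amazon-ssm-agent"


def reset_ssm_agent_state(
    state_dir: Path = SSM_STATE_DIR,
    service: str = SSM_SERVICE,
    run: Callable[[Sequence[str]], object] = lambda cmd: subprocess.run(cmd, check=True),
) -> bool:
    """Stop the agent, delete its cached state directory, then start it again.

    Returns True if a state directory existed and was removed.
    Needs root when used with the real paths.
    """
    # Stop first so the running agent cannot rewrite the directory mid-delete.
    run(["systemctl", "stop", service])
    removed = state_dir.exists()
    if removed:
        shutil.rmtree(state_dir)
    # On start, the agent re-creates its state and registers again.
    run(["systemctl", "start", service])
    return removed
```

On a real instance, call `reset_ssm_agent_state()` with no arguments as root.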
Thank you for posting the issue. Is it possible to attach all of the agent logs? If the instance is not shown as a managed instance, that indicates an issue with the IAM role attached to it. The logs will help identify whether or not the agent was able to reach the SSM service.
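To gather everything into one attachment, a small sketch like the following could bundle the agent logs. It assumes the default Linux log directory /var/log/amazon/ssm (where files such as amazon-ssm-agent.log and errors.log are written); on Windows the agent logs under %PROGRAMDATA%\Amazon\SSM\Logs instead. `bundle_agent_logs` is a hypothetical helper, not part of the agent.

```python
import tarfile
from pathlib import Path
from typing import List

# Assumption: default Linux log location for the SSM agent.
SSM_LOG_DIR = Path("/var/log/amazon/ssm")


def bundle_agent_logs(
    log_dir: Path = SSM_LOG_DIR, out: Path = Path("ssm-agent-logs.tar.gz")
) -> List[str]:
    """Pack every file in log_dir whose name contains '.log' into a gzip tarball.

    The '*.log*' pattern is meant to also catch rotated copies, assuming their
    names extend the original .log name. Returns the file names added.
    """
    names = sorted(p.name for p in log_dir.glob("*.log*") if p.is_file())
    with tarfile.open(out, "w:gz") as tar:
        for name in names:
            # Store files flat in the archive, without the directory prefix.
            tar.add(log_dir / name, arcname=name)
    return names
```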
I don't have the instance or logs anymore, I'm afraid. I worked through this tutorial, creating everything (including IAM roles) from scratch. Does the tutorial work correctly for you?
Yes, I just followed the tutorial and found the instance in the managedInstances list.
Hi, I'm also seeing this error in the log. I am using the OpenVPN Appliance, which is Ubuntu 16.04 amd64 with OpenVPN installed.
I see the following in the log:
==> errors.log <==
2018-10-18 15:44:39 ERROR [Stop @ agent.go.100] Agent's core manager can't be nil
==> hibernate.log <==
2018-10-18 15:50:30 ERROR Health ping failed with error - UnrecognizedClientException: The security token included in the request is invalid.
status code: 400, request id: 040bcb0e-4bda-4e79-9756-3e731d414840
The instance is booted in eu-west-1 with an instance profile that has the AWS managed policy AmazonEC2RoleforSSM attached. My CentOS instances have the same instance role attached and are working fine.
I've verified that I can curl the meta-data URL from within the instance and that it works. I'm out of ideas :(
I'm using version 2.3.169.0-1 of the .deb downloaded from the bucket linked from the tutorial.
Can this issue be reopened please? I notice it was closed with no resolution :(
2018-10-18 15:50:30 ERROR Health ping failed with error - UnrecognizedClientException: The security token included in the request is invalid.
status code: 400, request id: 040bcb0e-4bda-4e79-9756-3e731d414840
This error indicates that the agent is not able to reach the SSM service and is in hibernation mode. Can you please verify that the instance can reach ssm.eu-west-1.amazonaws.com?
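One way to check reachability without relying on curl's -k flag is a TCP plus TLS handshake with certificate verification. The sketch below assumes the public regional endpoint naming ssm.<region>.amazonaws.com, as in the question above, and uses only Python's standard `socket` and `ssl` modules; the helper names are hypothetical. A successful handshake only shows that the network path is open: as this thread later shows, the same log line can appear when the instance's credentials are the real problem.

```python
import socket
import ssl


def ssm_endpoint(region: str) -> str:
    """Public regional SSM endpoint host (assumed naming), e.g. ssm.eu-west-1.amazonaws.com."""
    return f"ssm.{region}.amazonaws.com"


def can_reach(host: str, port: int = 443, timeout: float = 5.0, use_tls: bool = True) -> bool:
    """Return True if a TCP connection (and, optionally, a verified TLS handshake) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            if use_tls:
                # Unlike `curl -k`, this verifies the server certificate and hostname.
                ctx = ssl.create_default_context()
                with ctx.wrap_socket(sock, server_hostname=host):
                    pass
        return True
    except OSError:
        # Covers DNS failures, refused connections, timeouts and TLS/certificate errors.
        return False
```

On an instance with outbound HTTPS access, `can_reach(ssm_endpoint("eu-west-1"))` would be expected to return True.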
Thanks for re-opening. If I curl that host, I see the following:
root@openvpnas2:~# curl -kv https://ssm.eu-west-1.amazonaws.com/
* Trying 52.94.217.30...
* Connected to ssm.eu-west-1.amazonaws.com (52.94.217.30) port 443 (#0)
* found 148 certificates in /etc/ssl/certs/ca-certificates.crt
* found 592 certificates in /etc/ssl/certs
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_RSA_AES_128_CBC_SHA1
* server certificate verification SKIPPED
* server certificate status verification SKIPPED
* common name: ssm.eu-west-1.amazonaws.com (matched)
* server certificate expiration date OK
* server certificate activation date OK
* certificate public key: RSA
* certificate version: #3
* subject: CN=ssm.eu-west-1.amazonaws.com
* start date: Mon, 13 Aug 2018 00:00:00 GMT
* expire date: Tue, 13 Aug 2019 12:00:00 GMT
* issuer: C=US,O=Amazon,OU=Server CA 1B,CN=Amazon
* compression: NULL
* ALPN, server did not agree to a protocol
> GET / HTTP/1.1
> Host: ssm.eu-west-1.amazonaws.com
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 404 Not Found
< x-amzn-RequestId: 44b9f250-070b-47d4-a677-32698e16273b
< Content-Length: 29
< Date: Wed, 24 Oct 2018 21:20:39 GMT
<
<UnknownOperationException/>
* Connection #0 to host ssm.eu-west-1.amazonaws.com left intact
Which all looks OK, I guess? After bouncing the agent, I get the following in amazon-ssm-agent.log:
2018-10-18 16:33:03 INFO Entering SSM Agent hibernate - UnrecognizedClientException: The security token included in the request is invalid.
status code: 400, request id: 64616960-799b-49a6-a7fb-9b1000cde4f1
2018-10-24 21:20:47 INFO Got signal:terminated value:0xb91950
2018-10-24 21:20:47 INFO Stopping agent
2018-10-24 21:20:47 ERROR Agent's core manager can't be nil
2018-10-24 21:20:47 INFO Entering SSM Agent hibernate - NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
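The two hibernate reasons in this log point at different problems. Below is a rough triage helper, assuming the `Entering SSM Agent hibernate - <ErrorCode>:` line format shown above. The cause descriptions are interpretations based on this thread (UnrecognizedClientException: SSM rejected the credentials the request was signed with; NoCredentialProviders: the SDK found no credentials at all), not an official AWS mapping, and `diagnose` is a hypothetical name.

```python
import re
from typing import Optional

# Interpretations based on the messages quoted in this thread; not exhaustive.
LIKELY_CAUSES = {
    "UnrecognizedClientException": (
        "credentials were sent but SSM rejected them "
        "(for example, a stale or deleted instance role)"
    ),
    "NoCredentialProviders": (
        "the agent found no credentials at all "
        "(no instance profile, or instance metadata unreachable)"
    ),
}

# Captures the error code that follows the hibernate marker, up to the colon.
HIBERNATE_RE = re.compile(r"Entering SSM Agent hibernate - (\w+)")


def diagnose(log_text: str) -> Optional[str]:
    """Return a likely cause for the most recent hibernate entry, or None if there is none."""
    codes = HIBERNATE_RE.findall(log_text)
    if not codes:
        return None
    code = codes[-1]
    return LIKELY_CAUSES.get(code, f"unrecognised error code: {code}")
```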
Thanks for replying, @nehalaws!
Ok, so... I got to the bottom of this :)
There was a sequence of events that all had to happen in the right order for this to become an issue. It's solved now, though.
It turned out that the instance was actually still referencing the now-deleted role.
The fix was to remove the role from the instance and then attach the correct (newly created) one. After booting the instance, SSM started to work.
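The detach-and-reattach step can also be done through the EC2 API. This sketch assumes a boto3 EC2 client (`boto3.client("ec2")`) passed in by the caller, plus the name of the instance profile that wraps the newly created role. It uses `replace_iam_instance_profile_association`, which swaps the profile in a single call, in place of the remove-then-attach sequence described above. The function name is hypothetical, and a reboot or agent restart afterwards, as in the fix above, may still be needed before the agent picks up the new credentials.

```python
from typing import Any, Optional


def reattach_instance_profile(ec2: Any, instance_id: str, profile_name: str) -> str:
    """Point instance_id at profile_name, replacing any active association.

    `ec2` is assumed to be a boto3 EC2 client; it is injected so the logic can be
    exercised without AWS access. Returns the new association ID.
    """
    resp = ec2.describe_iam_instance_profile_associations(
        Filters=[{"Name": "instance-id", "Values": [instance_id]}]
    )
    # Ignore associations that are already being torn down.
    current: Optional[dict] = next(
        (
            a
            for a in resp.get("IamInstanceProfileAssociations", [])
            if a.get("State") in ("associating", "associated")
        ),
        None,
    )
    profile = {"Name": profile_name}
    if current is None:
        new = ec2.associate_iam_instance_profile(
            IamInstanceProfile=profile, InstanceId=instance_id
        )
    else:
        new = ec2.replace_iam_instance_profile_association(
            IamInstanceProfile=profile, AssociationId=current["AssociationId"]
        )
    return new["IamInstanceProfileAssociation"]["AssociationId"]
```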
I spotted this because curling the meta-data endpoint for the security-credentials URI failed with:
% curl http://169.254.169.254/latest/meta-data/iam/security-credentials/basic-ec2
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>404 - Not Found</title>
</head>
<body>
<h1>404 - Not Found</h1>
</body>
</html>
However, once the new role was attached and the server booted, that same URL responded with:
% curl http://169.254.169.254/latest/meta-data/iam/security-credentials/basic-ec2/
{
"Code" : "Success",
"LastUpdated" : "2018-10-25T14:23:25Z",
"Type" : "AWS-HMAC",
"AccessKeyId" : "REDACTED",
"SecretAccessKey" : "REDACTED",
"Token" : "REDACTED",
"Expiration" : "2018-10-25T20:58:23Z"
}
I'm typing this all up in this issue in the hope that it might help somebody else who encounters this bizarre sequence of events!
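The two metadata checks in this comment can be combined into one script. This sketch uses IMDSv1-style plain GET requests as in this thread (instances that require IMDSv2 also need a session token header, which it does not send). It first lists the role name under iam/security-credentials/ and then fetches that role's credentials document. `check_instance_role` and `evaluate_credentials` are hypothetical helpers, and the expiration format is taken from the JSON shown above.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone
from typing import Optional, Tuple

# IMDSv1-style path, as used in this thread (no IMDSv2 session token is sent).
CREDS_URL = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"


def evaluate_credentials(doc_text: str, now: Optional[datetime] = None) -> Tuple[bool, str]:
    """Decide whether a security-credentials JSON document looks usable."""
    doc = json.loads(doc_text)
    if doc.get("Code") != "Success":
        return False, f"metadata service reported Code={doc.get('Code')!r}"
    expires = datetime.strptime(doc["Expiration"], "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    now = now or datetime.now(timezone.utc)
    if expires <= now:
        return False, f"credentials expired at {doc['Expiration']}"
    return True, f"credentials valid until {doc['Expiration']}"


def check_instance_role(timeout: float = 2.0) -> Tuple[bool, str]:
    """List the role name(s), then evaluate the first role's credentials document."""
    try:
        with urllib.request.urlopen(CREDS_URL, timeout=timeout) as resp:
            roles = resp.read().decode().split()
        if not roles:
            return False, "no role listed (is an instance profile attached?)"
        with urllib.request.urlopen(CREDS_URL + roles[0], timeout=timeout) as resp:
            return evaluate_credentials(resp.read().decode())
    except urllib.error.HTTPError as err:
        # The thread saw a 404 on the per-role path while the instance pointed at a deleted role.
        return False, f"HTTP {err.code} from instance metadata"
    except OSError as err:
        # URLError and socket timeouts are both OSError subclasses.
        return False, f"instance metadata unreachable: {err}"
```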
I am getting the "Entering SSM Agent hibernate - UnrecognizedClientException: The security token included in the request is invalid." error in my amazon-ssm-agent.log file. I've tried the fixes listed above, but they have not resolved it.
I am able to get data from http://169.254.169.254/latest/meta-data/iam/security-credentials/web-role and I can connect to https://ssm.us-west-2.amazonaws.com/
This is a Windows instance. Another instance on the same subnet uses the same role and is working fine with SSM.
For others trying to solve this: in my case it took a combination of the above and then waiting quite some time (30 minutes to 1 hour). After all that, a final restart of the SSM service did it.
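Since registration can lag, polling is more reliable than checking once. The sketch below is a generic `wait_until` poller plus a hypothetical `is_registered` check built on boto3's SSM `describe_instance_information` with an `InstanceIds` filter, treating a `PingStatus` of `Online` as registered. The client is passed in by the caller, and the one-hour default timeout simply mirrors the wait reported here.

```python
import time
from typing import Any, Callable


def is_registered(ssm: Any, instance_id: str) -> bool:
    """True if SSM lists instance_id as a managed instance with PingStatus Online.

    `ssm` is assumed to be a boto3 SSM client, e.g. boto3.client("ssm").
    """
    resp = ssm.describe_instance_information(
        Filters=[{"Key": "InstanceIds", "Values": [instance_id]}]
    )
    return any(
        info.get("PingStatus") == "Online"
        for info in resp.get("InstanceInformationList", [])
    )


def wait_until(
    check: Callable[[], bool],
    timeout_s: float = 3600,
    interval_s: float = 60,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call check() every interval_s seconds until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(interval_s, remaining))
```

For example, `wait_until(lambda: is_registered(boto3.client("ssm"), "i-0123456789abcdef0"))`, where the instance ID is a placeholder.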
Hi. I'm working through this tutorial, creating everything from scratch as described and using the Amazon Linux 2 AMI, but I fail at step 3b because SSM is unable to find an instance with a correctly configured agent.
Taking a closer look at my instance, I'm seeing this in the logs. Is this a bug or am I missing something?