aws / amazon-ssm-agent

An agent to enable remote management of your EC2 instances, on-premises servers, or virtual machines (VMs).
https://aws.amazon.com/systems-manager/
Apache License 2.0

Amazon SSM agent fails to start #554

Closed: tormath1 closed this issue 10 months ago

tormath1 commented 10 months ago

Hello,

I am a maintainer of Flatcar Container Linux, a Linux-based OS. We upgraded Amazon SSM Agent from 2.3.1319.0 to 3.2.985.0, and we are now seeing the following failure, which impacts Flatcar users on AWS:

Initializing new seelog logger
New Seelog Logger Creation Complete
1704967520066534055 [Debug] Start File Watcher On: /etc/amazon/ssm/seelog.xml
1704967520066608958 [Debug] Start Watcher on directory: /etc/amazon/ssm
1704967520066663367 [Debug] [ssm-agent-worker] Current GoMaxProc value - 2
1704967520066714557 [Debug] [ssm-agent-worker] Checking if agent has OnPrem identity type
1704967520066728478 [Info] [ssm-agent-worker] Checking if agent identity type OnPrem can be assumed
1704967520066750635 [Warn] [ssm-agent-worker] failed to read runtime config 'identity_config.json': open /var/lib/amazon/ssm/runtimeconfig/identity_config.json: no such file or directory
1704967520066760431 [Debug] [ssm-agent-worker] Checking if agent has EC2 identity type
1704967520066765411 [Info] [ssm-agent-worker] Checking if agent identity type EC2 can be assumed
1704967520124509707 [Debug] [AuthRegisterService] Determining endpoint for service ssm in region us-west-2
1704967520124660050 [Debug] [EC2Identity] Determining endpoint for service ssm in region us-west-2
1704967520124684850 [Warn] [ssm-agent-worker] failed to read runtime config 'identity_config.json': open /var/lib/amazon/ssm/runtimeconfig/identity_config.json: no such file or directory
1704967520124695295 [Debug] [ssm-agent-worker] Checking if agent has CustomIdentity identity type
1704967520124701698 [Info] [ssm-agent-worker] Checking if agent identity type CustomIdentity can be assumed
1704967520124716273 [Warn] [ssm-agent-worker] failed to read runtime config 'identity_config.json': open /var/lib/amazon/ssm/runtimeconfig/identity_config.json: no such file or directory
1704967520124831803 [Error] [ssm-agent-worker] Agent failed to assume any identity
1704967520124845329 [Error] [ssm-agent-worker] failed to find identity, retrying: failed to find agent identity
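
For anyone triaging a longer log of the same shape, a minimal Python sketch like the one below can count entries per level and pull out the `Error` messages. The line pattern is inferred only from the excerpt above (nanosecond timestamp, `[Level]`, optional `[component]`, message); it is an assumption, not a documented seelog format, and the helper names `parse_line` and `summarize` are made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the excerpt: "<timestamp> [Level] [component] message".
# The component tag is optional (the File Watcher lines have none).
LINE_RE = re.compile(r"^(\d+) \[(\w+)\](?: \[([\w-]+)\])? (.*)$")


def parse_line(line):
    """Return (timestamp, level, component, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts, level, component, message = m.groups()
    return int(ts), level, component, message


def summarize(lines):
    """Count entries per level and collect Error messages in their original order."""
    counts = Counter()
    errors = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # e.g. "Initializing new seelog logger"
        _, level, _, message = parsed
        counts[level] += 1
        if level == "Error":
            errors.append(message)
    return counts, errors


# A few lines copied from the excerpt above.
SAMPLE = [
    "Initializing new seelog logger",
    "1704967520066534055 [Debug] Start File Watcher On: /etc/amazon/ssm/seelog.xml",
    "1704967520066750635 [Warn] [ssm-agent-worker] failed to read runtime config 'identity_config.json': open /var/lib/amazon/ssm/runtimeconfig/identity_config.json: no such file or directory",
    "1704967520124831803 [Error] [ssm-agent-worker] Agent failed to assume any identity",
    "1704967520124845329 [Error] [ssm-agent-worker] failed to find identity, retrying: failed to find agent identity",
]

if __name__ == "__main__":
    counts, errors = summarize(SAMPLE)
    print(dict(counts))
    for message in errors:
        print("ERROR:", message)
```

This only restates what is already in the log; it does not explain why no identity could be assumed.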

The instance is started with a role that has the AmazonSSMManagedInstanceCore policy attached, and I even tried using the Fleet Manager Default Host Management Configuration with this role.

Running the diagnostic tool, I see this:

$ sudo ssm-cli get-diagnostics --output table
┌──────────────────────────────────────┬─────────┬─────────────────────────────────────────────────────────────────────────┐
│ Check                                │ Status  │ Note                                                                    │
├──────────────────────────────────────┼─────────┼─────────────────────────────────────────────────────────────────────────┤
│ EC2 IMDS                             │ Success │ IMDS is accessible and has instance id i-12345 in region                │
│                                      │         │ us-west-2                                                               │
├──────────────────────────────────────┼─────────┼─────────────────────────────────────────────────────────────────────────┤
│ Hybrid instance registration         │ Skipped │ Instance does not have hybrid registration                              │
├──────────────────────────────────────┼─────────┼─────────────────────────────────────────────────────────────────────────┤
│ Connectivity to ssm endpoint         │ Success │ ssm.us-west-2.amazonaws.com is reachable                                │
├──────────────────────────────────────┼─────────┼─────────────────────────────────────────────────────────────────────────┤
│ Connectivity to ec2messages endpoint │ Success │ ec2messages.us-west-2.amazonaws.com is reachable                        │
├──────────────────────────────────────┼─────────┼─────────────────────────────────────────────────────────────────────────┤
│ Connectivity to ssmmessages endpoint │ Success │ ssmmessages.us-west-2.amazonaws.com is reachable                        │
├──────────────────────────────────────┼─────────┼─────────────────────────────────────────────────────────────────────────┤
│ Connectivity to s3 endpoint          │ Success │ s3.us-west-2.amazonaws.com is reachable                                 │
├──────────────────────────────────────┼─────────┼─────────────────────────────────────────────────────────────────────────┤
│ Connectivity to kms endpoint         │ Success │ kms.us-west-2.amazonaws.com is reachable                                │
├──────────────────────────────────────┼─────────┼─────────────────────────────────────────────────────────────────────────┤
│ Connectivity to logs endpoint        │ Success │ logs.us-west-2.amazonaws.com is reachable                               │
├──────────────────────────────────────┼─────────┼─────────────────────────────────────────────────────────────────────────┤
│ Connectivity to monitoring endpoint  │ Success │ monitoring.us-west-2.amazonaws.com is reachable                         │
├──────────────────────────────────────┼─────────┼─────────────────────────────────────────────────────────────────────────┤
│ AWS Credentials                      │ Success │ Credentials are for                                                     │
│                                      │         │ arn:aws:sts::12345...                                                   │
│                                      │         │ and will expire at 2024-01-11 11:10:10.87810707 +0000 UTC               │
│                                      │         │ m=+3749.157475872                                                       │
├──────────────────────────────────────┼─────────┼─────────────────────────────────────────────────────────────────────────┤
│ Agent service                        │ Failed  │ Agent is installed as a systemctl service but is not running            │
├──────────────────────────────────────┼─────────┼─────────────────────────────────────────────────────────────────────────┤
│ Proxy configuration                  │ Skipped │ No proxy configuration detected                                         │
├──────────────────────────────────────┼─────────┼─────────────────────────────────────────────────────────────────────────┤
│ SSM Agent version                    │ Failed  │ Failed to get SSM Agent version: exit status 2                          │
└──────────────────────────────────────┴─────────┴─────────────────────────────────────────────────────────────────────────┘
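
To pick out the failing rows from table output like the above without reading it by eye, a small Python sketch can split each row on the `│` separator and fold continuation rows (empty Check and Status cells) into the previous note. The function names `parse_diagnostics` and `failed_checks` are hypothetical, and the parsing assumes only the table shape shown here, not any documented output contract of `ssm-cli`.

```python
def parse_diagnostics(table_text):
    """Return [check, status, note] rows parsed from the box-drawing table text."""
    rows = []
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("│"):
            continue  # border lines start with ┌, ├ or └
        cells = [c.strip() for c in line.strip("│").split("│")]
        if len(cells) != 3:
            continue
        check, status, note = cells
        if check == "Check" and status == "Status":
            continue  # header row
        if not check and not status and rows:
            # Continuation of a wrapped note: append to the previous row.
            rows[-1][2] = (rows[-1][2] + " " + note).strip()
        else:
            rows.append([check, status, note])
    return rows


def failed_checks(rows):
    """Return (check, note) pairs for rows whose status is Failed."""
    return [(check, note) for check, status, note in rows if status == "Failed"]


# Rows copied from the output above, with the column padding trimmed.
SAMPLE = """\
│ Check │ Status │ Note │
│ EC2 IMDS │ Success │ IMDS is accessible and has instance id i-12345 in region │
│ │ │ us-west-2 │
│ Agent service │ Failed │ Agent is installed as a systemctl service but is not running │
│ SSM Agent version │ Failed │ Failed to get SSM Agent version: exit status 2 │
"""

if __name__ == "__main__":
    for check, note in failed_checks(parse_diagnostics(SAMPLE)):
        print(f"{check}: {note}")
```

On the output in this report, the only two failed checks are "Agent service" and "SSM Agent version"; IMDS, endpoint connectivity, and credentials all pass.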

I tried to get more logs without success and I am not sure if the following warning is somehow related:

 1704967520066750635 [Warn] [ssm-agent-worker] failed to read runtime config 'identity_config.json': open /var/lib/amazon/ssm/runtimeconfig/identity_config.json: no such file or directory
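
One way to gather more data about this warning is to check whether the paths it mentions exist at all on the affected image. The following Python sketch does only that; the list of paths is taken from the warning, `describe_path` is a hypothetical helper, and nothing here establishes that a missing directory is the cause of the failure.

```python
import os


def describe_path(path):
    """Report whether a path exists, whether it is a directory, and whether this process can write to it."""
    if not os.path.lexists(path):
        return {"path": path, "exists": False}
    return {
        "path": path,
        "exists": True,
        "is_dir": os.path.isdir(path),
        # os.access reflects the current user; run as the same user as the agent.
        "writable": os.access(path, os.W_OK),
    }


# Paths taken from the warning above.
PATHS = [
    "/var/lib/amazon/ssm",
    "/var/lib/amazon/ssm/runtimeconfig",
    "/var/lib/amazon/ssm/runtimeconfig/identity_config.json",
]

if __name__ == "__main__":
    for p in PATHS:
        print(describe_path(p))
```

Comparing the result with a distribution where the agent starts correctly would show whether the directory layout differs between the two.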

Additional information:

Is there anything you can suggest trying next to debug this?

EDIT: I tried Ubuntu with the same role and configuration, and it works as expected.

armnejad commented 10 months ago

Hello, please be aware that SSM Agent does not claim to support Flatcar Linux. That said, I would recommend uninstalling your Agent and then reinstalling the desired version (rather than trying to perform an update). Updating from a much older version to a much newer version can cause issues like the one you are seeing.

jepio commented 10 months ago

> Hello, Please be aware that SSM Agent does not claim to support Flatcar Linux. That said, I would recommend uninstalling your Agent and then reinstalling the desired version (rather than trying to perform an update). Updating from a much older version to a much newer version can be the cause of issues like the one you are seeing.

When @tormath1 says "upgraded" he means that this was done during the AMI build process and not on a running instance. So the instance that fails in this way only ever came up with 3.2.985.0.

@armnejad are you able to share how/when /var/lib/amazon/ssm/ and /var/lib/amazon/ssm/runtimeconfig are populated?

tormath1 commented 10 months ago

Closing this; it has been solved on our side.