aws / amazon-ssm-agent

An agent to enable remote management of your EC2 instances, on-premises servers, or virtual machines (VMs).
https://aws.amazon.com/systems-manager/
Apache License 2.0
1.03k stars, 323 forks

SSM Agent Update can cause SSM Commands to get "lost" #430

Closed: BEllis closed this issue 2 years ago

BEllis commented 2 years ago

If an SSM Command is issued while the SSM Agent is being upgraded, it is marked as "In Progress" but never completes.

This is a problem for us because we have instances that stay stopped for long periods of time. When they start, they receive an update command to update the SSM Agent, but the agent already reports as "Online", so our scripts also issue SSM commands to be run. Those commands are randomly lost if they are received while the agent is updating. There is an error in the SSM log about not being able to send the response because the queue is closed.
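One way a calling script could detect a command that is stuck like this is to poll it against a deadline instead of waiting indefinitely. The following is a minimal sketch, not a fix for the agent: `wait_for_command`, its parameters, and the `"PresumedLost"` return value are hypothetical names chosen for illustration. The only AWS call it assumes is `get_command_invocation` on a boto3 SSM client that the caller passes in as `ssm`.

```python
import time

# Statuses after which a command invocation will not change any more.
TERMINAL_STATUSES = {"Success", "Cancelled", "TimedOut", "Failed"}


def wait_for_command(ssm, command_id, instance_id, deadline_s=300, poll_s=5,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll one command invocation until it finishes or the deadline passes.

    Returns the terminal status reported by SSM, or the hypothetical marker
    "PresumedLost" if the command is still not finished after deadline_s
    seconds. The caller decides what to do next, for example re-send it.
    """
    start = clock()
    while True:
        invocation = ssm.get_command_invocation(
            CommandId=command_id, InstanceId=instance_id
        )
        status = invocation["Status"]
        if status in TERMINAL_STATUSES:
            return status
        if clock() - start >= deadline_s:
            # Still Pending/InProgress/Delayed: treat it as lost.
            return "PresumedLost"
        sleep(poll_s)
```

A script would call this with `ssm = boto3.client("ssm")` and the `CommandId` returned by `send_command`. The sketch does not handle the case where the invocation is not yet visible immediately after sending, and the deadline has to be longer than the slowest legitimate run of the command, otherwise a healthy command is misreported as lost.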

VishnuKarthikRavindran commented 2 years ago

Hi @BEllis, thanks for reaching out. Is the in-progress command picked up after the update? There should be a log line that says "Processing in-progress document - {CommandId}". Could you please provide a few more of the log entries written during this period to the amazon-ssm-agent.log and error.log files?

BEllis commented 2 years ago

I'll take a look next time it happens, but from my experience they don't get picked up after the update.

Thor-Bjorgvinsson commented 2 years ago

There was a queue issue resolved in version 3.1.127.0. What version of the agent are you running before the update?
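To check whether a given instance was already at or past that release, the version strings have to be compared numerically, component by component, because a plain string comparison would rank "3.1.90.0" above "3.1.127.0". A minimal sketch follows; `parse_version` and `has_queue_fix` are hypothetical helper names, and it assumes agent versions are always dot-separated integers:

```python
QUEUE_FIX_VERSION = "3.1.127.0"  # release named in this thread


def parse_version(version):
    """Turn a dotted version string such as '3.1.127.0' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def has_queue_fix(agent_version):
    """True if agent_version is at or above the release with the queue fix."""
    return parse_version(agent_version) >= parse_version(QUEUE_FIX_VERSION)
```

The installed version can be read from the `AgentVersion` field that `describe_instance_information` returns for each managed instance. A result of `True` only says that the instance has the queue fix mentioned here; it does not by itself confirm that the lost-command behavior is gone.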

VishnuKarthikRavindran commented 2 years ago

Feel free to reopen if the issue pops up again. We have a few fixes in the latest version for this type of issue. Thanks.