aws / amazon-ssm-agent

An agent to enable remote management of your EC2 instances, on-premises servers, or virtual machines (VMs).
https://aws.amazon.com/systems-manager/
Apache License 2.0

ssmmessages-fips for 3.1.821.0 causes session manager to error #426

Closed by michaelmagyar 2 years ago

michaelmagyar commented 2 years ago

We noticed that a federal client's SSM agents stopped working with Session Manager about a week ago. After some testing, we narrowed it down to the FIPS config value Mgs > Endpoint on agent version 3.1.821.0.

If we leave this value blank or use ssmmessages.us-east-2.amazonaws.com, sessions connect properly. If we set this value to ssmmessages-fips.us-east-2.amazonaws.com (as instructed by AWS for FIPS compliance - and this has worked for almost a year), then we get an error saying that the instance is not connected.
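For reference, the two Mgs > Endpoint values compared above can be sketched as a small helper. This is only an illustration, not the agent's own code: the function names are hypothetical, the hostname pattern is generalized from the two us-east-2 hostnames quoted in this report, and the JSON shape assumes the setting lives under an "Mgs" object with an "Endpoint" key, as the "Mgs > Endpoint" wording suggests.

```python
import json


def mgs_endpoint(region: str, fips: bool) -> str:
    """Build the ssmmessages hostname for a region (hypothetical helper).

    The pattern is inferred from the two hostnames in this report and is
    an assumption for any region other than us-east-2.
    """
    prefix = "ssmmessages-fips" if fips else "ssmmessages"
    return f"{prefix}.{region}.amazonaws.com"


def mgs_config_fragment(region: str, fips: bool) -> str:
    """Render only the Mgs > Endpoint setting as JSON.

    Assumption: the setting is nested as {"Mgs": {"Endpoint": ...}}.
    Every other agent setting is deliberately left out.
    """
    fragment = {"Mgs": {"Endpoint": mgs_endpoint(region, fips)}}
    return json.dumps(fragment, indent=2)


if __name__ == "__main__":
    # The value that failed for us on 3.1.821.0:
    print(mgs_config_fragment("us-east-2", fips=True))
    # The value that kept working:
    print(mgs_config_fragment("us-east-2", fips=False))
```

Printing both fragments side by side makes it easy to confirm that the only difference between the working and failing configurations is the `-fips` suffix in the hostname.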

Uninstalling 3.1.821.0 and installing 3.1.804.0 allows the FIPS endpoint for Mgs to work again.

I spent a few minutes looking at the version diff, but I could not find an obvious culprit.

yuting-fan commented 2 years ago

Hi @michaelmagyar,

Thank you for reporting the issue! The Session Manager service team investigated and found a bug on the service side. The bug affected the FIPS experience in the us-east-2 region and caused no other regressions. It is now fixed. Can you please try again and let us know if it works for you now?

Regarding "Uninstalling 3.1.821.0 and installing 3.1.804.0 allows the FIPS endpoint for Mgs to work again.": the issue we saw should actually affect all versions of the SSM agent when the FIPS endpoint is used, so version 3.1.804.0 should have been affected in the same way as 3.1.821.0. Are you able to confirm?

Cheers, Yuting

michaelmagyar commented 2 years ago

Hello @yuting-fan,

It appears that the issue is resolved and ssmmessages-fips.us-east-2.amazonaws.com is working again for us. Thank you for the quick response and for forwarding the issue to the Session Manager team!

It is odd, but 3.1.804.0 was working for all SSM features even through the service bug. In fact, we saw our systems slowly stop supporting Session Manager as they upgraded to 3.1.821.0, so it appears there was something tied to the version. Other SSM functions, including agent upgrade to 3.1.941.0, still worked. I'm guessing only ssmmessages was impacted and that other SSM features do not use it, but I'm not sure (and that still doesn't explain why only 3.1.821.0+ was impacted for us).

Oh well. It's fixed. Thank you!