Missouri-BMI / service-workbench-on-aws

A platform that provides researchers with one-click access to collaborative workspace environments operating across teams, universities, and datasets while enabling university IT stakeholders to manage, monitor, and control spending, apply security best practices, and comply with corporate governance.
Apache License 2.0

[Bug] workspace termination fails due to IAMRole drift related to SSM Patch Manager #15

Closed: fergusonsha-umkc closed this issue 2 years ago

fergusonsha-umkc commented 2 years ago

Summarized by Travis from our troubleshooting email chain:

... the AWS-managed policy is what has made a significant change that it will not delete the resource and therefore not delete the stack. The AWS-managed policy “AmazonSSMManagedInstanceCore” has been replaced with “AmazonSSMPatchAssociation”. This is probably due to something in Service Manager, possibly related to how your patching is set up. The Managed Instance Core policy is what allows you to connect to it via Session Manager, though. I have no idea why it might substitute for the Patch Association policy, which only allows for patching. But, that’s what’s going on. …just don’t know why.

fergusonsha-umkc commented 2 years ago

@sxinger I created this issue to track the problem; we can close it once the "why" is figured out :)

fergusonsha-umkc commented 2 years ago

This is still with Travis.

sxinger commented 2 years ago

@fergusonsha-umkc are we actively following up with Travis on this issue?

fergusonsha-umkc commented 2 years ago

@sxinger my apologies, I got this issue mixed up with another one related to role drift. This is not the one that was expected to be fixed by the upgrade. I'll find the email thread where he left off and check in with him on whether they've made any progress on a solution.

fergusonsha-umkc commented 2 years ago

Last communication from Travis on this was Wednesday, December 8, 2021 2:38 PM:

"Ah ha! So, the AWS-managed policy is what has made a significant change that it will not delete the resource and therefore not delete the stack. The AWS-managed policy “AmazonSSMManagedInstanceCore” has been replaced with “AmazonSSMPatchAssociation”. This is probably due to something in Service Manager, possibly related to how your patching is set up. The Managed Instance Core policy is what allows you to connect to it via Session Manager, though. I have no idea why it might substitute for the Patch Association policy, which only allows for patching. But, that’s what’s going on. …just don’t know why."

I have followed up.

fergusonsha-umkc commented 2 years ago

His response: "Berkley, Travis travberk@amazon.com Thu 1/13/2022 4:39 PM

I have not been able to replicate this behavior in my environments. And, I still haven’t been able to figure out what is switching the policies.

Tell me a little bit about how you have Systems Manager set up for patching. Are you just using the default patch baselines, or did you build your own? Any modifications to the Inventory or Associations? I assume your workspaces are showing as compliant within Systems Manager.

We may have to create a workspace for testing, check it “frequently” for drift, then look in CloudTrail to see what is making the policy change. Once we figure out what is making it and when, we can backtrack and find out the reason why."

@sxinger or @aaronmbruce would one of you please respond to him regarding the bold-faced questions above? I don't recall being involved in the setup of SSM.
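The test plan Travis describes (create a workspace, check it for drift, then trace the change in CloudTrail) could be partly automated. The following is a minimal Python sketch, not something used in this thread: `policy_drift`, `attached_policy_arns`, and `EXPECTED_ARNS` are hypothetical names, and it assumes the original template attached only `AmazonSSMManagedInstanceCore`. The boto3 call requires AWS credentials and is imported lazily so the comparison can be run on its own.

```python
# Sketch only: names and the expected-policy set are assumptions for illustration.
EXPECTED_ARNS = {"arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"}


def policy_drift(expected, attached):
    """Return (added, removed): ARNs that appeared on the role and ARNs that vanished."""
    expected, attached = set(expected), set(attached)
    return attached - expected, expected - attached


def attached_policy_arns(role_name):
    """Fetch the managed policy ARNs currently attached to an IAM role.

    Needs boto3 and AWS credentials; role_name is whatever name the
    workspace stack gave its instance role.
    """
    import boto3  # lazy import so policy_drift works without boto3 installed

    iam = boto3.client("iam")
    arns = set()
    paginator = iam.get_paginator("list_attached_role_policies")
    for page in paginator.paginate(RoleName=role_name):
        arns.update(p["PolicyArn"] for p in page["AttachedPolicies"])
    return arns
```

Run on a schedule against a test workspace's role, `policy_drift(EXPECTED_ARNS, attached_policy_arns(role_name))` would show roughly when the swap happens. The CloudTrail search can then be narrowed to `AttachRolePolicy` and `DetachRolePolicy` events around that time.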

fergusonsha-umkc commented 2 years ago

The resolution is to update the templates so that both of these policies are attached to the role when the stack launches, so that no stack drift occurs:

ManagedPolicyArns: ["arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore", "arn:aws:iam::aws:policy/AmazonSSMPatchAssociation"]
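To confirm that every template carries both policies, a check along these lines might help. This is a hedged Python sketch: `roles_missing_policies` and `REQUIRED_ARNS` are hypothetical names, and it assumes the template has already been parsed into a plain dictionary (CloudFormation YAML with short-form intrinsics such as `!Ref` needs a CloudFormation-aware loader, which is not shown).

```python
# Sketch only: assumes `template` is a CloudFormation template parsed into a dict.
REQUIRED_ARNS = (
    "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
    "arn:aws:iam::aws:policy/AmazonSSMPatchAssociation",
)


def roles_missing_policies(template):
    """Map each AWS::IAM::Role logical ID to the required ARNs it does not list."""
    missing = {}
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::IAM::Role":
            continue
        attached = resource.get("Properties", {}).get("ManagedPolicyArns", [])
        lacking = [arn for arn in REQUIRED_ARNS if arn not in attached]
        if lacking:
            missing[logical_id] = lacking
    return missing
```

An empty result means every role in that template lists both ARNs. A non-empty result names the roles that would still drift if Systems Manager attaches the patch policy later.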

aaronmbruce commented 2 years ago

All templates have been updated to attach both of these managed policies.