aws-samples / pcluster-manager

Manage AWS ParallelCluster through an easy to use web interface
https://pcluster.cloud
Apache License 2.0

PCM deployment is overriding default behavior of SSM Sessions #325

Open demartinofra opened 1 year ago

demartinofra commented 1 year ago

Hi,

When deploying pcluster-manager, the following substack updates the SSM-SessionManagerRunShell document, overriding the default document with a hardcoded version. As documented here, the SSM-SessionManagerRunShell document controls the default SSM session settings for the account at the region level.

I have the following concerns:

  1. The default SSM session settings are overwritten with a static default.
  2. The updated document changes the default SSM user on every node where the /opt/parallelcluster directory is found. Users expect to land as the default ssm-user, but they will instead automatically land on the cluster nodes as the default cluster user. Also, if this customization is triggered on arbitrary nodes where the /opt/parallelcluster directory happens to be present, the execution will just fail.
  3. The command in the document relies on internal pcluster variables that might change at some point, with the risk of breaking SSM session access.
  4. [minor] The change persists even after the pcluster-manager stack is deleted.

Can you share details on why this is necessary and if this configuration can be done at a more scoped level?

Cheers, Francesco

sean-smith commented 1 year ago

A few notes here:

I wish there was a better way to do this, maybe by raising a feature request with the SSM team to accept either the user or the document as a parameter, but for now this is the best way to ensure the correct user is set.

demartinofra commented 1 year ago

Thanks Sean! What is driving the need to switch user before the session starts, as opposed to starting the session as the default ssm-user and then switching user?

joshvmaws commented 1 year ago

AWS has landing zone accelerators that also create or modify the default SSM-SessionManagerRunShell document to enable encryption, centralized logging, and a few other settings according to security best practices. They also have SCPs that deny access to that document. If we deploy pcluster-manager in those environments, stack creation fails because the Lambda that modifies the document fails to execute. If the SCP is removed, the Lambda updates the document, overriding all of those security best practices.

I have a couple of ideas that I think might work:

  1. I think it would be better to create a whole new session document for pcluster-manager purposes and require users to use it when connecting through SSM.
  2. Update the Lambda to read the document content first and append the linuxcmd parameters to it.
  3. Alternatively, the same linuxcmd can be appended to /etc/profile so that it is executed on any shell login. Any similar approach (init scripts) would also work.
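
As a rough illustration of idea 2, the read-then-append step could look like the sketch below. It is not the pcluster-manager Lambda code. It assumes the command lives under `inputs.shellProfile.linux` in the session document, and the function names `merge_shell_profile` and `update_default_session_document` are made up for this example. The boto3 calls (`get_document`, `update_document`, `update_document_default_version`) are standard SSM client methods.

```python
import json


def merge_shell_profile(existing_content: str, linux_cmd: str) -> str:
    """Append linux_cmd to inputs.shellProfile.linux, keeping every other setting.

    Assumption: the document follows the Session Manager preferences layout,
    where the Linux shell profile is stored under inputs.shellProfile.linux.
    """
    doc = json.loads(existing_content)
    inputs = doc.setdefault("inputs", {})
    profile = inputs.setdefault("shellProfile", {})
    current = profile.get("linux") or ""
    # Skip the append if the command is already present, so reruns are no-ops.
    if linux_cmd not in current:
        profile["linux"] = f"{current}\n{linux_cmd}" if current else linux_cmd
    return json.dumps(doc, indent=2)


def update_default_session_document(linux_cmd: str,
                                    name: str = "SSM-SessionManagerRunShell") -> None:
    """Read the current document, merge the command, and publish the result."""
    import boto3  # imported lazily so the merge helper works without AWS access

    ssm = boto3.client("ssm")
    current = ssm.get_document(Name=name, DocumentFormat="JSON")["Content"]
    merged = merge_shell_profile(current, linux_cmd)
    if json.loads(merged) == json.loads(current):
        return  # nothing to change; avoids a duplicate-content error
    response = ssm.update_document(
        Name=name, Content=merged, DocumentVersion="$LATEST", DocumentFormat="JSON"
    )
    new_version = response["DocumentDescription"]["DocumentVersion"]
    ssm.update_document_default_version(Name=name, DocumentVersion=new_version)
```

This keeps settings such as encryption and logging that a landing zone may have put in place, but it would still be blocked by an SCP that denies access to the document.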

sean-smith commented 1 year ago

@joshvmaws Is there any way to point to a specific document when connecting? The link we're using is:

https://[region].console.aws.amazon.com/systems-manager/session-manager/[instance_id]?region=[region]

When we implemented this (Nov 2021) there wasn't a way to select a specific SSM document when connecting. Maybe this has changed?

joshvmaws commented 1 year ago

In the CLI you can pass the --document-name parameter. There must be a corresponding parameter for the web session that I haven't found yet.
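
For reference, a minimal sketch of how the CLI invocation with `--document-name` could be assembled. The document name `PCM-SessionDocument` and the helper `start_session_command` are hypothetical and used only for illustration. Running the resulting command also requires the Session Manager plugin for the AWS CLI.

```python
import shlex
from typing import List


def start_session_command(instance_id: str, region: str, document_name: str) -> List[str]:
    """Build the `aws ssm start-session` argument list for a specific session document."""
    return [
        "aws", "ssm", "start-session",
        "--target", instance_id,
        "--region", region,
        "--document-name", document_name,
    ]


if __name__ == "__main__":
    # Hypothetical instance ID and document name, for illustration only.
    cmd = start_session_command("i-0123456789abcdef0", "us-east-1", "PCM-SessionDocument")
    print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run` without shell quoting. This only covers CLI access and does not answer whether the console link accepts an equivalent parameter.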