aws / amazon-ssm-agent

An agent to enable remote management of your EC2 instances, on-premises servers, or virtual machines (VMs).
https://aws.amazon.com/systems-manager/
Apache License 2.0
1.04k stars, 323 forks

SSM Agent isn't using ShareProfile #368

Closed (barryku closed this 3 years ago)

barryku commented 3 years ago

We have both EC2 and non-EC2 instances, so we use hybrid mode for both to be consistent. Our EC2 instances also have an instance profile attached to access resources like S3. We ran into issues on those EC2 instances because the [default] profile in /root/.aws/credentials was taking precedence over the instance profile's credentials.

After setting ShareProfile, SSM Agent creates and updates the profile credentials in /root/.aws/credentials correctly. However, it does not appear to read its credentials from there; the instance profile's credentials are used instead. The instance profile doesn't have any SSM permissions, so every AWS-RunPatchBaseline run we start through SSM fails with permission denied.

How can I make SSM Agent use credentials that don't interfere with the instance profile's? Am I missing anything other than setting ShareProfile in /etc/amazon/ssm/amazon-ssm-agent.json? Here's what is in our /etc/amazon/ssm/amazon-ssm-agent.json.

{
    "Profile": {
        "ShareProfile": "ams-ssm"
    }
}
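
To double-check the setup described above, one option is to confirm that the profile named by ShareProfile really exists as a section in the shared credentials file. The following is only a minimal sketch using the Python standard library; the helper names `share_profile_name` and `profile_in_credentials` are hypothetical, and it assumes the credentials file uses the usual INI layout with one `[profile]` section per profile.

```python
import configparser
import json


def share_profile_name(agent_config_text):
    """Return Profile.ShareProfile from the agent JSON config, or None if unset."""
    config = json.loads(agent_config_text)
    return config.get("Profile", {}).get("ShareProfile")


def profile_in_credentials(credentials_text, profile):
    """Report whether an INI-style credentials file has a [profile] section."""
    # interpolation=None so that '%' characters in values are never interpreted
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(credentials_text)
    return parser.has_section(profile)


if __name__ == "__main__":
    # Paths taken from this thread; reading them normally requires root.
    with open("/etc/amazon/ssm/amazon-ssm-agent.json") as f:
        profile = share_profile_name(f.read())
    with open("/root/.aws/credentials") as f:
        print(profile, profile_in_credentials(f.read(), profile))
```

This only shows that the agent wrote the profile; it says nothing about which credentials a process launched by the agent will actually pick up.
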
Thor-Bjorgvinsson commented 3 years ago

Hey @barryku, the configuration looks correct. Are you only experiencing this issue with AWS-RunPatchBaseline? Could you show an example log (and the relevant stack trace, if applicable) for the permission denied error?

What agent version are you running?

Are you able to execute a simple AWS-RunShellScript with an echo "hello" command and upload the output to S3? If not, could you post the relevant agent logs for this?

barryku commented 3 years ago

We are running the latest SSM Agent, 3.0.882.0. I have attached an example error log at the end. SSM has been working for our hybrid setup using the default profile. We then found that the default profile for the root user was interfering with our other AWS operations that rely on the instance profile. That's why we added ShareProfile to avoid the conflict.

I was able to run echo with AWS-RunShellScript just fine, since it doesn't access AWS resources. That reminds me that the issue may be in the SSM document we run. There's some credentials setup logic in AWS-RunPatchBaseline that looks suspicious. I would assume that an SSM document which needs AWS credentials for SSM operations gets them from SSM Agent, but that may not be the case for AWS-RunPatchBaseline.

Is there a way to verify which credentials SSM Agent is using? That would help determine whether this is related to SSM Agent. Does your team handle the AWS-xyz SSM documents? If not, whom should I contact?

04/05/2021 16:13:50 root [INFO]: Attempting to import entrance file os_selector
04/05/2021 16:13:51 root [INFO]: Running with snapshot id = and operation = Scan
04/05/2021 16:13:51 root [ERROR]: An error occurred (AccessDeniedException) when calling the GetDeployablePatchSnapshotForInstance operation: User: arn:aws:sts::68xxxxxxx67:assumed-role/ManagedServicesBigBearInstance/i-0bbb0a0979a74f533 is not authorized to perform: ssm:GetDeployablePatchSnapshotForInstance on resource: arn:aws:ssm:us-east-1:68xxxxx7:*
Traceback (most recent call last):
  File "/var/log/amazon/ssm/patch-baseline-operations/common_os_selector_methods.py", line 126, in _get_snapshot_info
    patch_snapshot = _get_snapshot_with_client(ssm_client, instance_id, snapshot_id, baseline_override)
  File "/var/log/amazon/ssm/patch-baseline-operations/common_os_selector_methods.py", line 414, in _get_snapshot_with_client
    SnapshotId=snapshot_id
  File "/var/log/amazon/ssm/patch-baseline-operations/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/log/amazon/ssm/patch-baseline-operations/botocore/client.py", line 661, in _make_api_call
    raise error_class(parsed_response, operation_name)
ClientError: An error occurred (AccessDeniedException) when calling the GetDeployablePatchSnapshotForInstance operation: User: arn:aws:sts::68xxxxxxx7:assumed-role/ManagedServicesBigBearInstance/i-0bbb0a0979a74f533 is not authorized to perform: ssm:GetDeployablePatchSnapshotForInstance on resource: arn:aws:ssm:us-east-1:68xxxxx67:*
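
One rough way to see which credentials a failing call used is to pull the caller identity out of the AccessDeniedException text itself. The sketch below is an illustration only: the function name `caller_from_error` and the regular expression are assumptions, not part of SSM Agent or botocore. For credentials that come from an EC2 instance profile, the role session name is the instance ID (`i-...`), which matches the log above.

```python
import re

# Matches the assumed-role ARN that AWS embeds in "User: ... is not authorized" messages.
ASSUMED_ROLE_ARN = re.compile(
    r"arn:aws:sts::(?P<account>[^:]*):assumed-role/(?P<role>[^/\s]+)/(?P<session>\S+)"
)


def caller_from_error(message):
    """Return account, role, and session name from an error message, or None."""
    match = ASSUMED_ROLE_ARN.search(message)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Placeholder account ID; role and session name copied from the log in this thread.
    sample = (
        "User: arn:aws:sts::123456789012:assumed-role/"
        "ManagedServicesBigBearInstance/i-0bbb0a0979a74f533 is not authorized"
    )
    print(caller_from_error(sample))
```

A session name beginning with `i-` suggests that the patch script resolved instance profile credentials rather than the profile written by ShareProfile.
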
ferkhat-aws commented 3 years ago

Hello. Since you are trying to use an EC2 instance as a hybrid machine, I would suggest not relying on instance profiles attached to the EC2 instance. The normal procedure for hybrid machines is to grant access to AWS resources through the IAM role that you used when creating the hybrid activation. If you add S3 access to that IAM role (which should already include access to SSM), your "hybrid" machine will be able to use both SSM and S3 resources properly. (Note that you will need to reset the amazon-ssm-agent.json file to the default values.)

barryku commented 3 years ago

Thank you @ferkhat-aws and @Thor-Bjorgvinsson. I received help from AWS support and was able to work around the issue with their guidance. The workaround is to set the AWS_PROFILE environment variable for SSM Agent, so that processes invoked by it can use that profile to resolve credentials. Although the workaround works for us, I would still suggest that this be fixed in SSM Agent. It makes more sense to me that all automations invoked by SSM Agent use the same AWS credentials the agent uses when interacting with SSM.

In case anyone runs into this, here's what needs to be done:

  1. sudo systemctl stop amazon-ssm-agent
  2. sudo systemctl edit amazon-ssm-agent (add the following; the profile name must match ShareProfile in /etc/amazon/ssm/amazon-ssm-agent.json)
    [Service]
    Environment="AWS_PROFILE=ams-ssm"
  3. sudo systemctl daemon-reload
  4. sudo systemctl start amazon-ssm-agent

Instead of systemctl edit, you can add that content to /etc/systemd/system/amazon-ssm-agent.service.d/override.conf.
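
To keep the drop-in and the agent configuration in sync, the override content can be derived from ShareProfile rather than typed by hand. This is a minimal sketch under that assumption; `render_override` is a hypothetical helper, and it only builds the text, leaving the write to override.conf and the service restart to the steps above.

```python
import json


def render_override(agent_config_text):
    """Build systemd drop-in text that exports AWS_PROFILE for the agent service."""
    # Raises KeyError if Profile.ShareProfile is not set in the agent config.
    profile = json.loads(agent_config_text)["Profile"]["ShareProfile"]
    return f'[Service]\nEnvironment="AWS_PROFILE={profile}"\n'


if __name__ == "__main__":
    with open("/etc/amazon/ssm/amazon-ssm-agent.json") as f:
        print(render_override(f.read()), end="")
```

For the configuration shown earlier in this thread, the rendered text is the same two lines as in step 2.
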

Thor-Bjorgvinsson commented 3 years ago

Great, thank you for posting your solution so we can reference this in the future!