ansible-collections / community.aws

Ansible Collection for Community AWS
GNU General Public License v3.0

ansible_connection: aws_ssm fails when KMS encryption is enabled for SSM transport general prefs. #684

Open bedge opened 3 years ago

bedge commented 3 years ago

Summary

With KMS encryption disabled in the AWS Systems Manager (Session Manager) preferences, the connection setting

    ansible_connection: aws_ssm

works.

With KMS encryption enabled, it fails.

Issue Type

Bug Report

Component Name

ec2_ssm

Ansible Version

ansible [core 2.11.3]
  config file = /Users/edgeb1/git/xxx/operations.edgeb1/ansible/playbooks-test/ansible.cfg
  configured module search path = ['/Users/edgeb1/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /Users/edgeb1/.pyenv/versions/3.9.0/lib/python3.9/site-packages/ansible
  ansible collection location = /Users/edgeb1/.ansible/collections:/usr/share/ansible/collections
  executable location = /Users/edgeb1/.pyenv/versions/3.9.0/bin/ansible
  python version = 3.9.0 (default, Dec  9 2020, 10:07:40) [Clang 12.0.0 (clang-1200.0.32.27)]
  jinja version = 3.0.1
  libyaml = True

Collection Versions

➜ ansible-galaxy collection list

# /Users/edgeb1/.pyenv/versions/3.9.0/lib/python3.9/site-packages/ansible_collections
Collection                    Version
----------------------------- -------
amazon.aws                    1.5.0
ansible.netcommon             2.2.0
ansible.posix                 1.2.0
ansible.utils                 2.3.0
ansible.windows               1.7.0
arista.eos                    2.2.0
awx.awx                       19.2.2
azure.azcollection            1.7.0
check_point.mgmt              2.0.0
chocolatey.chocolatey         1.1.0
cisco.aci                     2.0.0
cisco.asa                     2.0.2
cisco.intersight              1.0.15
cisco.ios                     2.3.0
cisco.iosxr                   2.3.0
cisco.meraki                  2.4.2
cisco.mso                     1.2.0
cisco.nso                     1.0.3
cisco.nxos                    2.4.0
cisco.ucs                     1.6.0
cloudscale_ch.cloud           2.2.0
community.aws                 1.5.0
community.azure               1.0.0
community.crypto              1.7.1
community.digitalocean        1.8.0
community.docker              1.8.0
community.fortios             1.0.0
community.general             3.4.0
community.google              1.0.0
community.grafana             1.2.1
community.hashi_vault         1.3.2
community.hrobot              1.1.1
community.kubernetes          1.2.1
community.kubevirt            1.0.0
community.libvirt             1.0.1
community.mongodb             1.2.1
community.mysql               2.1.0
community.network             3.0.0
community.okd                 1.1.2
community.postgresql          1.4.0
community.proxysql            1.0.0
community.rabbitmq            1.0.3
community.routeros            1.2.0
community.skydive             1.0.0
community.sops                1.1.0
community.vmware              1.12.0
community.windows             1.5.0
community.zabbix              1.4.0
containers.podman             1.6.1
cyberark.conjur               1.1.0
cyberark.pas                  1.0.7
dellemc.enterprise_sonic      1.1.0
dellemc.openmanage            3.5.0
dellemc.os10                  1.1.1
dellemc.os6                   1.0.7
dellemc.os9                   1.0.4
f5networks.f5_modules         1.10.1
fortinet.fortimanager         2.1.3
fortinet.fortios              2.1.2
frr.frr                       1.0.3
gluster.gluster               1.0.1
google.cloud                  1.0.2
hetzner.hcloud                1.4.4
hpe.nimble                    1.1.3
ibm.qradar                    1.0.3
infinidat.infinibox           1.2.4
inspur.sm                     1.2.0
junipernetworks.junos         2.3.0
kubernetes.core               1.2.1
mellanox.onyx                 1.0.0
netapp.aws                    21.6.0
netapp.azure                  21.8.1
netapp.cloudmanager           21.8.0
netapp.elementsw              21.6.1
netapp.ontap                  21.8.1
netapp.um_info                21.7.0
netapp_eseries.santricity     1.2.13
netbox.netbox                 3.1.1
ngine_io.cloudstack           2.1.0
ngine_io.exoscale             1.0.0
ngine_io.vultr                1.1.0
openstack.cloud               1.5.0
openvswitch.openvswitch       2.0.0
ovirt.ovirt                   1.5.3
purestorage.flasharray        1.9.0
purestorage.flashblade        1.6.0
sensu.sensu_go                1.11.1
servicenow.servicenow         1.0.6
splunk.es                     1.0.2
t_systems_mms.icinga_director 1.20.0
theforeman.foreman            2.1.2
vyos.vyos                     2.4.0
wti.remote                    1.0.1

# /Users/edgeb1/.ansible/collections/ansible_collections
Collection    Version
------------- -------
amazon.aws    1.4.1
community.aws 1.4.0

AWS SDK versions

➜ pip show boto boto3 botocore
Name: boto
Version: 2.49.0
Summary: Amazon Web Services Library
Home-page: https://github.com/boto/boto/
Author: Mitch Garnaat
Author-email: mitch@garnaat.com
License: MIT
Location: /Users/edgeb1/.pyenv/versions/3.9.0/lib/python3.9/site-packages
Requires:
Required-by:
---
Name: boto3
Version: 1.18.14
Summary: The AWS SDK for Python
Home-page: https://github.com/boto/boto3
Author: Amazon Web Services
Author-email:
License: Apache License 2.0
Location: /Users/edgeb1/.pyenv/versions/3.9.0/lib/python3.9/site-packages
Requires: jmespath, s3transfer, botocore
Required-by: navify-aws-sso-login, aws-ssm-copy
---
Name: botocore
Version: 1.21.14
Summary: Low-level, data-driven core of boto 3.
Home-page: https://github.com/boto/botocore
Author: Amazon Web Services
Author-email:
License: Apache License 2.0
Location: /Users/edgeb1/.pyenv/versions/3.9.0/lib/python3.9/site-packages
Requires: jmespath, urllib3, python-dateutil
Required-by: s3transfer, boto3

Configuration

 ➜  ansible-config dump --only-changed

HOST_KEY_CHECKING(/Users/edgeb1/git/xxx/operations.edgeb1/ansible/playbooks-test/ansible.cfg) = False
INVENTORY_ENABLED(/Users/edgeb1/git/xxx/operations.edgeb1/ansible/playbooks-test/ansible.cfg) = ['aws_ec2']

OS / Environment

macOS Catalina 10.15.7 (19H1323)

Steps to Reproduce

---
- name: Test command
  gather_facts: false
  hosts: all
  vars:
#    ansible_connection: ssh
    ansible_connection: aws_ssm
    ansible_aws_ssm_region: eu-central-1
    ansible_aws_ssm_bucket_name: nghc-sbox2-s3
    ansible_python_interpreter: /opt/venv/root/bin/python

  tasks:
    - name: test
      command:
        cmd: hostname

Expected Results

[I] ➜ ansible-playbook -i inventory_aws_ec2.yml --limit nghc-sbox2-bastion test.yml -v
Using /Users/edgeb1/git/xxx/operations.edgeb1/ansible/playbooks-test/ansible.cfg as config file

PLAY [Test command] **

TASK [test] **
changed: [nghc-sbox2-bastion] => {"changed": true, "cmd": ["hostname"], "delta": "0:00:00.002350", "end": "2021-08-11 16:29:45.231283", "rc": 0, "start": "2021-08-11 16:29:45.228933", "stderr": "", "stderr_lines": [], "stdout": "nghc-sbox2-bastion", "stdout_lines": ["nghc-sbox2-bastion"]}

PLAY RECAP ***
nghc-sbox2-bastion : ok=1 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0

Actual Results

<i-0c208bc6d31fa6bf1> EXEC stdout line:
<i-0c208bc6d31fa6bf1> EXEC stdout line: Starting session with SessionId: bruce.edge@xxx.com-0b1f34f9beade7621
<i-0c208bc6d31fa6bf1> EXEC remaining: 60
<i-0c208bc6d31fa6bf1> EXEC remaining: 59
<i-0c208bc6d31fa6bf1> EXEC stdout line:
<i-0c208bc6d31fa6bf1> EXEC stdout line:
<i-0c208bc6d31fa6bf1> EXEC stdout line: SessionId: bruce.edge@xxx.com-0b1f34f9beade7621 :
<i-0c208bc6d31fa6bf1> EXEC stdout line: ----------ERROR-------
<i-0c208bc6d31fa6bf1> EXEC stdout line: Encountered error while initiating handshake. Fetching data key failed: Unable to retrieve data key, Error when decrypting data key AccessDeniedException: The ciphertext refers to a customer master key that does not exist, does not exist in this region, or you are not allowed to access.
<i-0c208bc6d31fa6bf1> EXEC stdout line:         status code: 400, request id: 58bbffdd-0094-48aa-93cd-be23a3b831ee
<i-0c208bc6d31fa6bf1> EXEC stdout line:
<i-0c208bc6d31fa6bf1> EXEC stdout line:
<i-0c208bc6d31fa6bf1> ssm_retry: attempt: 0, caught exception(local variable 'returncode' referenced before assignment) from cmd (echo ~...), pausing for 0 seconds
<i-0c208bc6d31fa6bf1> CLOSING SSM CONNECTION TO: i-0c208bc6d31fa6bf1
<i-0c208bc6d31fa6bf1> TERMINATE SSM SESSION: bruce.edge@xxx.com-0b1f34f9beade7621
<i-0c208bc6d31fa6bf1> ESTABLISH SSM CONNECTION TO: i-0c208bc6d31fa6bf1
<i-0c208bc6d31fa6bf1> SSM COMMAND: ['/usr/local/bin/session-manager-plugin', '{"SessionId": "bruce.edge@xxx.com-0a9f86cbef7a94279", "TokenValue": "AAEAARDh8M+i84KEitQgO7pZJfHRh
DXqcZRSggoX0JKknSdkAAAAAGET/1E7DBcbgdPSh4ResepBVh32nlZADVLLlyxsu/LuIjrrZ+5b+eYquv8dU3treK4QQfREd6gPaeU0hPSfRDsVTnz3CakOcLBOcku4oQ4glZE+pRIlhggAB+ozaJSp9rBlGSvDlGkRxeVuulP3HHseObp
BKMecV6GvPmtbqH9FLcXYALS0rqLPrEVpzHBWH9Tds2fzF1buQSTdTBQKRTchxSvEq/BKm0qdGU743Gpox5nXJ6eBVoZ67fH4hesI9LVG67av7oFZJrqpngKBctTeZKgcfi2X4XZDgKhMo9iHTlygf6mvgETDAUe09yVc/+Ww3R077bt/t
JNlKiBxfRbsY9w9rb9vycziX03SzLHFZDZUBAgWw66+jHp+0epTagTn44g=", "StreamUrl": "wss://ssmmessages.eu-central-1.amazonaws.com/v1/data-channel/bruce.edge@xxx.com-0a9f86cbef7a94279?ro
le=publish_subscribe", "ResponseMetadata": {"RequestId": "dd282e11-3b94-4ba6-81d3-ea1d5169fb95", "HTTPStatusCode": 200, "HTTPHeaders": {"server": "Server", "date": "Wed, 11 Aug 2
021 16:48:17 GMT", "content-type": "application/x-amz-json-1.1", "content-length": "651", "connection": "keep-alive", "x-amzn-requestid": "dd282e11-3b94-4ba6-81d3-ea1d5169fb95"},
 "RetryAttempts": 0}}', 'eu-central-1', 'StartSession', '', '{"Target": "i-0c208bc6d31fa6bf1"}', 'https://ssm.eu-central-1.amazonaws.com']
<i-0c208bc6d31fa6bf1> SSM CONNECTION ID: bruce.edge@xxx.com-0a9f86cbef7a94279
<i-0c208bc6d31fa6bf1> EXEC echo ~
<i-0c208bc6d31fa6bf1> _wrap_command: 'echo lHlPljXCIRJbmvvsKCJOQqdtWT
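
A note on the `ssm_retry` line in the log above: "local variable 'returncode' referenced before assignment" is the message Python attaches to an `UnboundLocalError`. It is presumably a secondary symptom: the session printed the KMS handshake error instead of the output the connection code expected, so a variable was read before anything had been assigned to it. The sketch below only illustrates that failure pattern. It is not the plugin's actual code, and `read_exit_status` and the `rc=` marker format are hypothetical.

```python
def read_exit_status(lines):
    """Scan session output for a hypothetical 'rc=<n>' marker line."""
    for line in lines:
        if line.startswith("rc="):
            returncode = int(line[3:])
            break
    # If no marker line was seen, 'returncode' was never assigned and this
    # return raises UnboundLocalError, hiding the real (KMS) error.
    return returncode


if __name__ == "__main__":
    print(read_exit_status(["nghc-sbox2-bastion", "rc=0"]))
    try:
        read_exit_status(["----------ERROR-------"])
    except UnboundLocalError as exc:
        print("secondary error:", exc)
```

Initialising the variable before the loop and raising an explicit error when no marker is found would let the original handshake error surface instead.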

SSM agent log on the instance (/var/log/amazon/ssm/amazon-ssm-agent.log):

2021-08-10 21:39:15 INFO [ssm-agent-worker] [MessageGatewayService] Got job bruce.edge@xxx.com-0fd8c80d976e90ff4, starting worker
2021-08-10 21:39:15 INFO [ssm-session-worker] [bruce.edge@xxx.com-0fd8c80d976e90ff4] ssm-session-worker - v3.1.90.0
2021-08-10 21:39:15 INFO [ssm-agent-worker] [MessageGatewayService] [EngineProcessor] [BasicExecuter] [bruce.edge@xxx.com-0fd8c80d976e90ff4] channel: bruce.edge@xxx.com-0fd8c80d9
2021-08-10 21:39:15 INFO [ssm-session-worker] [bruce.edge@xxx.com-0fd8c80d976e90ff4] document: bruce.edge@xxx.com-0fd8c80d976e90ff4 worker started
2021-08-10 21:39:15 INFO [ssm-agent-worker] [MessageGatewayService] [EngineProcessor] [BasicExecuter] [bruce.edge@xxx.com-0fd8c80d976e90ff4] master listener started on path: /var
2021-08-10 21:39:15 INFO [ssm-session-worker] [bruce.edge@xxx.com-0fd8c80d976e90ff4] channel: bruce.edge@xxx.com-0fd8c80d976e90ff4 found
2021-08-10 21:39:15 INFO [ssm-agent-worker] [MessageGatewayService] [EngineProcessor] [BasicExecuter] [bruce.edge@xxx.com-0fd8c80d976e90ff4] inter process communication started a
2021-08-10 21:39:15 INFO [ssm-session-worker] [bruce.edge@xxx.com-0fd8c80d976e90ff4] inter process communication started at /var/lib/amazon/ssm/i-0c208bc6d31fa6bf1/channels/bruce
2021-08-10 21:39:15 INFO [ssm-session-worker] [bruce.edge@xxx.com-0fd8c80d976e90ff4] worker listener started on path: /var/lib/amazon/ssm/i-0c208bc6d31fa6bf1/channels/bruce.edge@
2021-08-10 21:39:15 INFO [ssm-session-worker] [bruce.edge@xxx.com-0fd8c80d976e90ff4] [DataBackend] received plugin config message
2021-08-10 21:39:15 INFO [ssm-session-worker] [bruce.edge@xxx.com-0fd8c80d976e90ff4] [DataBackend] {"DocumentInformation":{"DocumentID":"bruce.edge@xxx.com-0fd8c80d976e90ff4","Co
2021-08-10 21:39:15 INFO [ssm-session-worker] [bruce.edge@xxx.com-0fd8c80d976e90ff4] [DataBackend] Running plugin Standard_Stream Standard_Stream
2021-08-10 21:39:15 INFO [ssm-session-worker] [bruce.edge@xxx.com-0fd8c80d976e90ff4] [DataBackend] [pluginName=Standard_Stream] Setting up datachannel for session: bruce.edge@xxx
2021-08-10 21:39:15 INFO [ssm-session-worker] [bruce.edge@xxx.com-0fd8c80d976e90ff4] [DataBackend] [pluginName=Standard_Stream] Opening websocket connection to: wss://ssmmessages
2021-08-10 21:39:15 INFO [ssm-session-worker] [bruce.edge@xxx.com-0fd8c80d976e90ff4] [DataBackend] [pluginName=Standard_Stream] Successfully opened websocket connection to: wss:/
2021-08-10 21:39:15 INFO [ssm-session-worker] [bruce.edge@xxx.com-0fd8c80d976e90ff4] [DataBackend] [pluginName=Standard_Stream] Starting websocket pinger
2021-08-10 21:39:15 INFO [ssm-session-worker] [bruce.edge@xxx.com-0fd8c80d976e90ff4] [DataBackend] [pluginName=Standard_Stream] Starting websocket listener
2021-08-10 21:39:15 INFO [ssm-session-worker] [bruce.edge@xxx.com-0fd8c80d976e90ff4] [DataBackend] [pluginName=Standard_Stream] Initiating Handshake
2021-08-10 21:39:17 ERROR [ssm-session-worker] [bruce.edge@xxx.com-0fd8c80d976e90ff4] [DataBackend] [pluginName=Standard_Stream] Fetching data key failed: Unable to retrieve data
    status code: 400, request id: 7814ad26-119b-4123-b077-65bb7f24cdfa
2021-08-10 21:39:17 ERROR [ssm-session-worker] [bruce.edge@xxx.com-0fd8c80d976e90ff4] [DataBackend] [pluginName=Standard_Stream] Encountered error while initiating handshake. Fet
    status code: 400, request id: 7814ad26-119b-4123-b077-65bb7f24cdfa

Both the user running Ansible and the instance role of the host being connected to have full access to the KMS key:

# aws kms describe-key --key-id d71201a3-5c82-466d-aa8e-e7f9eef3696e
{
    "KeyMetadata": {
        "AWSAccountId": "xxxxxx",
        "KeyId": "d71201a3-5c82-466d-aa8e-e7f9eef3696e",
        "Arn": "arn:aws:kms:eu-central-1:580867092569:key/d71201a3-5c82......",
        "CreationDate": "2021-08-11T16:45:35.805000+00:00",
        "Enabled": true,
        "Description": "Manually created key for SSM encryption",
        "KeyUsage": "ENCRYPT_DECRYPT",
        "KeyState": "Enabled",
        "Origin": "AWS_KMS",
        "KeyManager": "CUSTOMER",
        "CustomerMasterKeySpec": "SYMMETRIC_DEFAULT",
        "EncryptionAlgorithms": [
            "SYMMETRIC_DEFAULT"
        ]
    }
}
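
Note that `describe-key` only exercises `kms:DescribeKey`. As far as I can tell from the AWS Session Manager documentation, an encrypted session needs two other permissions on the key: `kms:GenerateDataKey` for the user who starts the session and `kms:Decrypt` for the instance profile. A minimal sketch of those two IAM policy documents follows. The helper names are hypothetical and the key ARN is a placeholder, not the key from this report.

```python
import json


def kms_statement(action, key_arn):
    """Build one IAM statement allowing a single KMS action on one key."""
    return {"Effect": "Allow", "Action": [action], "Resource": key_arn}


def session_manager_kms_policies(key_arn):
    """Return the two policy documents, keyed by who needs them."""
    def policy(action):
        return {"Version": "2012-10-17", "Statement": [kms_statement(action, key_arn)]}

    return {
        # Attached to the IAM user or role that starts the session.
        "session_user": policy("kms:GenerateDataKey"),
        # Attached to the instance profile of the target instance.
        "instance_profile": policy("kms:Decrypt"),
    }


if __name__ == "__main__":
    # Placeholder ARN: substitute the real region, account ID, and key ID.
    example_arn = "arn:aws:kms:eu-central-1:111122223333:key/EXAMPLE-KEY-ID"
    print(json.dumps(session_manager_kms_policies(example_arn), indent=2))
```

Comparing these against what is actually attached to the user and to the instance role is a quick way to confirm or rule out a plain permissions gap.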


ansibullbot commented 3 years ago

Files identified in the description: None

If these files are inaccurate, please update the component name section of the description or use the !component bot command.


bedge commented 3 years ago

This is the config that works:

[screenshot: ss-2021-32-11_09 32 23]

This does not:

[screenshot: Screen Shot 2021-08-11 at 10 11 05]

bedge commented 3 years ago

It still fails after updating boto3 and botocore to the latest versions:

[I] ➜ pip show boto boto3 botocore
Name: boto
Version: 2.49.0
Summary: Amazon Web Services Library
Home-page: https://github.com/boto/boto/
Author: Mitch Garnaat
Author-email: mitch@garnaat.com
License: MIT
Location: /Users/edgeb1/.pyenv/versions/3.9.0/lib/python3.9/site-packages
Requires:
Required-by:
---
Name: boto3
Version: 1.18.18
Summary: The AWS SDK for Python
Home-page: https://github.com/boto/boto3
Author: Amazon Web Services
Author-email:
License: Apache License 2.0
Location: /Users/edgeb1/.pyenv/versions/3.9.0/lib/python3.9/site-packages
Requires: botocore, s3transfer, jmespath
Required-by: navify-aws-sso-login, aws-ssm-copy
---
Name: botocore
Version: 1.21.18
Summary: Low-level, data-driven core of boto 3.
Home-page: https://github.com/boto/botocore
Author: Amazon Web Services
Author-email:
License: Apache License 2.0
Location: /Users/edgeb1/.pyenv/versions/3.9.0/lib/python3.9/site-packages
Requires: urllib3, python-dateutil, jmespath
Required-by: s3transfer, boto3

bedge commented 3 years ago

While reproducing this, I also confirmed that the default shell must NOT be dash.

Also, I don't understand why the S3 bucket setting is required. If the instance doesn't have read/write permissions on the configured bucket, the connection also fails, even though nothing has been written to the bucket:

---
- name: Test command
  gather_facts: false
  hosts: all
  vars:
    ansible_connection: aws_ssm
    ansible_aws_ssm_region: eu-central-1
    ansible_aws_ssm_bucket_name: nghc-sbox2-s3    # <-------- Why is this needed?
    ansible_python_interpreter: /opt/venv/root/bin/python

  tasks:
    - name: test
      command:
        cmd: hostname

markuman commented 3 years ago

    ansible_aws_ssm_bucket_name: nghc-sbox2-s3    <-------- Why is this needed ?

I guess it's because Ansible transfers its plays to the bucket, from where the AWS SSM agent can download them.

bedge commented 3 years ago

I found this doc, which could explain the KMS issue:

https://aws.amazon.com/premiumsupport/knowledge-center/ssm-session-manager-failures/

If I get time I'll try this setup.

I'm still trying to sort out exactly which S3 permissions are needed.

simon97k commented 1 year ago

I have the same or a similar issue, but my setup is a bit different:

I run the Ansible playbook with credentials for a "login account". Ansible then assumes a role in the desired AWS target account by executing an assume-role task on localhost and stores the access key, secret access key, and session token at runtime in the plugin's reserved variables (access_key_id, ...).

This works fine without KMS encryption for Session Manager, but with encryption enabled the following error occurs when running Ansible with -vvvvv:

Failed to process action KMSEncryption: Error calling KMS GenerateDataKey API: NotFoundException: Key 'arn:aws:kms:eu-central-1:[ACCOUNT-ID]:key/[KMS-Key-ID]' does not exist

The interesting part is that [ACCOUNT-ID] is the account ID of the "login account", while [KMS-Key-ID] is from the correct target account. This combination obviously cannot work.
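
The mismatch described above can be spotted before a session is started by comparing the account ID embedded in the key ARN with the account the key is supposed to live in. A minimal sketch under that assumption follows. The function names are hypothetical, and the account IDs and key ID are placeholders.

```python
from typing import NamedTuple


class KmsKeyArn(NamedTuple):
    partition: str
    region: str
    account_id: str
    key_id: str


def parse_kms_key_arn(arn):
    """Split 'arn:<partition>:kms:<region>:<account>:key/<id>' into its parts."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "kms":
        raise ValueError(f"not a KMS ARN: {arn!r}")
    resource = parts[5]
    if not resource.startswith("key/"):
        raise ValueError(f"not a KMS key ARN: {arn!r}")
    return KmsKeyArn(parts[1], parts[3], parts[4], resource[len("key/"):])


def key_belongs_to(arn, expected_account_id):
    """True if the ARN's account ID matches the account that owns the key."""
    return parse_kms_key_arn(arn).account_id == expected_account_id


if __name__ == "__main__":
    # Placeholders: login account 111111111111, target account 222222222222.
    arn = "arn:aws:kms:eu-central-1:111111111111:key/EXAMPLE-KEY-ID"
    print(key_belongs_to(arn, "222222222222"))
```

With the placeholder values, the check reports that the ARN carries the login account's ID rather than the target account's, which matches the NotFoundException in the error message.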