ansible-collections / community.aws

Ansible Collection for Community AWS
GNU General Public License v3.0

community.aws.aws_ssm connection does not consider the variable ansible_aws_ssm_profile, maybe others #1725

Open aworldofcode opened 1 year ago

aworldofcode commented 1 year ago

Summary

The variable ansible_aws_ssm_profile does not take effect when set. The only workaround I found is to run export AWS_PROFILE=[profile name] in bash.

Issue Type

Bug Report

Component Name

community.aws.aws_ssm connection

Ansible Version

$ ansible --version
ansible [core 2.14.2]
  config file = ~/gitlab/ansible-cda-tools/ansible.cfg
  configured module search path = ['~/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = ~/Library/Python/3.9/lib/python/site-packages/ansible
  ansible collection location = ~/gitlab/ansible-cda-tools/collections
  executable location = ~/Library/Python/3.9/bin/ansible
  python version = 3.9.16 (main, Dec  7 2022, 10:16:11) [Clang 14.0.0 (clang-1400.0.29.202)] (/usr/local/opt/python@3.9/bin/python3.9)
  jinja version = 3.1.2
  libyaml = True

Collection Versions

$ ansible-galaxy collection list
# ~/gitlab/ansible-cda-tools/collections/ansible_collections
Collection               Version
------------------------ -------
amazon.aws               1.1.0
ansible.netcommon        1.1.2
community.akamai         1.0.0
community.aws            1.1.0
community.crypto         2.1.0
community.crypto_entrust 1.1.3
community.docker         1.9.0
community.general        0.1.1
community.keystore       1.0.0
community.mysql          1.0.0
community.mysql          1.0.3
f5networks.f5_modules    1.5.0
servicenow.servicenow    1.0.6

# ~/Library/Python/3.9/lib/python/site-packages/ansible_collections
Collection                    Version
----------------------------- -------
amazon.aws                    5.2.0
ansible.netcommon             4.1.0
ansible.posix                 1.5.1
ansible.utils                 2.9.0
ansible.windows               1.13.0
arista.eos                    6.0.0
awx.awx                       21.11.0
azure.azcollection            1.14.0
check_point.mgmt              4.0.0
chocolatey.chocolatey         1.4.0
cisco.aci                     2.3.0
cisco.asa                     4.0.0
cisco.dnac                    6.6.3
cisco.intersight              1.0.23
cisco.ios                     4.3.1
cisco.iosxr                   4.1.0
cisco.ise                     2.5.12
cisco.meraki                  2.15.0
cisco.mso                     2.2.1
cisco.nso                     1.0.3
cisco.nxos                    4.0.1
cisco.ucs                     1.8.0
cloud.common                  2.1.2
cloudscale_ch.cloud           2.2.4
community.aws                 5.2.0
community.azure               2.0.0
community.ciscosmb            1.0.5
community.crypto              2.10.0
community.digitalocean        1.23.0
community.dns                 2.5.0
community.docker              3.4.0
community.fortios             1.0.0
community.general             6.3.0
community.google              1.0.0
community.grafana             1.5.3
community.hashi_vault         4.1.0
community.hrobot              1.7.0
community.libvirt             1.2.0
community.mongodb             1.4.2
community.mysql               3.5.1
community.network             5.0.0
community.okd                 2.2.0
community.postgresql          2.3.2
community.proxysql            1.5.1
community.rabbitmq            1.2.3
community.routeros            2.7.0
community.sap                 1.0.0
community.sap_libs            1.4.0
community.skydive             1.0.0
community.sops                1.6.0
community.vmware              3.3.0
community.windows             1.12.0
community.zabbix              1.9.1
containers.podman             1.10.1
cyberark.conjur               1.2.0
cyberark.pas                  1.0.17
dellemc.enterprise_sonic      2.0.0
dellemc.openmanage            6.3.0
dellemc.os10                  1.1.1
dellemc.os6                   1.0.7
dellemc.os9                   1.0.4
dellemc.powerflex             1.5.0
dellemc.unity                 1.5.0
f5networks.f5_modules         1.22.0
fortinet.fortimanager         2.1.7
fortinet.fortios              2.2.2
frr.frr                       2.0.0
gluster.gluster               1.0.2
google.cloud                  1.1.2
grafana.grafana               1.1.0
hetzner.hcloud                1.9.1
hpe.nimble                    1.1.4
ibm.qradar                    2.1.0
ibm.spectrum_virtualize       1.11.0
infinidat.infinibox           1.3.12
infoblox.nios_modules         1.4.1
inspur.ispim                  1.2.0
inspur.sm                     2.3.0
junipernetworks.junos         4.1.0
kubernetes.core               2.3.2
lowlydba.sqlserver            1.3.1
mellanox.onyx                 1.0.0
netapp.aws                    21.7.0
netapp.azure                  21.10.0
netapp.cloudmanager           21.22.0
netapp.elementsw              21.7.0
netapp.ontap                  22.2.0
netapp.storagegrid            21.11.1
netapp.um_info                21.8.0
netapp_eseries.santricity     1.4.0
netbox.netbox                 3.10.0
ngine_io.cloudstack           2.3.0
ngine_io.exoscale             1.0.0
ngine_io.vultr                1.1.3
openstack.cloud               1.10.0
openvswitch.openvswitch       2.1.0
ovirt.ovirt                   2.4.1
purestorage.flasharray        1.16.2
purestorage.flashblade        1.10.0
purestorage.fusion            1.3.0
sensu.sensu_go                1.13.2
splunk.es                     2.1.0
t_systems_mms.icinga_director 1.32.0
theforeman.foreman            3.8.0
vmware.vmware_rest            2.2.0
vultr.cloud                   1.7.0
vyos.vyos                     4.0.0
wti.remote                    1.0.4

AWS SDK versions

$ pip show boto boto3 botocore
Name: boto
Version: 2.49.0
Summary: Amazon Web Services Library
Home-page: https://github.com/boto/boto/
Author: Mitch Garnaat
Author-email: mitch@garnaat.com
License: MIT
Location: ~/Library/Python/3.9/lib/python/site-packages
Requires:
Required-by:
---
WARNING: Package(s) not found: boto
Name: boto3
Version: 1.26.61
Summary: The AWS SDK for Python
Home-page: https://github.com/boto/boto3
Author: Amazon Web Services
Author-email:
License: Apache License 2.0
Location: ~/Library/Python/3.9/lib/python/site-packages
Requires: botocore, s3transfer, jmespath
Required-by:
---
Name: botocore
Version: 1.29.61
Summary: Low-level, data-driven core of boto 3.
Home-page: https://github.com/boto/botocore
Author: Amazon Web Services
Author-email:
License: Apache License 2.0
Location: ~/Library/Python/3.9/lib/python/site-packages
Requires: jmespath, urllib3, python-dateutil
Required-by: s3transfer, boto3

Configuration

$ ansible-config dump --only-changed
CACHE_PLUGIN(~/gitlab/ansible-cda-tools/ansible.cfg) = jsonfile
CACHE_PLUGIN_CONNECTION(~/gitlab/ansible-cda-tools/ansible.cfg) = /tmp/facts_cache
CACHE_PLUGIN_TIMEOUT(~/gitlab/ansible-cda-tools/ansible.cfg) = 10
CALLBACKS_ENABLED(~/gitlab/ansible-cda-tools/ansible.cfg) = ['profile_tasks']
COLLECTIONS_PATHS(~/gitlab/ansible-cda-tools/ansible.cfg) = ['~/gitlab/ansible-cda-tools/collections']
CONFIG_FILE() = ~/gitlab/ansible-cda-tools/ansible.cfg
DEFAULT_DEBUG(~/gitlab/ansible-cda-tools/ansible.cfg) = False
DEFAULT_GATHERING(~/gitlab/ansible-cda-tools/ansible.cfg) = smart
DEFAULT_HOST_LIST(~/gitlab/ansible-cda-tools/ansible.cfg) = ['~/gitlab/ansible-cda-tools/inventory']
DEFAULT_LOG_PATH(~/gitlab/ansible-cda-tools/ansible.cfg) = ~/gitlab/ansible-cda-tools/ansible.log
DEFAULT_ROLES_PATH(~/gitlab/ansible-cda-tools/ansible.cfg) = ['~/gitlab/ansible-cda-tools/roles']
HOST_KEY_CHECKING(~/gitlab/ansible-cda-tools/ansible.cfg) = False

OS / Environment

macOS Ventura 13.0.1

Steps to Reproduce


- name: Wait for connection to be available
  hosts: local
  connection: local
  gather_facts: false
  vars:
    ansible_connection: aws_ssm

    ansible_aws_ssm_region: us-east-1
    ansible_aws_ssm_profile: commerce1
    ansible_aws_ssm_instance_id: i-xxxxx

    ansible_aws_ssm_bucket_name: [hidden]
    ansible_aws_ssm_s3_addressing_style: virtual
  tasks:
    - name: Wait for connection
      wait_for_connection:
    - name: aws-cli
      raw: which nano
    - name: ping
      ping:

Expected Results

I expect to be able to connect to the EC2 instance in the AWS account of the profile defined in my .aws/config and run the tasks.

For now it only works with the workaround of declaring the AWS profile in the bash shell with export AWS_PROFILE=commerce1:

TASK [Wait for connection] ********************************************************************************************************************************************************************************************************************************************************************
task path: ~/gitlab/ansible-cda-tools/PlayBooks/ansible_ssm_connection/ssm_connection_test_playbook.yml:36
Friday 24 February 2023  18:27:38 -0500 (0:00:00.032)       0:00:00.032 *******
redirecting (type: connection) ansible.builtin.aws_ssm to community.aws.aws_ssm
wait_for_connection: attempting ping module test
[WARNING]: Reset is not implemented for this connection
<localhost> ESTABLISH SSM CONNECTION TO: i-xxxxxx
<localhost> SSM CONNECTION ID: botocore-session-[hidden]
<localhost> EXEC echo ~

Actual Results


PLAYBOOK: ssm_connection_test_playbook.yml *********************************************************************************************************************************************************************
1 plays in PlayBooks/ansible_ssm_connection/ssm_connection_test_playbook.yml

PLAY [Wait for connection to be available] *************************************

TASK [Wait for connection] *************************************************************************************************************************************************************************************
task path: ~/gitlab/ansible-cda-tools/PlayBooks/ansible_ssm_connection/ssm_connection_test_playbook.yml:36
Friday 24 February 2023  18:29:44 -0500 (0:00:00.037)       0:00:00.037 ******* 
redirecting (type: connection) ansible.builtin.aws_ssm to community.aws.aws_ssm
wait_for_connection: attempting ping module test
[WARNING]: Reset is not implemented for this connection
<localhost> ESTABLISH SSM CONNECTION TO: i-xxxxx
<localhost> ssm_retry: attempt: 0, caught exception(An error occurred (TargetNotConnected) when calling the StartSession operation: i-xxxxx is not connected.) from cmd (echo ~...), pausing for 0 seconds
<localhost> ESTABLISH SSM CONNECTION TO: i-xxxxx
<localhost> ssm_retry: attempt: 1, caught exception(An error occurred (TargetNotConnected) when calling the StartSession operation: i-xxxxx is not connected.) from cmd (echo ~...), pausing for 1 seconds

aworldofcode commented 1 year ago

Also, when using the workaround, executing a task just stalls. I suspect this is because variables such as ansible_aws_ssm_bucket_name are not being considered.

tremble commented 1 year ago

ansible-galaxy collection list is showing 2 copies of the collection installed (1.1.0 and 5.2.0)

The warning that you're seeing:

[WARNING]: Reset is not implemented for this connection

makes it look like it's picking up the old version; I suspect that's what's causing the problem. Could you please try uninstalling the other copy?
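
A quick way to spot this situation is to scan the `ansible-galaxy collection list` output for collections that appear under more than one path. The following is a minimal sketch, not part of the collection: the function name `find_duplicates` and the trimmed `SAMPLE` text are illustrative, and the parser assumes the two-column layout shown in this issue.

```python
import re
from collections import defaultdict

# Trimmed sample shaped like the `ansible-galaxy collection list` output above.
SAMPLE = """\
# ~/gitlab/ansible-cda-tools/collections/ansible_collections
Collection               Version
------------------------ -------
amazon.aws               1.1.0
community.aws            1.1.0

# ~/Library/Python/3.9/lib/python/site-packages/ansible_collections
Collection                    Version
----------------------------- -------
amazon.aws                    5.2.0
community.aws                 5.2.0
"""

# A data row looks like "namespace.name   version"; header and ruler rows do not match.
ROW = re.compile(r"^([a-z0-9_]+\.[a-z0-9_]+)\s+(\S+)\s*$")


def find_duplicates(listing: str) -> dict:
    """Map each collection listed more than once to [(path, version), ...]."""
    seen = defaultdict(list)
    path = None
    for line in listing.splitlines():
        if line.startswith("# "):
            # A "# <path>" line starts the section for one install location.
            path = line[2:].strip()
            continue
        match = ROW.match(line)
        if match and path is not None:
            seen[match.group(1)].append((path, match.group(2)))
    return {name: copies for name, copies in seen.items() if len(copies) > 1}


if __name__ == "__main__":
    for name, copies in find_duplicates(SAMPLE).items():
        print(name)
        for path, version in copies:
            print(f"  {version:8} {path}")
```

Any collection reported here is installed in more than one place, so it is worth confirming which copy Ansible actually loads (the `Loading collection ... from ...` lines printed with `-vvvv` show the path).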

aworldofcode commented 1 year ago

I noticed that and updated the collection, but I still see the same issue.


ansible-galaxy collection list amazon.aws
# ~/Library/Python/3.9/lib/python/site-packages/ansible_collections
Collection Version
---------- -------
amazon.aws 5.2.0  

# ~/gitlab/ansible-cda-tools/collections/ansible_collections
Collection Version
---------- -------
amazon.aws 5.2.0

tremble commented 1 year ago

@aworldofcode,

I'm unable to reproduce the issue you're seeing.

While writing the tests what I did notice is that when I forgot to pass any credentials at all, the error I got was Unable to locate credentials and not TargetNotConnected.

While writing the initial integration tests, most of the time when I saw TargetNotConnected it was because the EC2 instance wasn't talking properly to SSM, rather than because of permissions problems on the controller end: either our "cleanup" lambda had already nuked the instance, or the InstanceProfile wasn't properly configured.

The AWS documentation also doesn't make it look like a credentials issue: https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-troubleshooting.html#ssh-target-not-connected

Solution A: This error is returned when the specified target managed node for the session isn't fully configured for use with Session Manager. For information, see Setting up Session Manager.
Solution B: This error is also returned if you attempt to start a session on a managed node that is located in a different AWS account or AWS Region.

What errors are you seeing at the END of the wait_for_connection task? Since you've only included the first few errors I can only guess that possibly you didn't wait long enough for the instance to be ready, and that by the time you tried with AWS_PROFILE it had finished booting. (To me, TargetNotConnected actually implies that a connection was successfully initiated to the SSM APIs, which in turn would mean that the variable was honoured)
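
One way to separate "wrong credentials, account, or region" from "instance not talking to SSM" is to ask SSM directly whether it knows the instance under a given profile. The sketch below is an illustration rather than anything the plugin does: `ssm_registration` and `explain` are hypothetical helper names, and the profile, region, and instance id are the placeholders from the playbook above. It relies on boto3's `describe_instance_information` call with an `InstanceIds` filter.

```python
from typing import Any, Optional


def ssm_registration(ssm_client: Any, instance_id: str) -> Optional[dict]:
    """Return SSM's record for instance_id, or None if SSM does not list it.

    ssm_client is expected to behave like a boto3 SSM client.
    """
    response = ssm_client.describe_instance_information(
        Filters=[{"Key": "InstanceIds", "Values": [instance_id]}]
    )
    records = response.get("InstanceInformationList", [])
    return records[0] if records else None


def explain(record: Optional[dict]) -> str:
    """Turn the record (or its absence) into a short human-readable hint."""
    if record is None:
        return "not registered with SSM in this account/region (or wrong credentials in use)"
    if record.get("PingStatus") != "Online":
        return f"registered, but agent ping status is {record.get('PingStatus')}"
    return f"online, agent version {record.get('AgentVersion')}"


if __name__ == "__main__":
    import boto3  # needs the boto3 package and working AWS credentials

    # Placeholders: replace the profile, region and instance id with real values.
    session = boto3.Session(profile_name="commerce1", region_name="us-east-1")
    record = ssm_registration(session.client("ssm"), "i-xxxxx")
    print(explain(record))
```

If the instance shows up as online with the named profile but not with the default credentials, that points at profile selection rather than at the instance itself.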

aworldofcode commented 1 year ago

@tremble There are no more logs after

<localhost> ESTABLISH SSM CONNECTION TO: i-xxxxxx
<localhost> ssm_retry: attempt: 1, caught exception(An error occurred (TargetNotConnected) when calling the StartSession operation: i-xxxxxxxxx is not connected.) from cmd (echo ~...), pausing for 1 seconds

The playbook ends after that. The only workaround is to

export AWS_PROFILE=[the profile]

The variable ansible_aws_ssm_profile is ignored.

aworldofcode commented 1 year ago

@tremble The target is fully configured. I tested it by running the aws ssm start-session command from the CLI with the --profile option; it responds immediately and correctly, and I land in the SSM shell.

tremble commented 1 year ago

@tremble There are no more logs after

That's really strange, I'd expect at least some more repeats of that error message.

aworldofcode commented 1 year ago

Let me try again

aworldofcode commented 1 year ago

Here is the update

ansible-playbook [core 2.14.2]
  config file = ~/gitlab/ansible-cda-tools/ansible.cfg
  configured module search path = ['~/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = ~/Library/Python/3.9/lib/python/site-packages/ansible
  ansible collection location = ~/gitlab/ansible-cda-tools/collections
  executable location = ~/Library/Python/3.9/bin/ansible-playbook
  python version = 3.9.16 (main, Dec  7 2022, 10:16:11) [Clang 14.0.0 (clang-1400.0.29.202)] (/usr/local/opt/python@3.9/bin/python3.9)
  jinja version = 3.1.2
  libyaml = True
Using ~/gitlab/ansible-cda-tools/ansible.cfg as config file
setting up inventory plugins
host_list declined parsing ~/gitlab/ansible-cda-tools/inventory/aws_ec2.yaml as it did not pass its verify_file() method
script declined parsing ~/gitlab/ansible-cda-tools/inventory/aws_ec2.yaml as it did not pass its verify_file() method
Loading collection amazon.aws from ~/gitlab/ansible-cda-tools/collections/ansible_collections/amazon/aws
Using inventory plugin 'ansible_collections.amazon.aws.plugins.inventory.aws_ec2' to process inventory source '~/gitlab/ansible-cda-tools/inventory/aws_ec2.yaml'
Parsed ~/gitlab/ansible-cda-tools/inventory/aws_ec2.yaml inventory source with auto plugin
Loading callback plugin default of type stdout, v2.0 from ~/Library/Python/3.9/lib/python/site-packages/ansible/plugins/callback/default.py
Skipping callback 'default', as we already have a stdout callback.
Skipping callback 'minimal', as we already have a stdout callback.
Skipping callback 'oneline', as we already have a stdout callback.

PLAYBOOK: ssm_connection_playbook.yml *********************************************************************************************************************************************************************************************************************************************************
Positional arguments: PlayBooks/ansible_ssm_connection/ssm_connection_playbook.yml
verbosity: 4
connection: smart
timeout: 10
become_method: sudo
tags: ('all',)
inventory: ('~/gitlab/ansible-cda-tools/inventory/aws_ec2.yaml',)
forks: 1
1 plays in PlayBooks/ansible_ssm_connection/ssm_connection_playbook.yml

PLAY [Wait for connection to be available] ****************************************************************************************************************************************************************************************************************************************************

TASK [Ping] ***********************************************************************************************************************************************************************************************************************************************************************************
task path: ~/gitlab/ansible-cda-tools/PlayBooks/ansible_ssm_connection/ssm_connection_playbook.yml:44
Loading collection community.aws from ~/gitlab/ansible-cda-tools/collections/ansible_collections/community/aws
<i-xxxxxx> ESTABLISH SSM CONNECTION TO: i-xxxxxx
<i-xxxxxx> ssm_retry: attempt: 0, caught exception(An error occurred (TargetNotConnected) when calling the StartSession operation: i-xxxxxx is not connected.) from cmd (echo ~...), pausing for 0 seconds
<i-xxxxxx> ESTABLISH SSM CONNECTION TO: i-xxxxxx
<i-xxxxxx> ssm_retry: attempt: 1, caught exception(An error occurred (TargetNotConnected) when calling the StartSession operation: i-xxxxxx is not connected.) from cmd (echo ~...), pausing for 1 seconds
<i-xxxxxx> ESTABLISH SSM CONNECTION TO: i-xxxxxx
<i-xxxxxx> ssm_retry: attempt: 2, caught exception(An error occurred (TargetNotConnected) when calling the StartSession operation: i-xxxxxx is not connected.) from cmd (echo ~...), pausing for 3 seconds
<i-xxxxxx> ESTABLISH SSM CONNECTION TO: i-xxxxxx
The full traceback is:
Traceback (most recent call last):
  File "~/Library/Python/3.9/lib/python/site-packages/ansible/executor/task_executor.py", line 158, in run
    res = self._execute()
  File "~/Library/Python/3.9/lib/python/site-packages/ansible/executor/task_executor.py", line 629, in _execute
    result = self._handler.run(task_vars=vars_copy)
  File "~/Library/Python/3.9/lib/python/site-packages/ansible/plugins/action/normal.py", line 47, in run
    result = merge_hash(result, self._execute_module(task_vars=task_vars, wrap_async=wrap_async))
  File "~/Library/Python/3.9/lib/python/site-packages/ansible/plugins/action/__init__.py", line 1040, in _execute_module
    self._make_tmp_path()
  File "~/Library/Python/3.9/lib/python/site-packages/ansible/plugins/action/__init__.py", line 457, in _make_tmp_path
    tmpdir = self._remote_expand_user(self.get_shell_option('remote_tmp', default='~/.ansible/tmp'), sudoable=False)
  File "~/Library/Python/3.9/lib/python/site-packages/ansible/plugins/action/__init__.py", line 923, in _remote_expand_user
    data = self._low_level_execute_command(cmd, sudoable=False)
  File "~/Library/Python/3.9/lib/python/site-packages/ansible/plugins/action/__init__.py", line 1320, in _low_level_execute_command
    rc, stdout, stderr = self._connection.exec_command(cmd, in_data=in_data, sudoable=sudoable)
  File "~/gitlab/ansible-cda-tools/collections/ansible_collections/community/aws/plugins/connection/aws_ssm.py", line 197, in wrapped
    return_tuple = func(self, *args, **kwargs)
  File "~/gitlab/ansible-cda-tools/collections/ansible_collections/community/aws/plugins/connection/aws_ssm.py", line 337, in exec_command
    super(Connection, self).exec_command(cmd, in_data=in_data, sudoable=sudoable)
  File "~/Library/Python/3.9/lib/python/site-packages/ansible/plugins/connection/__init__.py", line 35, in wrapped
    self._connect()
  File "~/gitlab/ansible-cda-tools/collections/ansible_collections/community/aws/plugins/connection/aws_ssm.py", line 271, in _connect
    self.start_session()
  File "~/gitlab/ansible-cda-tools/collections/ansible_collections/community/aws/plugins/connection/aws_ssm.py", line 295, in start_session
    response = client.start_session(Target=self.instance_id, Parameters=ssm_parameters)
  File "~/Library/Python/3.9/lib/python/site-packages/botocore/client.py", line 530, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "~/Library/Python/3.9/lib/python/site-packages/botocore/client.py", line 960, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.TargetNotConnected: An error occurred (TargetNotConnected) when calling the StartSession operation: i-xxxxxx is not connected.
fatal: [wdw-ecommerce-certmgmtui-use1-stage-ansible-rhel7]: FAILED! => {
    "msg": "Unexpected failure during module execution: An error occurred (TargetNotConnected) when calling the StartSession operation: i-xxxxxx is not connected.",
    "stdout": ""
}

PLAY RECAP ************************************************************************************************************************************************************************************************************************************************************************************
wdw-ecommerce-certmgmtui-use1-stage-ansible-rhel7 : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0

tremble commented 1 year ago

That traceback shows that you're still using community.aws 1.1.0 not 5.2.0:

File "~/gitlab/ansible-cda-tools/collections/ansible_collections/community/aws/plugins/connection/aws_ssm.py", line 295, in start_session
    response = client.start_session(Target=self.instance_id, Parameters=ssm_parameters)

This is 1.1.0: https://github.com/ansible-collections/community.aws/blob/1.1.0/plugins/connection/aws_ssm.py#L295

response = client.start_session(Target=self.instance_id, Parameters=ssm_parameters)

This is 5.2.0: https://github.com/ansible-collections/community.aws/blob/5.2.0/plugins/connection/aws_ssm.py#L295

msg = f"ssm_retry: attempt: {attempt}, caught exception({e}) from cmd ({cmd_summary}), pausing for {pause} seconds"

aworldofcode commented 1 year ago

How is this possible?

 ansible-galaxy collection list amazon.aws

# ~/gitlab/ansible-cda-tools/collections/ansible_collections
Collection Version
---------- -------
amazon.aws 5.2.0

This is the only collection present on my system

tremble commented 1 year ago

That's amazon.aws. This collection is community.aws. For various reasons we split things into:

amazon.aws: Supported by the Ansible Cloud team (a team in Red Hat paid to support Ansible 'cloud' modules).
community.aws: Supported by the "community" (which often means a couple of 'usual suspects'; the community are not generally paid to work on Ansible).

Generally we recommend keeping amazon.aws and community.aws on the same major version. However, it's (theoretically) possible to have amazon.aws at a higher major version than community.aws.
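
To check both collections in the directory that Ansible actually loads from, the installed versions can be read from disk and their major versions compared. This is a rough sketch under one assumption: that the collections were installed by `ansible-galaxy`, which writes a `MANIFEST.json` with a `collection_info.version` field into each collection directory. The helper names `installed_version` and `same_major` are illustrative only, and the path is the one from the traceback in this thread.

```python
import json
from pathlib import Path
from typing import Optional


def installed_version(collections_root: Path, fqcn: str) -> Optional[str]:
    """Read the version from <root>/<namespace>/<name>/MANIFEST.json, if present."""
    namespace, name = fqcn.split(".", 1)
    manifest = collections_root / namespace / name / "MANIFEST.json"
    if not manifest.is_file():
        return None
    data = json.loads(manifest.read_text(encoding="utf-8"))
    return data.get("collection_info", {}).get("version")


def same_major(version_a: str, version_b: str) -> bool:
    """Compare only the first dotted component of two version strings."""
    return version_a.split(".")[0] == version_b.split(".")[0]


if __name__ == "__main__":
    # Path taken from the traceback in this thread; adjust for your checkout.
    root = Path("~/gitlab/ansible-cda-tools/collections/ansible_collections").expanduser()
    amazon = installed_version(root, "amazon.aws")
    community = installed_version(root, "community.aws")
    print("amazon.aws:", amazon, "community.aws:", community)
    if amazon and community and not same_major(amazon, community):
        print("major versions differ")
```

Running it against each path reported by `ansible-galaxy collection list` makes a mismatch such as amazon.aws 5.2.0 next to community.aws 1.1.0 easy to see.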

aworldofcode commented 1 year ago

Ah! Good catch.

 ansible-galaxy collection list community.aws

# ~/Library/Python/3.9/lib/python/site-packages/ansible_collections
Collection    Version
------------- -------
community.aws 5.2.0

# ~/gitlab/ansible-cda-tools/collections/ansible_collections
Collection    Version
------------- -------
community.aws 5.2.0

echo $AWS_PROFILE
$

Rerunning

aworldofcode commented 1 year ago

So we are moving forward, but then it seems to stall:

C02YL0VEJGH8:ansible-cda-tools palea009$ clear ; ansible-playbook PlayBooks/ansible_ssm_connection/ssm_connection_playbook.yml  -i inventory/aws_ec2.yaml^C-vvvv
C02YL0VEJGH8:ansible-cda-tools palea009$ echo $OBJC_DISABLE_INITIALIZE_FORK_SAFETY

C02YL0VEJGH8:ansible-cda-tools palea009$ clear
C02YL0VEJGH8:ansible-cda-tools palea009$ clear ; ansible-playbook PlayBooks/ansible_ssm_connection/ssm_connection_playbook.yml  -i inventory/aws_ec2.yaml  -vvvv
ansible-playbook [core 2.14.2]
  config file = ~/gitlab/ansible-cda-tools/ansible.cfg
  configured module search path = ['~/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = ~/Library/Python/3.9/lib/python/site-packages/ansible
  ansible collection location = ~/gitlab/ansible-cda-tools/collections
  executable location = ~/Library/Python/3.9/bin/ansible-playbook
  python version = 3.9.16 (main, Dec  7 2022, 10:16:11) [Clang 14.0.0 (clang-1400.0.29.202)] (/usr/local/opt/python@3.9/bin/python3.9)
  jinja version = 3.1.2
  libyaml = True
Using ~/gitlab/ansible-cda-tools/ansible.cfg as config file
setting up inventory plugins
host_list declined parsing ~/gitlab/ansible-cda-tools/inventory/aws_ec2.yaml as it did not pass its verify_file() method
script declined parsing ~/gitlab/ansible-cda-tools/inventory/aws_ec2.yaml as it did not pass its verify_file() method
Loading collection amazon.aws from ~/gitlab/ansible-cda-tools/collections/ansible_collections/amazon/aws
Using inventory plugin 'ansible_collections.amazon.aws.plugins.inventory.aws_ec2' to process inventory source '~/gitlab/ansible-cda-tools/inventory/aws_ec2.yaml'
Parsed ~/gitlab/ansible-cda-tools/inventory/aws_ec2.yaml inventory source with auto plugin
Loading callback plugin default of type stdout, v2.0 from ~/Library/Python/3.9/lib/python/site-packages/ansible/plugins/callback/default.py
Skipping callback 'default', as we already have a stdout callback.
Skipping callback 'minimal', as we already have a stdout callback.
Skipping callback 'oneline', as we already have a stdout callback.

PLAYBOOK: ssm_connection_playbook.yml *********************************************************************************************************************************************************************************************************************************************************
Positional arguments: PlayBooks/ansible_ssm_connection/ssm_connection_playbook.yml
verbosity: 4
connection: smart
timeout: 10
become_method: sudo
tags: ('all',)
inventory: ('~/gitlab/ansible-cda-tools/inventory/aws_ec2.yaml',)
forks: 5
1 plays in PlayBooks/ansible_ssm_connection/ssm_connection_playbook.yml

PLAY [Wait for connection to be available] ****************************************************************************************************************************************************************************************************************************************************

TASK [Ping] ***********************************************************************************************************************************************************************************************************************************************************************************
task path: ~/gitlab/ansible-cda-tools/PlayBooks/ansible_ssm_connection/ssm_connection_playbook.yml:44
Loading collection community.aws from ~/gitlab/ansible-cda-tools/collections/ansible_collections/community/aws
<i-xxxxxx> ESTABLISH SSM CONNECTION TO: i-xxxxxx
<i-xxxxxx> INITIALIZE BOTO3 CLIENTS
<i-xxxxxx> SETUP BOTO3 CLIENTS: SSM
<i-xxxxxx> _get_bucket_endpoint: S3 (global)
<i-xxxxxx> _get_bucket_endpoint: S3 (bucket region) - None
<i-xxxxxx> SETUP BOTO3 CLIENTS: S3 https://s3.amazonaws.com
<i-xxxxxx> START SSM SESSION: i-xxxxxx
<i-xxxxxx> SSM COMMAND: ['/usr/local/bin/session-manager-plugin', '{"SessionId": "botocore-session-xxxxx", "TokenValue": "xxxx", "StreamUrl": "wss://ssmmessages.us-east-1.amazonaws.com/v1/data-channel/botocore-session-xxxxx?role=publish_subscribe&cell-number=xxxx", "ResponseMetadata": {"RequestId": "xxxx", "HTTPStatusCode": 200, "HTTPHeaders": {"server": "Server", "date": "Mon, 27 Feb 2023 15:49:25 GMT", "content-type": "application/x-amz-json-1.1", "content-length": "971", "connection": "keep-alive", "x-amzn-requestid": "xxxx"}, "RetryAttempts": 0}}', 'us-east-1', 'StartSession', 'commerce1', '{"Target": "i-xxxxxx"}', 'https://ssm.us-east-1.amazonaws.com']
<i-xxxxxx> PRE stdout line:
b'\r\nStarting session with SessionId: botocore-session-xxxxx\r\n'
<i-xxxxxx> PRE startup output received
<i-xxxxxx> PRE Disabling Echo: b'stty -echo\n'
<i-xxxxxx> PRE stdout line:
b'\r\nStarting session with SessionId: botocore-session-xxxxx\r\nThis session is encrypted using AWS KMS.\r\n'
<i-xxxxxx> PRE remaining: 59
<i-xxxxxx> PRE stdout line:
b'\r\nStarting session with SessionId: botocore-session-xxxxx\r\nThis session is encrypted using AWS KMS.\r\n\x1b[?1034hsh-4.2$ '
<i-xxxxxx> PRE remaining: 58
<i-xxxxxx> PRE remaining: 57
<i-xxxxxx> PRE remaining: 56
<i-xxxxxx> PRE remaining: 55
<i-xxxxxx> PRE remaining: 54
<i-xxxxxx> PRE remaining: 53
<i-xxxxxx> PRE remaining: 52
<i-xxxxxx> PRE remaining: 51
<i-xxxxxx> PRE remaining: 50
<i-xxxxxx> PRE remaining: 49
<i-xxxxxx> PRE remaining: 48
<i-xxxxxx> PRE remaining: 47
....
<i-xxxxxx> PRE remaining: 1
<i-xxxxxx> PRE timeout stdout:

tremble commented 1 year ago

Ok, progress :)

This session is encrypted using AWS KMS.

So what you're seeing now is https://github.com/ansible-collections/community.aws/issues/684

For what it's worth, I do want to get support for KMS encrypted sessions working (I'd need it if we wanted to use this in $dayjob). However, after dealing with a pile of issues for 5.1.0 / 5.2.0 I wanted a break...

aworldofcode commented 1 year ago

So that means that currently I won't be able to move forward with our playbooks?

aworldofcode commented 1 year ago

@tremble How do I deploy the collection with those merges?

aworldofcode commented 1 year ago

@tremble More updates: I have tested connecting to various EC2 instances with a certain degree of success. So far the issue appears to be limited to instances with an older Amazon ssm-agent. The 3.2+ agents work. The 2.x series just stalls at the PRE remaining: xx portion of the task, after the acknowledgement of the SSM start-session.

tremble commented 1 year ago

So that means that currently I won't be able to move forward with our playbooks?

It means that at this time there is a known issue when you use KMS encrypted SSM sessions. Dealing with this module is not part of $DayJob and I do not have the time/energy to try and dig too much into the problem at this time. This doesn't block anyone else from doing so.

@tremble How do I deploy the collection with those merges?

If you're referring to the PRs attached to this issue, they're only additional tests; there's no change to the plugin. (That was part of me trying to reproduce your issue.)

It appears that the issue is limited , so far , to instances with an older amazon ssm-agent.

That's an interesting data point. If you're able to work around the problem by updating the agent, then I'd strongly recommend doing so. As far as I can tell, the last 2.x release was in 2020. I'd be surprised if that didn't include security vulnerabilities.
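
To find which instances would benefit from an agent update, the agent versions can be filtered by major version. The sketch below is only an illustration: the function names are made up, the sample records and version strings are hypothetical, and the threshold of 3 simply encodes the observation in this thread that 3.x agents worked while 2.x agents stalled. The record shape (`InstanceId`, `AgentVersion`) follows what boto3's `describe_instance_information` returns.

```python
from typing import Iterable, List, Tuple


def major_version(agent_version: str) -> int:
    """Return the leading numeric component of a version such as '2.3.1234.0'."""
    return int(agent_version.split(".", 1)[0])


def agents_needing_upgrade(
    records: Iterable[dict], minimum_major: int = 3
) -> List[Tuple[str, str]]:
    """Return (instance id, agent version) pairs below minimum_major."""
    flagged = []
    for record in records:
        version = record.get("AgentVersion", "")
        # Records without a reported agent version are skipped, not flagged.
        if version and major_version(version) < minimum_major:
            flagged.append((record["InstanceId"], version))
    return flagged


if __name__ == "__main__":
    # Hypothetical records; real ones could come from describe_instance_information.
    sample = [
        {"InstanceId": "i-aaaa", "AgentVersion": "2.3.1234.0"},
        {"InstanceId": "i-bbbb", "AgentVersion": "3.2.582.0"},
    ]
    for instance_id, version in agents_needing_upgrade(sample):
        print(f"{instance_id}: SSM Agent {version} predates 3.x")
```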

aworldofcode commented 1 year ago

@tremble When we see in logs the following

<i-xxx> EXEC remaining: 60
<i-xxx> EXEC remaining: 59
<i-xxx> EXEC remaining: 58
...
<i-xxx> EXEC remaining: 52

Does this mean that the execution is taking time to return, or does it mean that the task has stalled?

aworldofcode commented 1 year ago

I seem to have more success with older ssm_agents when I increase the timeout.

aworldofcode commented 1 year ago

@tremble so it looks like

<i-xxx> EXEC remaining: 

is the execution time, and those seem to succeed. The ones that seem to time out are:

PRE remaining: 299

What does PRE indicate?

tremble commented 1 year ago

PRE - "Prepare the Terminal" - https://github.com/ansible-collections/community.aws/blob/main/plugins/connection/aws_ssm.py#L603

A session has a number of phases:

If it's getting stuck with "PRE" output, then it's stuck somewhere in the "prepare the terminal" phase...
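
To make the countdown in the logs easier to read: the sketch below is not the plugin's code, only a minimal illustration of a poll-with-countdown loop that would produce lines like `PRE remaining: N` followed by `PRE timeout stdout:`. The names `wait_until_ready`, `read_chunk`, and `is_ready` are hypothetical; `read_chunk` is assumed to block for about one second and return `None` when no output arrived.

```python
import time
from typing import Callable, Optional


def wait_until_ready(
    read_chunk: Callable[[], Optional[bytes]],
    is_ready: Callable[[bytes], bool],
    timeout_s: int,
    log: Callable[[str], None] = print,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll for output until is_ready(output) is true or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    output = b""
    while True:
        remaining = int(deadline - clock())
        if remaining < 1:
            log(f"PRE timeout stdout: {output!r}")
            return False
        chunk = read_chunk()  # assumed to wait roughly one second for data
        if chunk:
            output += chunk
            if is_ready(output):
                return True
        else:
            # Nothing arrived during this poll, so report the time left.
            log(f"PRE remaining: {remaining}")


if __name__ == "__main__":
    def silent_source() -> Optional[bytes]:
        time.sleep(1)  # stand-in for a one-second poll that sees no output
        return None

    # A source that never answers counts down and then times out.
    wait_until_ready(silent_source, lambda out: out.endswith(b"$ "), timeout_s=3)
```

In a loop of this shape a long run of `remaining` lines means no new output arrived at all, which is consistent with a session that started but never responded during terminal preparation.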

aworldofcode commented 1 year ago

@tremble So far I am seeing a consistent pattern: with ssm_agent 2.1.xx the session gets stuck in the "PRE" output.

aworldofcode commented 1 year ago

@tremble I can confirm that, in my experience so far, this bug is only present when the Amazon ssm-agent is version 2.x. I updated the 2.x agents to 3.x and I am no longer experiencing the PRE timeout issue.