Open aworldofcode opened 1 year ago
Also, even with the workaround, executing a task just stalls. I suspect this is because it is not considering vars like ansible_aws_ssm_bucket_name.
ansible-galaxy collection list is showing 2 copies of the collection installed (1.1.0 and 5.2.0).
The warning that you're seeing:
[WARNING]: Reset is not implemented for this connection
makes it look like it's picking the old version, and I suspect that's what's causing the problem. Please could you try uninstalling the other copy?
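One way to spot this situation is to scan the output of ansible-galaxy collection list for a collection that appears under more than one install path. The following is a minimal sketch, assuming the listing format shown in this thread (a "# path" line, a header, a dashed separator, then "name version" rows). The function name and the sample paths are hypothetical.

```python
from collections import defaultdict


def find_duplicate_installs(listing):
    """Map each collection to its (path, version) pairs when installed more than once."""
    installs = defaultdict(list)
    current_path = None
    for raw in listing.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # A "# <path>" line starts the listing for one install location.
            current_path = line.lstrip("#").strip()
            continue
        # Skip the "Collection Version" header and the dashed separator.
        if line.startswith("Collection") or line.startswith("-"):
            continue
        parts = line.split()
        if len(parts) == 2 and current_path is not None:
            name, version = parts
            installs[name].append((current_path, version))
    return {name: found for name, found in installs.items() if len(found) > 1}


# Illustrative listing; the paths are made up for the example.
sample = """\
# /home/user/.ansible/collections/ansible_collections
Collection    Version
------------- -------
community.aws 1.1.0
# /home/user/project/collections/ansible_collections
Collection    Version
------------- -------
community.aws 5.2.0
amazon.aws    5.2.0
"""

duplicates = find_duplicate_installs(sample)
for name, found in duplicates.items():
    print(name, found)
```

Any collection reported here is installed in more than one place, so it is worth checking which copy Ansible actually loads (the "Loading collection ... from ..." lines in -vvvv output show this).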
I noticed that and updated the collection, but I still see the same issue.
ansible-galaxy collection list amazon.aws
# ~/Library/Python/3.9/lib/python/site-packages/ansible_collections
Collection Version
---------- -------
amazon.aws 5.2.0
# ~/gitlab/ansible-cda-tools/collections/ansible_collections
Collection Version
---------- -------
amazon.aws 5.2.0
@aworldofcode,
I'm unable to reproduce the issue you're seeing.
While writing the tests, what I did notice is that when I forgot to pass any credentials at all, the error I got was Unable to locate credentials and not TargetNotConnected.
While writing the initial integration tests most of the time when I was seeing TargetNotConnected it was because the EC2 Instance wasn't talking properly to SSM, rather than permissions problems on the controller end. Either because our "cleanup" lambda had already nuked the Instance or because the InstanceProfile wasn't properly configured.
The AWS documentation also doesn't make it look like a credentials issue https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-troubleshooting.html#ssh-target-not-connected
Solution A: This error is returned when the specified target managed node for the session isn't fully configured for use with Session Manager. For information, see Setting up Session Manager. Solution B: This error is also returned if you attempt to start a session on a managed node that is located in a different AWS account or AWS Region.
What errors are you seeing at the END of the wait_for_connection task? Since you've only included the first few errors I can only guess that possibly you didn't wait long enough for the instance to be ready, and that by the time you tried with AWS_PROFILE it had finished booting. (To me, TargetNotConnected actually implies that a connection was successfully initiated to the SSM APIs, which in turn would mean that the variable was honoured.)
@tremble There are no more logs after
<localhost> ESTABLISH SSM CONNECTION TO: i-xxxxxx
<localhost> ssm_retry: attempt: 1, caught exception(An error occurred (TargetNotConnected) when calling the StartSession operation: i-xxxxxxxxx is not connected.) from cmd (echo ~...), pausing for 1 seconds
The playbook ends after that. The only workaround is to
export AWS_PROFILE=[the profile]
The ansible_aws_ssm_profile var is ignored.
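For context, here is a minimal sketch of the lookup order one would expect if the variable were honoured: an explicitly configured profile first, then the AWS_PROFILE environment variable, then the SDK's default credential chain. This is an illustration only, not the plugin's code; resolve_profile is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def resolve_profile(explicit: Optional[str],
                    environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the profile name that would be used, or None for the default chain."""
    if explicit:
        # A profile set explicitly (e.g. via an inventory variable) should win.
        return explicit
    # Otherwise fall back to the environment, as with `export AWS_PROFILE=...`.
    return environ.get("AWS_PROFILE") or None


print(resolve_profile("commerce1", {"AWS_PROFILE": "other"}))
print(resolve_profile(None, {"AWS_PROFILE": "commerce1"}))
print(resolve_profile(None, {}))
```

If the explicit value never reaches this step, only the environment branch can take effect, which would match the behaviour described here where exporting AWS_PROFILE is the only thing that works.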
@tremble The target is fully configured. To test it, I run the aws ssm start-session command from the CLI with the --profile option, and it responds immediately and correctly: I land in the SSM shell.
@tremble There are no more logs after
That's really strange, I'd expect at least some more repeats of that error message.
Let me try again
Here is the update
ansible-playbook [core 2.14.2]
config file = ~/gitlab/ansible-cda-tools/ansible.cfg
configured module search path = ['~/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = ~/Library/Python/3.9/lib/python/site-packages/ansible
ansible collection location = ~/gitlab/ansible-cda-tools/collections
executable location = ~/Library/Python/3.9/bin/ansible-playbook
python version = 3.9.16 (main, Dec 7 2022, 10:16:11) [Clang 14.0.0 (clang-1400.0.29.202)] (/usr/local/opt/python@3.9/bin/python3.9)
jinja version = 3.1.2
libyaml = True
Using ~/gitlab/ansible-cda-tools/ansible.cfg as config file
setting up inventory plugins
host_list declined parsing ~/gitlab/ansible-cda-tools/inventory/aws_ec2.yaml as it did not pass its verify_file() method
script declined parsing ~/gitlab/ansible-cda-tools/inventory/aws_ec2.yaml as it did not pass its verify_file() method
Loading collection amazon.aws from ~/gitlab/ansible-cda-tools/collections/ansible_collections/amazon/aws
Using inventory plugin 'ansible_collections.amazon.aws.plugins.inventory.aws_ec2' to process inventory source '~/gitlab/ansible-cda-tools/inventory/aws_ec2.yaml'
Parsed ~/gitlab/ansible-cda-tools/inventory/aws_ec2.yaml inventory source with auto plugin
Loading callback plugin default of type stdout, v2.0 from ~/Library/Python/3.9/lib/python/site-packages/ansible/plugins/callback/default.py
Skipping callback 'default', as we already have a stdout callback.
Skipping callback 'minimal', as we already have a stdout callback.
Skipping callback 'oneline', as we already have a stdout callback.
PLAYBOOK: ssm_connection_playbook.yml *********************************************************************************************************************************************************************************************************************************************************
Positional arguments: PlayBooks/ansible_ssm_connection/ssm_connection_playbook.yml
verbosity: 4
connection: smart
timeout: 10
become_method: sudo
tags: ('all',)
inventory: ('~/gitlab/ansible-cda-tools/inventory/aws_ec2.yaml',)
forks: 1
1 plays in PlayBooks/ansible_ssm_connection/ssm_connection_playbook.yml
PLAY [Wait for connection to be available] ****************************************************************************************************************************************************************************************************************************************************
TASK [Ping] ***********************************************************************************************************************************************************************************************************************************************************************************
task path: ~/gitlab/ansible-cda-tools/PlayBooks/ansible_ssm_connection/ssm_connection_playbook.yml:44
Loading collection community.aws from ~/gitlab/ansible-cda-tools/collections/ansible_collections/community/aws
<i-xxxxxx> ESTABLISH SSM CONNECTION TO: i-xxxxxx
<i-xxxxxx> ssm_retry: attempt: 0, caught exception(An error occurred (TargetNotConnected) when calling the StartSession operation: i-xxxxxx is not connected.) from cmd (echo ~...), pausing for 0 seconds
<i-xxxxxx> ESTABLISH SSM CONNECTION TO: i-xxxxxx
<i-xxxxxx> ssm_retry: attempt: 1, caught exception(An error occurred (TargetNotConnected) when calling the StartSession operation: i-xxxxxx is not connected.) from cmd (echo ~...), pausing for 1 seconds
<i-xxxxxx> ESTABLISH SSM CONNECTION TO: i-xxxxxx
<i-xxxxxx> ssm_retry: attempt: 2, caught exception(An error occurred (TargetNotConnected) when calling the StartSession operation: i-xxxxxx is not connected.) from cmd (echo ~...), pausing for 3 seconds
<i-xxxxxx> ESTABLISH SSM CONNECTION TO: i-xxxxxx
The full traceback is:
Traceback (most recent call last):
File "~/Library/Python/3.9/lib/python/site-packages/ansible/executor/task_executor.py", line 158, in run
res = self._execute()
File "~/Library/Python/3.9/lib/python/site-packages/ansible/executor/task_executor.py", line 629, in _execute
result = self._handler.run(task_vars=vars_copy)
File "~/Library/Python/3.9/lib/python/site-packages/ansible/plugins/action/normal.py", line 47, in run
result = merge_hash(result, self._execute_module(task_vars=task_vars, wrap_async=wrap_async))
File "~/Library/Python/3.9/lib/python/site-packages/ansible/plugins/action/__init__.py", line 1040, in _execute_module
self._make_tmp_path()
File "~/Library/Python/3.9/lib/python/site-packages/ansible/plugins/action/__init__.py", line 457, in _make_tmp_path
tmpdir = self._remote_expand_user(self.get_shell_option('remote_tmp', default='~/.ansible/tmp'), sudoable=False)
File "~/Library/Python/3.9/lib/python/site-packages/ansible/plugins/action/__init__.py", line 923, in _remote_expand_user
data = self._low_level_execute_command(cmd, sudoable=False)
File "~/Library/Python/3.9/lib/python/site-packages/ansible/plugins/action/__init__.py", line 1320, in _low_level_execute_command
rc, stdout, stderr = self._connection.exec_command(cmd, in_data=in_data, sudoable=sudoable)
File "~/gitlab/ansible-cda-tools/collections/ansible_collections/community/aws/plugins/connection/aws_ssm.py", line 197, in wrapped
return_tuple = func(self, *args, **kwargs)
File "~/gitlab/ansible-cda-tools/collections/ansible_collections/community/aws/plugins/connection/aws_ssm.py", line 337, in exec_command
super(Connection, self).exec_command(cmd, in_data=in_data, sudoable=sudoable)
File "~/Library/Python/3.9/lib/python/site-packages/ansible/plugins/connection/__init__.py", line 35, in wrapped
self._connect()
File "~/gitlab/ansible-cda-tools/collections/ansible_collections/community/aws/plugins/connection/aws_ssm.py", line 271, in _connect
self.start_session()
File "~/gitlab/ansible-cda-tools/collections/ansible_collections/community/aws/plugins/connection/aws_ssm.py", line 295, in start_session
response = client.start_session(Target=self.instance_id, Parameters=ssm_parameters)
File "~/Library/Python/3.9/lib/python/site-packages/botocore/client.py", line 530, in _api_call
return self._make_api_call(operation_name, kwargs)
File "~/Library/Python/3.9/lib/python/site-packages/botocore/client.py", line 960, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.errorfactory.TargetNotConnected: An error occurred (TargetNotConnected) when calling the StartSession operation: i-xxxxxx is not connected.
fatal: [wdw-ecommerce-certmgmtui-use1-stage-ansible-rhel7]: FAILED! => {
"msg": "Unexpected failure during module execution: An error occurred (TargetNotConnected) when calling the StartSession operation: i-xxxxxx is not connected.",
"stdout": ""
}
PLAY RECAP ************************************************************************************************************************************************************************************************************************************************************************************
wdw-ecommerce-certmgmtui-use1-stage-ansible-rhel7 : ok=0 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
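The pauses in the ssm_retry lines above (0, 1, then 3 seconds for attempts 0, 1 and 2) are consistent with an exponential backoff of 2**attempt - 1 seconds. The following is a minimal sketch of such a retry loop, not the plugin's implementation; the function name, the retry count and the 30 second cap are assumptions made for illustration.

```python
import time


def retry_with_backoff(func, retries=3, max_pause=30, sleep=time.sleep, log=print):
    """Call func(), retrying on any exception with a pause of 2**attempt - 1 seconds."""
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception as e:
            if attempt == retries:
                # Out of retries: let the last exception propagate.
                raise
            pause = min(2 ** attempt - 1, max_pause)  # 0, 1, 3, 7, ... (cap is assumed)
            log(f"ssm_retry: attempt: {attempt}, caught exception({e}), "
                f"pausing for {pause} seconds")
            sleep(pause)


# Demo: fail twice, then succeed. Pauses are recorded instead of slept.
calls = []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("TargetNotConnected")
    return "connected"


pauses = []
result = retry_with_backoff(flaky, sleep=pauses.append, log=lambda msg: None)
print(result, pauses)
```

With retries=3 and a call that always fails, this logs three retry lines and then raises on the fourth attempt, which is the same shape as the output pasted above.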
That traceback shows that you're still using community.aws 1.1.0, not 5.2.0:
File "~/gitlab/ansible-cda-tools/collections/ansible_collections/community/aws/plugins/connection/aws_ssm.py", line 295, in start_session
response = client.start_session(Target=self.instance_id, Parameters=ssm_parameters)
This is 1.1.0: https://github.com/ansible-collections/community.aws/blob/1.1.0/plugins/connection/aws_ssm.py#L295
response = client.start_session(Target=self.instance_id, Parameters=ssm_parameters)
This is 5.2.0: https://github.com/ansible-collections/community.aws/blob/5.2.0/plugins/connection/aws_ssm.py#L295
msg = f"ssm_retry: attempt: {attempt}, caught exception({e}) from cmd ({cmd_summary}), pausing for {pause} seconds"
How is this possible ?
ansible-galaxy collection list amazon.aws
# ~/gitlab/ansible-cda-tools/collections/ansible_collections
Collection Version
---------- -------
amazon.aws 5.2.0
This is the only collection present on my system
That's amazon.aws. This collection is community.aws. For various reasons we split things into:
amazon.aws: Supported by the Ansible Cloud team (a team in Red Hat paid to support Ansible 'cloud' modules).
community.aws: Supported by the "community" (which often means a couple of 'usual suspects'; however, the community are not generally paid to work on Ansible).
Generally we recommend amazon.aws and community.aws being kept on the same major version. However it's (theoretically) possible to have amazon.aws at a higher major version than community.aws
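A quick way to check this recommendation is to read the installed version of each collection and compare the major components. The sketch below assumes the collections were installed by ansible-galaxy, which writes a MANIFEST.json containing a collection_info.version field; the helper names are hypothetical, and the demo builds a throwaway directory rather than touching a real install.

```python
import json
import tempfile
from pathlib import Path


def installed_version(collections_root, namespace, name):
    """Read the version from a collection's MANIFEST.json, or None if absent."""
    manifest = (Path(collections_root) / "ansible_collections"
                / namespace / name / "MANIFEST.json")
    if not manifest.is_file():
        return None
    info = json.loads(manifest.read_text())
    return info.get("collection_info", {}).get("version")


def same_major(version_a, version_b):
    """True when both version strings share the same major component."""
    return version_a.split(".")[0] == version_b.split(".")[0]


# Demo against a temporary tree that mimics the mismatch found in this thread.
with tempfile.TemporaryDirectory() as root:
    for ns, name, version in [("amazon", "aws", "5.2.0"), ("community", "aws", "1.1.0")]:
        target = Path(root) / "ansible_collections" / ns / name
        target.mkdir(parents=True)
        (target / "MANIFEST.json").write_text(json.dumps(
            {"collection_info": {"namespace": ns, "name": name, "version": version}}))
    amazon = installed_version(root, "amazon", "aws")
    community = installed_version(root, "community", "aws")
    missing = installed_version(root, "community", "general")

print(amazon, community, same_major(amazon, community))
```

Pointing collections_root at each configured collection location in turn would show whether an older community.aws is sitting next to a newer amazon.aws.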
ah ! good catch
ansible-galaxy collection list community.aws
# ~/Library/Python/3.9/lib/python/site-packages/ansible_collections
Collection Version
------------- -------
community.aws 5.2.0
# ~/gitlab/ansible-cda-tools/collections/ansible_collections
Collection Version
------------- -------
community.aws 5.2.0
echo $AWS_PROFILE
$
Rerunning.
So we are moving forward, but then it seems to stall:
C02YL0VEJGH8:ansible-cda-tools palea009$ clear ; ansible-playbook PlayBooks/ansible_ssm_connection/ssm_connection_playbook.yml -i inventory/aws_ec2.yaml^C-vvvv
C02YL0VEJGH8:ansible-cda-tools palea009$ echo $OBJC_DISABLE_INITIALIZE_FORK_SAFETY
C02YL0VEJGH8:ansible-cda-tools palea009$ clear
C02YL0VEJGH8:ansible-cda-tools palea009$ clear ; ansible-playbook PlayBooks/ansible_ssm_connection/ssm_connection_playbook.yml -i inventory/aws_ec2.yaml -vvvv
ansible-playbook [core 2.14.2]
config file = ~/gitlab/ansible-cda-tools/ansible.cfg
configured module search path = ['~/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = ~/Library/Python/3.9/lib/python/site-packages/ansible
ansible collection location = ~/gitlab/ansible-cda-tools/collections
executable location = ~/Library/Python/3.9/bin/ansible-playbook
python version = 3.9.16 (main, Dec 7 2022, 10:16:11) [Clang 14.0.0 (clang-1400.0.29.202)] (/usr/local/opt/python@3.9/bin/python3.9)
jinja version = 3.1.2
libyaml = True
Using ~/gitlab/ansible-cda-tools/ansible.cfg as config file
setting up inventory plugins
host_list declined parsing ~/gitlab/ansible-cda-tools/inventory/aws_ec2.yaml as it did not pass its verify_file() method
script declined parsing ~/gitlab/ansible-cda-tools/inventory/aws_ec2.yaml as it did not pass its verify_file() method
Loading collection amazon.aws from ~/gitlab/ansible-cda-tools/collections/ansible_collections/amazon/aws
Using inventory plugin 'ansible_collections.amazon.aws.plugins.inventory.aws_ec2' to process inventory source '~/gitlab/ansible-cda-tools/inventory/aws_ec2.yaml'
Parsed ~/gitlab/ansible-cda-tools/inventory/aws_ec2.yaml inventory source with auto plugin
Loading callback plugin default of type stdout, v2.0 from ~/Library/Python/3.9/lib/python/site-packages/ansible/plugins/callback/default.py
Skipping callback 'default', as we already have a stdout callback.
Skipping callback 'minimal', as we already have a stdout callback.
Skipping callback 'oneline', as we already have a stdout callback.
PLAYBOOK: ssm_connection_playbook.yml *********************************************************************************************************************************************************************************************************************************************************
Positional arguments: PlayBooks/ansible_ssm_connection/ssm_connection_playbook.yml
verbosity: 4
connection: smart
timeout: 10
become_method: sudo
tags: ('all',)
inventory: ('~/gitlab/ansible-cda-tools/inventory/aws_ec2.yaml',)
forks: 5
1 plays in PlayBooks/ansible_ssm_connection/ssm_connection_playbook.yml
PLAY [Wait for connection to be available] ****************************************************************************************************************************************************************************************************************************************************
TASK [Ping] ***********************************************************************************************************************************************************************************************************************************************************************************
task path: ~/gitlab/ansible-cda-tools/PlayBooks/ansible_ssm_connection/ssm_connection_playbook.yml:44
Loading collection community.aws from ~/gitlab/ansible-cda-tools/collections/ansible_collections/community/aws
<i-xxxxxx> ESTABLISH SSM CONNECTION TO: i-xxxxxx
<i-xxxxxx> INITIALIZE BOTO3 CLIENTS
<i-xxxxxx> SETUP BOTO3 CLIENTS: SSM
<i-xxxxxx> _get_bucket_endpoint: S3 (global)
<i-xxxxxx> _get_bucket_endpoint: S3 (bucket region) - None
<i-xxxxxx> SETUP BOTO3 CLIENTS: S3 https://s3.amazonaws.com
<i-xxxxxx> START SSM SESSION: i-xxxxxx
<i-xxxxxx> SSM COMMAND: ['/usr/local/bin/session-manager-plugin', '{"SessionId": "botocore-session-xxxxx", "TokenValue": "xxxx", "StreamUrl": "wss://ssmmessages.us-east-1.amazonaws.com/v1/data-channel/botocore-session-xxxxx?role=publish_subscribe&cell-number=xxxx", "ResponseMetadata": {"RequestId": "xxxx", "HTTPStatusCode": 200, "HTTPHeaders": {"server": "Server", "date": "Mon, 27 Feb 2023 15:49:25 GMT", "content-type": "application/x-amz-json-1.1", "content-length": "971", "connection": "keep-alive", "x-amzn-requestid": "xxxx"}, "RetryAttempts": 0}}', 'us-east-1', 'StartSession', 'commerce1', '{"Target": "i-xxxxxx"}', 'https://ssm.us-east-1.amazonaws.com']
<i-xxxxxx> PRE stdout line:
b'\r\nStarting session with SessionId: botocore-session-xxxxx\r\n'
<i-xxxxxx> PRE startup output received
<i-xxxxxx> PRE Disabling Echo: b'stty -echo\n'
<i-xxxxxx> PRE stdout line:
b'\r\nStarting session with SessionId: botocore-session-xxxxx\r\nThis session is encrypted using AWS KMS.\r\n'
<i-xxxxxx> PRE remaining: 59
<i-xxxxxx> PRE stdout line:
b'\r\nStarting session with SessionId: botocore-session-xxxxx\r\nThis session is encrypted using AWS KMS.\r\n\x1b[?1034hsh-4.2$ '
<i-xxxxxx> PRE remaining: 58
<i-xxxxxx> PRE remaining: 57
<i-xxxxxx> PRE remaining: 56
<i-xxxxxx> PRE remaining: 55
<i-xxxxxx> PRE remaining: 54
<i-xxxxxx> PRE remaining: 53
<i-xxxxxx> PRE remaining: 52
<i-xxxxxx> PRE remaining: 51
<i-xxxxxx> PRE remaining: 50
<i-xxxxxx> PRE remaining: 49
<i-xxxxxx> PRE remaining: 48
<i-xxxxxx> PRE remaining: 47
....
<i-xxxxxx> PRE remaining: 1
<i-xxxxxx> PRE timeout stdout:
Ok, progress :)
This session is encrypted using AWS KMS.
So what you're seeing now is https://github.com/ansible-collections/community.aws/issues/684
For what it's worth, I do want to get support for KMS encrypted sessions working (I'd need it if we wanted to use this in $dayjob). However, after dealing with a pile of issues for 5.1.0 / 5.2.0 I wanted a break...
So that means that currently I won't be able to move forward with our playbooks ?
@tremble How do I deploy the collection with those merges ?
@tremble More updates: I have tested connecting to various EC2 instances with a certain degree of success. It appears that the issue is limited, so far, to instances with an older amazon ssm-agent. The 3.2+ agents work. The 2.x series just stall at the PRE remaining: xx portion of the task, after the acknowledgement of the ssm start-session.
So that means that currently I won't be able to move forward with our playbooks ?
It means that at this time there is a known issue when you use KMS encrypted SSM sessions. Dealing with this module is not part of $DayJob and I do not have the time/energy to try and dig too much into the problem at this time. This doesn't block anyone else from doing so.
@tremble How do I deploy the collection with those merges ?
If you're referring to the PRs attached to this issue, they're only additional tests; there's no change to the plugin. (That was part of me trying to reproduce your issue.)
It appears that the issue is limited, so far, to instances with an older amazon ssm-agent.
That's an interesting data point. If you're able to work around the problem by updating the agent, then I'd strongly recommend doing so. As far as I can tell, the last 2.x release was in 2020. I'd be surprised if that didn't include security vulnerabilities.
@tremble When we see in logs the following
<i-xxx> EXEC remaining: 60
<i-xxx> EXEC remaining: 59
<i-xxx> EXEC remaining: 58
...
<i-xxx> EXEC remaining: 52
Does this mean that the execution is taking time to return the task, or does it mean that the task is stalled ?
I seem to have more success with older ssm_agents when I increase the timeout.
@tremble so it looks like
<i-xxx> EXEC remaining:
is execution time, and those seem to succeed. The ones that seem to time out are
PRE remaining: 299
What does PRE indicate ?
PRE - "Prepare the Terminal" - https://github.com/ansible-collections/community.aws/blob/main/plugins/connection/aws_ssm.py#L603
A session has a number of phases:
If it's getting stuck with "PRE" output, then it's stuck somewhere in the "prepare the terminal" phase...
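To see which phase a run got stuck in, the -vvvv output can be scanned for the "PHASE remaining: N" countdown lines and the "PHASE timeout" line. The following is a minimal sketch that relies only on the log line shapes quoted in this thread; the function names are hypothetical.

```python
import re

# Matches lines such as "<i-xxxxxx> PRE remaining: 59".
REMAINING = re.compile(r"^<[^>]+> (?P<phase>[A-Z_]+) remaining: (?P<seconds>\d+)$")
# Matches lines such as "<i-xxxxxx> PRE timeout stdout:".
TIMEOUT = re.compile(r"^<[^>]+> (?P<phase>[A-Z_]+) timeout\b")


def stalled_phase(lines):
    """Return the phase named in the first 'timeout' line, or None if there is none."""
    for line in lines:
        match = TIMEOUT.match(line.strip())
        if match:
            return match.group("phase")
    return None


def lowest_remaining(lines):
    """Return {phase: lowest 'remaining' value seen} for each counted-down phase."""
    lowest = {}
    for line in lines:
        match = REMAINING.match(line.strip())
        if match:
            phase, seconds = match.group("phase"), int(match.group("seconds"))
            lowest[phase] = min(seconds, lowest.get(phase, seconds))
    return lowest


log = [
    "<i-xxxxxx> PRE startup output received",
    "<i-xxxxxx> PRE remaining: 59",
    "<i-xxxxxx> PRE remaining: 1",
    "<i-xxxxxx> PRE timeout stdout:",
    "<i-xxx> EXEC remaining: 60",
    "<i-xxx> EXEC remaining: 52",
]

print(stalled_phase(log), lowest_remaining(log))
```

A phase whose countdown reaches a low value and is followed by a timeout line is where the session stalled; a countdown that simply stops partway usually means the phase completed.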
@tremble So far I am consistently seeing that with ssm_agent 2.1.xx the session gets stuck in the "PRE" output.
@tremble I can confirm that, so far, in my experience this bug is only present when amazon ssm-agent is version 2.x. I updated from 2.x to 3.x and I am no longer experiencing the PRE timeout issue.
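Based only on the observation reported in this thread (2.x agents stall, 3.2+ agents work), a fleet could be screened for agents that are likely to hit the PRE timeout. This is a rough sketch: the threshold comes from the reporter's testing rather than from any documented compatibility matrix, the helper names are hypothetical, and the sample version strings are made up.

```python
def agent_major(version):
    """Return the major component of a dotted agent version string."""
    return int(version.strip().split(".")[0])


def likely_affected(version, first_good_major=3):
    """True when the agent's major version is below the first one reported to work."""
    return agent_major(version) < first_good_major


# Hypothetical instance-to-agent-version mapping for illustration.
agents = {
    "i-aaaa": "2.3.1234.0",
    "i-bbbb": "3.2.582.0",
}

to_upgrade = sorted(i for i, v in agents.items() if likely_affected(v))
print(to_upgrade)
```

Versions 3.0 and 3.1 were not mentioned in the thread, so treating every 3.x agent as unaffected is an assumption.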
Summary
The variable ansible_aws_ssm_profile does not take effect when used. The only workaround I found is to run export AWS_PROFILE=[profile name] in bash.
Issue Type
Bug Report
Component Name
community.aws.aws_ssm connection
Ansible Version
Collection Versions
AWS SDK versions
Configuration
OS / Environment
MacOS Ventura 13.0.1
Steps to Reproduce
Expected Results
I expect to be able to connect to the EC2 instance in the AWS account of the profile that is in my .aws/config, and run the tasks.
For now it only works with the workaround of declaring the AWS profile in the bash CLI with export AWS_PROFILE=commerce1
Actual Results