Closed ryeleo closed 2 months ago
On my Jenkins Agents (CICD servers), this has not been an issue. It really seems the issue is related to my "mashing ctrl+c to cancel a build and instead rapidly start a new one"
Today, I'm noticing this issue consistently, even though I did not use "ctrl+c" consistently.
Between yesterday and early this morning, I was able to use ansible-runner for some 20-40 playbook runs without issue. But now, I am consistently seeing a ConnectionError.
One time, restarting docker desktop seemed to fix the issue. However, ever since then, I am again running into this issue.
Command I am running:
ansible-runner run . --container-image docker.io/ntsjenkins/junos-ansible-ansible-execution-env:0.2.0 --process-isolation --process-isolation-executable docker -p test.yml
The traceback I am seeing this morning:
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/ansible/module_utils/connection.py", line 206, in send
    sf.connect(self.socket_path)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/ansible/cli/scripts/ansible_connection_cli_stub.py", line 312, in main
    conn.set_options(direct=options)
  File "/usr/local/lib/python3.12/site-packages/ansible/module_utils/connection.py", line 193, in __rpc__
    response = self._exec_jsonrpc(name, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/ansible/module_utils/connection.py", line 154, in _exec_jsonrpc
    out = self.send(data)
          ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/ansible/module_utils/connection.py", line 213, in send
    raise ConnectionError(
ansible.module_utils.connection.ConnectionError: unable to connect to socket /runner/.ansible/pc/c0da244ec8. See the socket path issue category in Network Debug and Troubleshooting Guide

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/ansible-connection", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.12/site-packages/ansible/cli/scripts/ansible_connection_cli_stub.py", line 315, in main
    raise ConnectionError('Unable to decode JSON from response set_options. See the debug log for more information.')
ansible.module_utils.connection.ConnectionError: Unable to decode JSON from response set_options. See the debug log for more information.
The full output from using ansible-runner (which contains the traceback):
$ ansible-runner run . --container-image docker.io/ntsjenkins/junos-ansible-ansible-execution-env:0.2.0 --process-isolation --process-isolation-executable docker -p test.yml --limit uop-ccenter-cfr1.net.uoregon.edu
Identity added: /runner/artifacts/2673c2f9-ad8b-4932-9db6-14cca5654fda/ssh_key_data (/runner/artifacts/2673c2f9-ad8b-4932-9db6-14cca5654fda/ssh_key_data)
Vault password:
PLAY [uop_cfrs] ****************************************************************
TASK [Deploy configuration] ****************************************************
fatal: [uop-ccenter-cfr1.net.uoregon.edu]: FAILED! => {"msg": "Traceback (most recent call last):\n File \"/usr/local/lib/python3.12/site-packages/ansible/module_utils/connection.py\", line 206, in send\n sf.connect(self.socket_path)\nConnectionRefusedError: [Errno 111] Connection refused\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File \"/usr/local/lib/python3.12/site-packages/ansible/cli/scripts/ansible_connection_cli_stub.py\", line 312, in main\n conn.set_options(direct=options)\n File \"/usr/local/lib/python3.12/site-packages/ansible/module_utils/connection.py\", line 193, in __rpc__\n response = self._exec_jsonrpc(name, *args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/local/lib/python3.12/site-packages/ansible/module_utils/connection.py\", line 154, in _exec_jsonrpc\n out = self.send(data)\n ^^^^^^^^^^^^^^^\n File \"/usr/local/lib/python3.12/site-packages/ansible/module_utils/connection.py\", line 213, in send\n raise ConnectionError(\nansible.module_utils.connection.ConnectionError: unable to connect to socket /runner/.ansible/pc/c0da244ec8. See the socket path issue category in Network Debug and Troubleshooting Guide\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File \"/usr/local/bin/ansible-connection\", line 8, in <module>\n sys.exit(main())\n ^^^^^^\n File \"/usr/local/lib/python3.12/site-packages/ansible/cli/scripts/ansible_connection_cli_stub.py\", line 315, in main\n raise ConnectionError('Unable to decode JSON from response set_options. See the debug log for more information.')\nansible.module_utils.connection.ConnectionError: Unable to decode JSON from response set_options. See the debug log for more information.\n"}
My plan is to switch back to using ansible without using ansible-runner on my local development machine. Hopefully our build agents using ansible-runner continue to never run into this issue!
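For anyone hitting the same traceback: the first exception is a plain `ConnectionRefusedError` raised when connecting to a Unix domain socket path. Below is a minimal sketch of how one might check whether such a socket file still has a listener behind it. The helper name `probe_unix_socket` is hypothetical and not part of ansible or ansible-runner, and whether a leftover socket file is actually the cause in this report is an assumption, not something established in this thread.

```python
import socket


def probe_unix_socket(path, timeout=1.0):
    """Classify a Unix domain socket path as 'alive', 'stale', or 'missing'.

    Hypothetical debugging helper; not part of ansible or ansible-runner.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
    except FileNotFoundError:
        # No file exists at this path.
        return "missing"
    except ConnectionRefusedError:
        # Errno 111: the file exists, but nothing is accepting connections on it.
        return "stale"
    finally:
        sock.close()
    return "alive"
```

A call such as `probe_unix_socket("/runner/.ansible/pc/c0da244ec8")` would have to run inside the execution environment container, since that path from the traceback only exists there. A result of "stale" would match the `[Errno 111] Connection refused` seen above.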
So, to be honest, I'm not 100% sure what you are expecting or asking of runner here. If you are expecting to not see tracebacks or errors during your rapid development utilizing Ctrl+C to kill the process, I don't think that is reasonable. Abnormal termination could cause errors at multiple points in the process. Throw in multiple layers of execution utilizing runner and docker and ansible and network connection plugins, and you are bound to experience errors in the process. Bypassing ansible-runner and just using ansible itself might eliminate some of those layers, but certainly won't guarantee you won't see any errors.
All that being said, I'm not seeing any particular bug or issue here.
Fair point! Thanks for taking a look, @Shrews! 🙏
I was surprised when the issue came back up a couple of weeks ago without using "ctrl+c".
I do definitely wonder if I am the only user facing this issue. I am thinking:
We are planning on switching from Jenkins to GitHub Enterprise Server soon -- if I do start seeing this issue on our GitHub Runners, I will definitely report back here!
For more info about my environment, I am using:
- Windows 10 Version (10.0.19042 Build 19042)
- Docker Desktop 4.3.2 (72729) (is currently the newest version available.)
- You are using the WSL 2 backend
- Kubernetes Enabled using v1.22.4
The traceback you saw without using Ctrl+C seems to indicate a network plugin issue. Perhaps something with connectivity to your inventory hosts. That should not be runner related, and there's nothing I could help with there since that is at a level below runner (ansible level, most likely, or maybe docker).
Given the above, I'm going to close this issue. If you can link an issue directly to runner in the future, feel free to open a new issue with sufficient details to reproduce any errant behavior.
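To rule out basic connectivity to an inventory host before digging further, a quick TCP reachability check can help. This is a minimal sketch using only the Python standard library. The helper name `can_reach` is hypothetical, and port 22 in the usage note is an assumption, since the thread does not say which connection plugin or port the playbook uses.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Hypothetical helper for a quick reachability check; it only tests the
    TCP handshake, not authentication or the device's management protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `can_reach("uop-ccenter-cfr1.net.uoregon.edu", 22)` run from inside the container would show whether the host from the failing task is reachable at the network level. The port number here is a guess and should be replaced with whatever the inventory actually uses.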
I'm working on my ansible-runner workflow. I got Ansible Execution Environments working, and am doing rapid local development on my workstation.
I expect to always be able to do this workflow rapidly.
Instead, I observe the following error:
Command
Intermittent Error: