I have a similar issue; not sure if it's related to the same underlying issue though. We're running ansible in the cloud via a bastion host, and are seeing a similar timeout issue. We see the following debug output for a task:
morganj@AMS-02-DHI-0003> cat ansible-debug.out| grep "51595" ~/dev/ceng-devtest
51595 1489493352.98877: running TaskExecutor() for ceng-gcp-data-platform-airflow/TASK: rsyslog : install
51595 1489493352.98947: in run()
51595 1489493352.98996: calling self._execute()
51595 1489493353.00023: Loading Connection 'ssh' from /usr/lib/python2.7/site-packages/ansible/plugins/connection/ssh.py
51595 1489493353.00743: Loading ShellModule 'csh' from /usr/lib/python2.7/site-packages/ansible/plugins/shell/csh.py
51595 1489493353.00796: Loading ShellModule 'fish' from /usr/lib/python2.7/site-packages/ansible/plugins/shell/fish.py
51595 1489493353.00856: Loading ShellModule 'powershell' from /usr/lib/python2.7/site-packages/ansible/plugins/shell/powershell.py
51595 1489493353.00890: Loading ShellModule 'sh' from /usr/lib/python2.7/site-packages/ansible/plugins/shell/sh.py
51595 1489493353.00960: Loading ShellModule 'sh' from /usr/lib/python2.7/site-packages/ansible/plugins/shell/sh.py (found_in_cache=True, class_only=False)
51595 1489493353.00983: in VariableManager get_vars()
51595 1489493353.01682: done with get_vars()
51595 1489493353.01729: Loading ActionModule 'normal' from /usr/lib/python2.7/site-packages/ansible/plugins/action/normal.py
51595 1489493353.01748: starting attempt loop
51595 1489493353.01761: running the handler
51595 1489493353.01811: ANSIBALLZ: using cached module: /home/morganj/.ansible/tmp/ansible-local-50907dBX0ja/ansiballz_cache/yum-ZIP_DEFLATED
51595 1489493353.01911: _low_level_execute_command(): starting
51595 1489493353.01928: _low_level_execute_command(): executing: /bin/sh -c '( umask 77 && mkdir -p "` echo $HOME/.ansible/tmp/ansible-tmp-1489493353.02-200902091435823 `" && echo ansible-tmp-1489493353.02-200902091435823="` echo $HOME/.ansible/tmp/ansible-tmp-1489493353.02-200902091435823 `" ) && sleep 0'
51595 1489493353.15604: stderr chunk (state=2):
51595 1489493353.46941: stderr chunk (state=3):
51595 1489493353.82566: stdout chunk (state=3):
51595 1489493353.82614: stdout chunk (state=3):
51595 1489493353.82642: stderr chunk (state=3):
51595 1489493353.82657: stderr chunk (state=3):
51595 1489493353.82684: _low_level_execute_command() done: rc=0, stdout=ansible-tmp-1489493353.02-200902091435823=/home/cengtest-sshadmin/.ansible/tmp/ansible-tmp-1489493353.02-200902091435823
51595 1489493353.82703: transferring module to remote /home/cengtest-sshadmin/.ansible/tmp/ansible-tmp-1489493353.02-200902091435823/yum.py
51595 1489493353.93274: stderr chunk (state=2):
51595 1489493354.23288: stderr chunk (state=3):
51595 1489493354.68999: stdout chunk (state=3):
51595 1489493354.69028: stderr chunk (state=3):
51595 1489493354.69044: stderr chunk (state=3):
51595 1489493354.69075: done transferring module to remote
51595 1489493354.69108: _low_level_execute_command(): starting
51595 1489493354.69123: _low_level_execute_command(): executing: /bin/sh -c 'chmod u+x /home/cengtest-sshadmin/.ansible/tmp/ansible-tmp-1489493353.02-200902091435823/ /home/cengtest-sshadmin/.ansible/tmp/ansible-tmp-1489493353.02-200902091435823/yum.py && sleep 0'
51595 1489493354.79344: stderr chunk (state=2):
51595 1489493355.09285: stderr chunk (state=3):
51595 1489493355.45604: stdout chunk (state=3):
51595 1489493355.45642: stderr chunk (state=3):
51595 1489493355.45656: stderr chunk (state=3):
51595 1489493355.45681: _low_level_execute_command() done: rc=0, stdout=, stderr=Warning: Permanently added '104.199.104.71' (ECDSA) to the list of known hosts.
51595 1489493355.45704: _low_level_execute_command(): starting
51595 1489493355.45716: _low_level_execute_command(): using become for this command
51595 1489493355.45741: _low_level_execute_command(): executing: /bin/sh -c 'sudo -H -S -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-fouducionjbihlnkubdbuhrzprvnnfxq; https_proxy=https://10.0.0.5:3128 http_proxy=http://10.0.0.5:3128 /usr/bin/python /home/cengtest-sshadmin/.ansible/tmp/ansible-tmp-1489493353.02-200902091435823/yum.py; rm -rf "/home/cengtest-sshadmin/.ansible/tmp/ansible-tmp-1489493353.02-200902091435823/" > /dev/null 2>&1'"'"' && sleep 0'
51595 1489493355.46218: Initial state: awaiting_escalation: BECOME-SUCCESS-fouducionjbihlnkubdbuhrzprvnnfxq
51595 1489493355.56544: stderr chunk (state=1):
51595 1489493355.87374: stderr chunk (state=1):
51595 1489493356.23825: stdout chunk (state=1):
51595 1489493356.23855: become_success: (source=stdout, state=awaiting_escalation): 'BECOME-SUCCESS-fouducionjbihlnkubdbuhrzprvnnfxq'
51595 1489493356.23866: Escalation succeeded
This is a yum task to install rsyslog. If you need any more information, or think I should raise another issue, then please let me know. Also happy to do more debugging if necessary.
Cheers
Ansible 2.2.2.0 is also affected, also via a bastion host.
Problematic task:

- name: Execute script
  command: chdir=/tmp sh script.sh

The script itself:
#!/bin/bash
PARTITION="/dev/xvdf1"
PARTITION_NR="${PARTITION:~0}"   # last character of $PARTITION (the partition number)
MOUNT_DRIVE="${PARTITION%?}"     # $PARTITION with the last character stripped (the bare device)
MOUNT_DIR="/opt"
if [ ! -b $PARTITION ]
then
    parted $MOUNT_DRIVE --script mklabel gpt mkpart product -- 0 -1
    mkfs.ext4 $PARTITION
    echo "$PARTITION $MOUNT_DIR ext4 defaults 1 3" >> /etc/fstab
    mkdir -p $MOUNT_DIR
    mount $MOUNT_DIR
    chmod o+w $MOUNT_DIR
else
    echo "$PARTITION exists"
fi
When I run it manually, everything is okay.
IMO this is needed in general, in the sense that if I'm running a playbook against many machines and one hangs, the entire play is ruined. A forced timeout would allow the rest of the hosts to succeed, and then I could go back and manually debug the failed server.
I can't tell you how frustrating it is when I have to kill a playbook for this very reason. For example, right now I'm running a `yum clean all` in a playbook. All 500 servers except one ran the command OK. However, the other server doesn't even have the yum process running! I can't kill it even manually on the remote machine! Also, even terminating the ssh connection does not force ansible to continue. I have to take that host out of my inventory in order to have the playbook run successfully for all of the other hosts.
Should one hung host determine the outcome of the playbook for the rest of them? I don't think so.
@infrascripting Is using `strategy: free` an option for you? http://docs.ansible.com/ansible/latest/playbooks_strategies.html
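A minimal sketch of what that looks like in a play (the task itself is just a placeholder):

- hosts: all
  strategy: free
  tasks:
    - name: Each host runs ahead at its own pace instead of waiting at task boundaries
      ping:

Note that the free strategy only stops a hung host from blocking the others; the hung host's own fork still sits there forever.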
I was not aware of `strategy: free`; however, I don't think it'd be feasible in this instance.
Primarily because if I don't have all the servers finish the playbook, there will be no log file written (callback plugin), which is required for auditing purposes. Additionally, it would be more of a hassle to debug where a specific host failed if I have to cross-reference every host with where it ended up, as opposed to every host sitting at the task that is hung.
Ultimately, a timeout serves both these purposes by letting the server fail if a command does not execute correctly. In a similar instance to the above, the `yum clean all` process does exist; however, it is hung. Killing it on the remote machine returns a 'FAILED' code to ansible, and lets it continue on with the rest of the play.
In almost every case where a bug report is filed saying that ansible hangs "seemingly forever", there exists the possibility of this timeout returning valuable debugging information during the run, or information that is stored in logs (callback plugins).
A cursory search for "hangs" revealed:
the error happened again and ansible hanged indefinitely
-- https://github.com/ansible/ansible/issues/24109
There it hangs, seemingly forever.
-- https://github.com/ansible/ansible/issues/16894
This is a major inconvenience when working with Ansible, because Ansible hangs until the remote machine closes the connection, without error.
-- https://github.com/ansible/ansible/issues/10874
It ll be nice to have an option to specify a hard timeout of the script. That way, you can avoid issues when the script hangs and get stuck forever.
-- https://github.com/ansible/ansible/issues/23676
I keep beating my head against the wall - a few days ago, ansible stopped being able to connect to one of my boxes. It just hangs after the output below.
-- https://github.com/ansible/ansible/issues/17273
The machines just hang on gathering facts [...] I have no idea how long it had been running, could have been weeks or longer.
-- https://github.com/ansible/ansible/issues/24536
EXPECTED RESULTS
Either the copy succeeds or it fails
ACTUAL RESULTS
The copy is stuck. I have to CTRL C to exit.
Also, as a side-effect, a session is left running on my machine:
I have to kill this "[mux]" process for any subsequent ansible command to succeed, otherwise it stays stuck at the SSH connection...
-- https://github.com/ansible/ansible/issues/16929
I know I am not the first to report this, but I have not found any conclusive solution to the issue. Basically during 'gathering facts' ansible will hang on a single host, while executing setup via python on the target node.
Still no solution in Ansible 2.4?
Any update on this? I also experience random hanging tasks, especially with yum and package installs. Running on 2.4.2 at the moment.
I hit the same issue as well: an ansible task hung on "sftp a file to remote"; the file was created but its size was stuck at 0.
Worked around it by reducing the interface MTU size from 9000 to 1500.

Worked around it by reducing the interface MTU size from 9000 to 1500.

I had this problem, and resolved it thanks to your comment. Thank you so much.
CentOS 6.9
Ansible 2.4.2
Python 2.6.6
I'll just leave this here.
I am facing the same issue while executing a bash script from ansible. Any solutions for this problem?
I started getting this issue ~3 weeks ago. I "solved" it by setting
ssh_args = -o ControlPath=/var/shm/control:%h:%p:%r
in my ansible.cfg. It seems that there is an issue with the socket the ControlMaster feature of ssh creates. Upon creation on my regular ext4 filesystem, it blocked ansible.
Generally, here is a list of things to check (from the different issues on ansible hanging; almost all have occurred to me in the past):
- the `transport` option, to check if there is an ssh configuration issue on your local machine
- the `ssh_args` option, or delete the ~/.ansible folder on your local host
- whether anything (see `dmesg`) freezes your session

Hit the same issue while using the copy module, ansible 2.6.
+1
Well, I can understand how +1 can be annoying, but not as annoying as no proper response about an issue in over 2 years, and 6 major versions later.
I am dealing with this issue. I also had the issue in https://github.com/ansible/ansible/issues/14426, and then it didn't read in the group vars, so I used inventory_hostname instead and continued. Then I removed gather_facts, and I got a freeze instead of the privilege error.
For me, Ansible freezes at:

- raw: '/tmp/ansible.Dr1LTB.build/run doSomething'
  args:
    chdir: "/tmp/ansible.Dr1LTB.build/"

where the following works normally(!):

- command: 'npx run doSomething'
  args:
    chdir: "/tmp/ansible.Dr1LTB.build/"
Both work in the same way when I run them manually.
No logs, no error messages, nothing; kinda embarrassing. The freezing of a low-level command like `raw` where the high-level `script` performs fine is quite weird.
@mogthesprog: I know it's been almost 2 years. However: do you remember finding a solution to this? I'm having a similar problem which stops at the same spot with yum (https://github.com/ansible/ansible/issues/51284).
Looks like someone patched ansible 2.8: https://github.com/ansible/ansible/issues/47210
In my case, I added this statement to my ansible.cfg:
ssh_args = " -o StrictHostKeyChecking=no"
Without thinking that this would override the default ssh_args options. So, checking /etc/ansible/ansible.cfg, I found the defaults and added them too. Like magic, it stopped hanging. :)
Final setting in my ansible.cfg:
ssh_args = "-C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no"
I spoke too soon - that change got me a little further along but it's still hanging
I was having this issue with the `get_url` module and the `file` module in Ansible 2.7.8. I noticed the pattern that this was failing when I defined a file mode but did not explicitly define the owner and group. Explicitly setting the `owner:` and `group:` attributes to their appropriate values resolved the issue for me.
Example:

- name: Download oh-my-zsh
  get_url:
    url: https://raw.github.com/robbyrussell/oh-my-zsh/master/tools/install.sh
    dest: /tmp/install-zsh.sh
    mode: 0755

results in the waiting pattern, but

- name: Download oh-my-zsh
  get_url:
    url: https://raw.github.com/robbyrussell/oh-my-zsh/master/tools/install.sh
    dest: /tmp/install-zsh.sh
    owner: vagrant
    group: vagrant
    mode: 0755

does not result in waiting.
Several of my boxes still hang. I love Ansible but I have to resolve this or it's just not usable. A little more info...
It appears to be related to privilege escalation, at least in my case. If I ssh manually to server1, sudo to root, and rerun ansible with the -b, before the sudo cached password times out, it works. So it doesn't seem to recognize the sudo password prompt on all boxes.
ansible -vvv server1 -m ping # succeeds
ansible -vvv -b server1 -m ping # hangs at the sudo attempt
# unless I first ssh and sudo manually
Other data points:
Some output from the command, "ansible -vvvv -b server1 -m ping"
...
<server1> ESTABLISH SSH CONNECTION FOR USER: d_lewis
<server1> SSH: EXEC sshpass -d10 ssh -vvv -o ServerAliveInterval=30 -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o 'User="d_lewis"' -o ConnectTimeout=10 -o ControlPath=/home/dlewis/.ansible/cp/51303f75b6 server1 '/bin/sh -c '"'"'chmod u+x /home/d_lewis/.ansible/tmp/ansible-tmp-1559842970.91-113507000465060/ /home/d_lewis/.ansible/tmp/ansible-tmp-1559842970.91-113507000465060/AnsiballZ_ping.py && sleep 0'"'"''
<server1> (0, '', 'OpenSSH_7.4p1, OpenSSL 1.0.2k-fips 26 Jan 2017\r\ndebug1: Reading configuration data /home/dlewis/.ssh/config\r\ndebug1: /home/dlewis/.ssh/config line 58: Applying options for *\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: /etc/ssh/ssh_config line 58: Applying options for *\r\ndebug1: auto-mux: Trying existing master\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_forwards: request forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 26079\r\ndebug3: mux_client_request_session: session request sent\r\ndebug1: mux_client_request_session: master session id: 2\r\ndebug3: mux_client_read_packet: read header failed: Broken pipe\r\ndebug2: Received exit status from master 0\r\n')
<server1> ESTABLISH SSH CONNECTION FOR USER: d_lewis
<server1> SSH: EXEC sshpass -d10 ssh -vvv -o ServerAliveInterval=30 -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o 'User="d_lewis"' -o ConnectTimeout=10 -o ControlPath=/home/dlewis/.ansible/cp/51303f75b6 -tt server1 '/bin/sh -c '"'"'dzdo -H -S -p "[dzdo via ansible, key=ujejcfadoztrfdckegqhcleymbhirwew] password:" -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-ujejcfadoztrfdckegqhcleymbhirwew ; /usr/bin/python /home/d_lewis/.ansible/tmp/ansible-tmp-1559842970.91-113507000465060/AnsiballZ_ping.py'"'"'"'"'"'"'"'"' && sleep 0'"'"''
Hangs there for several minutes - I assume until the sudo/dzdo prompt times out.
For my case, it hangs till timeout.
vincent@vincentMBP # ansible -vvvv serverc -m ping
ansible 2.8.0
config file = /Users/vincent/tansible/ansible.cfg
configured module search path = ['/Users/vincent/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/local/Cellar/ansible/2.8.0/libexec/lib/python3.7/site-packages/ansible
executable location = /usr/local/bin/ansible
python version = 3.7.3 (default, Mar 27 2019, 09:23:15) [Clang 10.0.1 (clang-1001.0.46.3)]
Using /Users/vincent/tansible/ansible.cfg as config file
setting up inventory plugins
Parsed /Users/vincent/tansible/host.txt inventory source with ini plugin
Loading callback plugin minimal of type stdout, v2.0 from /usr/local/Cellar/ansible/2.8.0/libexec/lib/python3.7/site-packages/ansible/plugins/callback/minimal.py
META: ran handlers
<serverc> ESTABLISH SSH CONNECTION FOR USER: root
<serverc> SSH: EXEC ssh -vvv -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/Users/vincent/.ansible/cp/b17d87e4d6 serverc '/bin/sh -c '"'"'echo ~root && sleep 0'"'"''
<serverc> (0, b'/root\r\n', b'OpenSSH_7.9p1, LibreSSL 2.7.3\r\ndebug1: Reading configuration data /Users/vincent/.ssh/config\r\ndebug1: /Users/vincent/.ssh/config line 135: Applying options for serverc\r\ndebug1: /Users/vincent/.ssh/config line 183: Applying options for *\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: /etc/ssh/ssh_config line 48: Applying options for *\r\ndebug1: auto-mux: Trying existing master\r\ndebug1: Control socket "/Users/vincent/.ansible/cp/b17d87e4d6" does not exist\r\ndebug2: resolving "serverc" port 22\r\ndebug2: ssh_connect_direct\r\ndebug1: Connecting to serverc [192.168.3.212] port 22.\r\ndebug2: fd 5 setting O_NONBLOCK\r\ndebug1: fd 5 clearing O_NONBLOCK\r\ndebug1: Connection established.\r\ndebug3: timeout: 9952 ms remain after connect\r\ndebug1: identity file /Users/vincent/.ssh/id_rsa type 0\r\ndebug1: identity file /Users/vincent/.ssh/id_rsa-cert type -1\r\ndebug1: identity file /Users/vincent/.ssh/id_dsa type -1\r\ndebug1: identity file /Users/vincent/.ssh/id_dsa-cert type -1\r\ndebug1: identity file /Users/vincent/.ssh/id_ecdsa type -1\r\ndebug1: identity file /Users/vincent/.ssh/id_ecdsa-cert type -1\r\ndebug1: identity file /Users/vincent/.ssh/id_ed25519 type -1\r\ndebug1: identity file /Users/vincent/.ssh/id_ed25519-cert type -1\r\ndebug1: identity file /Users/vincent/.ssh/id_xmss type -1\r\ndebug1: identity file /Users/vincent/.ssh/id_xmss-cert type -1\r\ndebug1: Local version string SSH-2.0-OpenSSH_7.9\r\ndebug1: Remote protocol version 2.0, remote software version OpenSSH_7.4\r\ndebug1: match: OpenSSH_7.4 pat OpenSSH_7.0*,OpenSSH_7.1*,OpenSSH_7.2*,OpenSSH_7.3*,OpenSSH_7.4*,OpenSSH_7.5*,OpenSSH_7.6*,OpenSSH_7.7* compat 0x04000002\r\ndebug2: fd 5 setting O_NONBLOCK\r\ndebug1: Authenticating to serverc:22 as \'root\'\r\ndebug3: put_host_port: [serverc]:22\r\ndebug3: hostkeys_foreach: reading file "/Users/vincent/.ssh/known_hosts"\r\ndebug3: record_hostkey: found key type ECDSA in file /Users/vincent/.ssh/known_hosts:132\r\ndebug3: record_hostkey: found key type RSA in file /Users/vincent/.ssh/known_hosts:140\r\ndebug3: load_hostkeys: loaded 2 keys from [serverc]:22\r\ndebug3: order_hostkeyalgs: prefer hostkeyalgs: ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,rsa-sha2-512-cert-v01@openssh.com,rsa-sha2-256-cert-v01@openssh.com,ssh-rsa-cert-v01@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,rsa-sha2-512,rsa-sha2-256,ssh-rsa\r\ndebug3: send packet: type 20\r\ndebug1: SSH2_MSG_KEXINIT sent\r\ndebug3: receive packet: type 20\r\ndebug1: SSH2_MSG_KEXINIT received\r\ndebug2: local client KEXINIT proposal\r\ndebug2: KEX algorithms: curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256,diffie-hellman-group14-sha1,ext-info-c\r\ndebug2: host key algorithms: ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,rsa-sha2-512-cert-v01@openssh.com,rsa-sha2-256-cert-v01@openssh.com,ssh-rsa-cert-v01@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,rsa-sha2-512,rsa-sha2-256,ssh-rsa,ssh-ed25519-cert-v01@openssh.com,ssh-ed25519\r\ndebug2: ciphers ctos: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com\r\ndebug2: ciphers stoc: 
chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com\r\ndebug2: MACs ctos: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1\r\ndebug2: MACs stoc: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1\r\ndebug2: compression ctos: zlib@openssh.com,zlib,none\r\ndebug2: compression stoc: zlib@openssh.com,zlib,none\r\ndebug2: languages ctos: \r\ndebug2: languages stoc: \r\ndebug2: first_kex_follows 0 \r\ndebug2: reserved 0 \r\ndebug2: peer server KEXINIT proposal\r\ndebug2: KEX algorithms: curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha256,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1\r\ndebug2: host key algorithms: ssh-rsa,rsa-sha2-512,rsa-sha2-256,ecdsa-sha2-nistp256,ssh-ed25519\r\ndebug2: ciphers ctos: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com,aes128-cbc,aes192-cbc,aes256-cbc,blowfish-cbc,cast128-cbc,3des-cbc\r\ndebug2: ciphers stoc: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com,aes128-cbc,aes192-cbc,aes256-cbc,blowfish-cbc,cast128-cbc,3des-cbc\r\ndebug2: MACs ctos: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1\r\ndebug2: MACs stoc: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1\r\ndebug2: compression ctos: none,zlib@openssh.com\r\ndebug2: compression stoc: none,zlib@openssh.com\r\ndebug2: languages ctos: \r\ndebug2: languages stoc: \r\ndebug2: first_kex_follows 0 \r\ndebug2: reserved 0 \r\ndebug1: kex: algorithm: curve25519-sha256\r\ndebug1: kex: host key algorithm: ecdsa-sha2-nistp256\r\ndebug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: zlib@openssh.com\r\ndebug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: zlib@openssh.com\r\ndebug3: send packet: type 30\r\ndebug1: expecting SSH2_MSG_KEX_ECDH_REPLY\r\ndebug3: receive packet: type 31\r\ndebug1: Server host key: ecdsa-sha2-nistp256 SHA256:1VVaMDmgjtuwqASxZCLIPZNkagDHT18VxPRoFYV0Wzw\r\ndebug3: put_host_port: [192.168.3.212]:22\r\ndebug3: put_host_port: [serverc]:22\r\ndebug3: hostkeys_foreach: reading file "/Users/vincent/.ssh/known_hosts"\r\ndebug3: record_hostkey: found key type ECDSA in file /Users/vincent/.ssh/known_hosts:132\r\ndebug3: record_hostkey: found key type RSA in file /Users/vincent/.ssh/known_hosts:140\r\ndebug3: load_hostkeys: loaded 2 keys from [serverc]:22\r\ndebug3: hostkeys_foreach: reading file "/Users/vincent/.ssh/known_hosts"\r\ndebug3: record_hostkey: found key type ECDSA in file /Users/vincent/.ssh/known_hosts:131\r\ndebug3: load_hostkeys: loaded 1 keys from 
[192.168.3.212]:22\r\ndebug1: Host \'[serverc]:22\' is known and matches the ECDSA host key.\r\ndebug1: Found key in /Users/vincent/.ssh/known_hosts:132\r\ndebug3: send packet: type 21\r\ndebug2: set_newkeys: mode 1\r\ndebug1: rekey after 134217728 blocks\r\ndebug1: SSH2_MSG_NEWKEYS sent\r\ndebug1: expecting SSH2_MSG_NEWKEYS\r\ndebug3: receive packet: type 21\r\ndebug1: SSH2_MSG_NEWKEYS received\r\ndebug2: set_newkeys: mode 0\r\ndebug1: rekey after 134217728 blocks\r\ndebug1: Will attempt key: /Users/vincent/.ssh/id_rsa RSA SHA256:jFJZZEt1K+N4p4fsX509tQbKNiQwEob+zg0whAJs+FA\r\ndebug1: Will attempt key: /Users/vincent/.ssh/id_dsa \r\ndebug1: Will attempt key: /Users/vincent/.ssh/id_ecdsa \r\ndebug1: Will attempt key: /Users/vincent/.ssh/id_ed25519 \r\ndebug1: Will attempt key: /Users/vincent/.ssh/id_xmss \r\ndebug2: pubkey_prepare: done\r\ndebug3: send packet: type 5\r\ndebug3: receive packet: type 7\r\ndebug1: SSH2_MSG_EXT_INFO received\r\ndebug1: kex_input_ext_info: server-sig-algs=<rsa-sha2-256,rsa-sha2-512>\r\ndebug3: receive packet: type 6\r\ndebug2: service_accept: ssh-userauth\r\ndebug1: SSH2_MSG_SERVICE_ACCEPT received\r\ndebug3: send packet: type 50\r\ndebug3: receive packet: type 51\r\ndebug1: Authentications that can continue: publickey,password\r\ndebug3: start over, passed a different list publickey,password\r\ndebug3: preferred gssapi-with-mic,gssapi-keyex,hostbased,publickey\r\ndebug3: authmethod_lookup publickey\r\ndebug3: remaining preferred: ,gssapi-keyex,hostbased,publickey\r\ndebug3: authmethod_is_enabled publickey\r\ndebug1: Next authentication method: publickey\r\ndebug1: Offering public key: /Users/vincent/.ssh/id_rsa RSA SHA256:jFJZZEt1K+N4p4fsX509tQbKNiQwEob+zg0whAJs+FA\r\ndebug3: send packet: type 50\r\ndebug2: we sent a publickey packet, wait for reply\r\ndebug3: receive packet: type 60\r\ndebug1: Server accepts key: /Users/vincent/.ssh/id_rsa RSA SHA256:jFJZZEt1K+N4p4fsX509tQbKNiQwEob+zg0whAJs+FA\r\ndebug3: sign_and_send_pubkey: RSA SHA256:jFJZZEt1K+N4p4fsX509tQbKNiQwEob+zg0whAJs+FA\r\ndebug3: sign_and_send_pubkey: signing using rsa-sha2-512\r\ndebug3: send packet: type 50\r\ndebug3: receive packet: type 52\r\ndebug1: Enabling compression at level 6.\r\ndebug1: Authentication succeeded (publickey).\r\nAuthenticated to serverc ([192.168.3.212]:22).\r\ndebug1: setting up multiplex master socket\r\ndebug3: muxserver_listen: temporary control path /Users/vincent/.ansible/cp/b17d87e4d6.C7BGXjv82Yt9mot2\r\ndebug2: fd 6 setting O_NONBLOCK\r\ndebug3: fd 6 is O_NONBLOCK\r\ndebug3: fd 6 is O_NONBLOCK\r\ndebug1: channel 0: new [/Users/vincent/.ansible/cp/b17d87e4d6]\r\ndebug3: muxserver_listen: mux listener channel 0 fd 6\r\ndebug2: fd 5 setting TCP_NODELAY\r\ndebug3: ssh_packet_set_tos: set IP_TOS 0x20\r\ndebug1: control_persist_detach: backgrounding master process\r\ndebug2: control_persist_detach: background process is 3567\r\ndebug2: fd 6 setting O_NONBLOCK\r\ndebug1: forking to background\r\ndebug1: Entering interactive session.\r\ndebug1: pledge: id\r\ndebug2: set_control_persist_exit_time: schedule exit in 60 seconds\r\ndebug1: multiplexing control connection\r\ndebug3: fd 3 is O_NONBLOCK\r\ndebug3: fd 3 is O_NONBLOCK\r\ndebug1: channel 1: new [mux-control]\r\ndebug3: channel_post_mux_listener: new mux channel 1 fd 3\r\ndebug3: mux_master_read_cb: channel 1: hello sent\r\ndebug2: set_control_persist_exit_time: cancel scheduled exit\r\ndebug3: mux_master_read_cb: channel 1 packet type 0x00000001 len 4\r\ndebug2: mux_master_process_hello: channel 1 slave version 
4\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_forwards: request forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: mux_client_request_alive: entering\r\ndebug3: mux_master_read_cb: channel 1 packet type 0x10000004 len 4\r\ndebug2: mux_master_process_alive_check: channel 1: alive check\r\ndebug3: mux_client_request_alive: done pid = 3569\r\ndebug3: mux_client_request_session: session request sent\r\ndebug3: mux_master_read_cb: channel 1 packet type 0x10000002 len 124\r\ndebug2: mux_master_process_new_session: channel 1: request tty 1, X 0, agent 0, subsys 0, term "xterm-256color", cmd "/bin/sh -c \'echo ~root && sleep 0\'", env 2\r\ndebug3: mux_master_process_new_session: got fds stdin 7, stdout 8, stderr 9\r\ndebug2: fd 8 setting O_NONBLOCK\r\ndebug2: fd 9 setting O_NONBLOCK\r\ndebug1: channel 2: new [client-session]\r\ndebug2: mux_master_process_new_session: channel_new: 2 linked to control channel 1\r\ndebug2: channel 2: send open\r\ndebug3: send packet: type 90\r\ndebug3: receive packet: type 80\r\ndebug1: client_input_global_request: rtype hostkeys-00@openssh.com want_reply 0\r\ndebug3: receive packet: type 91\r\ndebug2: channel_input_open_confirmation: channel 2: callback start\r\ndebug2: client_session2_setup: id 2\r\ndebug2: channel 2: request pty-req confirm 1\r\ndebug3: send packet: type 98\r\ndebug1: Sending environment.\r\ndebug1: Sending env LC_ALL = en_US.UTF-8\r\ndebug2: channel 2: request env confirm 0\r\ndebug3: send packet: type 98\r\ndebug1: Sending env LC_CTYPE = UTF-8\r\ndebug2: channel 2: request env confirm 0\r\ndebug3: send packet: type 98\r\ndebug1: Sending command: /bin/sh -c \'echo ~root && sleep 0\'\r\ndebug2: channel 2: request exec confirm 1\r\ndebug3: send packet: type 98\r\ndebug3: mux_session_confirm: sending success reply\r\ndebug2: channel_input_open_confirmation: channel 2: callback done\r\ndebug2: channel 2: open confirm rwindow 0 rmax 32768\r\ndebug3: receive packet: type 99\r\ndebug2: channel_input_status_confirm: type 99 id 2\r\ndebug2: PTY allocation request accepted on channel 2\r\ndebug2: channel 2: rcvd adjust 2097152\r\ndebug3: receive packet: type 99\r\ndebug2: channel_input_status_confirm: type 99 id 2\r\ndebug2: exec request accepted on channel 2\r\ndebug3: receive packet: type 98\r\ndebug1: client_input_channel_req: channel 2 rtype exit-status reply 0\r\ndebug3: mux_exit_message: channel 2: exit message, exitval 0\r\ndebug3: receive packet: type 98\r\ndebug1: client_input_channel_req: channel 2 rtype eow@openssh.com reply 0\r\ndebug2: channel 2: rcvd eow\r\ndebug2: channel 2: chan_shutdown_read (i0 o0 sock -1 wfd 7 efd 9 [write])\r\ndebug2: channel 2: input open -> closed\r\ndebug3: receive packet: type 96\r\ndebug2: channel 2: rcvd eof\r\ndebug2: channel 2: output open -> drain\r\ndebug2: channel 2: obuf empty\r\ndebug2: channel 2: chan_shutdown_write (i3 o1 sock -1 wfd 8 efd 9 [write])\r\ndebug2: channel 2: output drain -> closed\r\ndebug3: receive packet: type 97\r\ndebug2: channel 2: rcvd close\r\ndebug3: channel 2: will not send data after close\r\ndebug2: channel 2: send close\r\ndebug3: send packet: type 97\r\ndebug2: channel 2: is dead\r\ndebug2: channel 2: gc: notify user\r\ndebug3: mux_master_session_cleanup_cb: entering for channel 2\r\ndebug2: channel 1: rcvd close\r\ndebug2: channel 1: output open -> drain\r\ndebug2: channel 1: chan_shutdown_read (i0 o1 sock 3 wfd 3 efd -1 [closed])\r\ndebug2: channel 1: input open -> closed\r\ndebug2: channel 2: gc: user 
detached\r\ndebug2: channel 2: is dead\r\ndebug2: channel 2: garbage collecting\r\ndebug1: channel 2: free: client-session, nchannels 3\r\ndebug3: channel 2: status: The following connections are open:\r\n #1 mux-control (t16 nr0 i3/0 o1/16 e[closed]/0 fd 3/3/-1 sock 3 cc -1)\r\n #2 client-session (t4 r0 i3/0 o3/0 e[write]/0 fd -1/-1/9 sock -1 cc -1)\r\n\r\ndebug2: channel 1: obuf empty\r\ndebug2: channel 1: chan_shutdown_write (i3 o1 sock 3 wfd 3 efd -1 [closed])\r\ndebug2: channel 1: output drain -> closed\r\ndebug2: channel 1: is dead (local)\r\ndebug2: channel 1: gc: notify user\r\ndebug3: mux_master_control_cleanup_cb: entering for channel 1\r\ndebug2: channel 1: gc: user detached\r\ndebug2: channel 1: is dead (local)\r\ndebug2: channel 1: garbage collecting\r\ndebug1: channel 1: free: mux-control, nchannels 2\r\ndebug3: channel 1: status: The following connections are open:\r\n #1 mux-control (t16 nr0 i3/0 o3/0 e[closed]/0 fd 3/3/-1 sock 3 cc -1)\r\n\r\ndebug3: mux_client_read_packet: read header failed: Broken pipe\r\ndebug2: Received exit status from master 0\r\nShared connection to serverc closed.\r\ndebug2: set_control_persist_exit_time: schedule exit in 60 seconds\r\n')
<serverc> ESTABLISH SSH CONNECTION FOR USER: root
<serverc> SSH: EXEC ssh -vvv -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/Users/vincent/.ansible/cp/b17d87e4d6 serverc '/bin/sh -c '"'"'( umask 77 && mkdir -p "` echo /root/.ansible/tmp/ansible-tmp-1561945122.158012-119072658004192 `" && echo ansible-tmp-1561945122.158012-119072658004192="` echo /root/.ansible/tmp/ansible-tmp-1561945122.158012-119072658004192 `" ) && sleep 0'"'"''
<serverc> (0, b'ansible-tmp-1561945122.158012-119072658004192=/root/.ansible/tmp/ansible-tmp-1561945122.158012-119072658004192\r\n', b'OpenSSH_7.9p1, LibreSSL 2.7.3\r\ndebug1: Reading configuration data /Users/vincent/.ssh/config\r\ndebug1: /Users/vincent/.ssh/config line 135: Applying options for serverc\r\ndebug1: /Users/vincent/.ssh/config line 183: Applying options for *\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: /etc/ssh/ssh_config line 48: Applying options for *\r\ndebug1: auto-mux: Trying existing master\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_forwards: request forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 3569\r\ndebug3: mux_client_request_session: session request sent\r\ndebug3: mux_client_read_packet: read header failed: Broken pipe\r\ndebug2: Received exit status from master 0\r\nShared connection to serverc closed.\r\n')
<serverc> Attempting python interpreter discovery
<serverc> ESTABLISH SSH CONNECTION FOR USER: root
<serverc> SSH: EXEC ssh -vvv -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/Users/vincent/.ansible/cp/b17d87e4d6 serverc '/bin/sh -c '"'"'echo PLATFORM; uname; echo FOUND; command -v '"'"'"'"'"'"'"'"'/usr/bin/python'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'python3.7'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'python3.6'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'python3.5'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'python2.7'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'python2.6'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'/usr/libexec/platform-python'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'/usr/bin/python3'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'python'"'"'"'"'"'"'"'"'; echo ENDFOUND && sleep 0'"'"''
<serverc> (0, b'PLATFORM\r\nLinux\r\nFOUND\r\n/usr/bin/python\r\n/usr/bin/python3.6\r\n/usr/bin/python2.7\r\n/usr/libexec/platform-python\r\n/usr/bin/python3\r\n/usr/bin/python\r\nENDFOUND\r\n', b'OpenSSH_7.9p1, LibreSSL 2.7.3\r\ndebug1: Reading configuration data /Users/vincent/.ssh/config\r\ndebug1: /Users/vincent/.ssh/config line 135: Applying options for serverc\r\ndebug1: /Users/vincent/.ssh/config line 183: Applying options for *\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: /etc/ssh/ssh_config line 48: Applying options for *\r\ndebug1: auto-mux: Trying existing master\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_forwards: request forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 3569\r\ndebug3: mux_client_request_session: session request sent\r\ndebug3: mux_client_read_packet: read header failed: Broken pipe\r\ndebug2: Received exit status from master 0\r\nShared connection to serverc closed.\r\n')
<serverc> ESTABLISH SSH CONNECTION FOR USER: root
<serverc> SSH: EXEC ssh -vvv -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/Users/vincent/.ansible/cp/b17d87e4d6 serverc '/bin/sh -c '"'"'/usr/bin/python && sleep 0'"'"''
Something else that has happened to me is having something like:

- set_fact:
    linux_cmd: >-
      {{ executable }}
      --flag1
      --option1 argument

- command: "{{ linux_cmd }}"
When `linux_cmd` is given that way, the command module silently fails.
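If the fold is the culprit, a cheap thing to try (a sketch, not a confirmed fix) is building the same value on a single line, so no folding or trailing whitespace is involved:

- set_fact:
    linux_cmd: "{{ executable }} --flag1 --option1 argument"

- command: "{{ linux_cmd }}"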
It seems I have the same problem (version 2.4.2). I use the jinja2 template module to copy a config file to 280 target hosts. For performance optimization, I changed forks from 5 to 300. All subprocesses except one or two finish successfully. Those subprocesses which cannot finish are just stuck there, without doing anything to the target hosts, and they cause the parent process to spin in a busy loop consuming almost 90% CPU. I ran strace on the parent and it is just making a lot of select() calls...
Why does the parent simply spin on select() without doing anything about the stuck subprocesses?
I use 2.8.5 and it gets stuck on gathering facts...
With verbose on, the final log line is "Escalation succeeded".
IDK if it will help, but if you prepend ANSIBLE_DEBUG=1 to the execution, it will show even more verbose info, though I'm not sure it would reveal anything else helpful.
I am having the same issue.
What I have done is create a new playbook that attempts a python hello world, to see what happens.
playbook:

- hosts: all
  gather_facts: False
  tasks:
    - name: Test Hello_world
      hello_world:
      register: result
    - debug: var=result
hello_world.py:

from ansible.module_utils.basic import *

def main():
    module = AnsibleModule(argument_spec={})
    theReturnValue = {"hello": "world"}
    module.exit_json(changed=False, meta=theReturnValue)

if __name__ == '__main__':
    main()
It hangs just as the others above: it opens the SSH connection and appears to be running the code, but it never appears to complete/close. I have manually run the code on the remote host and get the returned value as expected.
Ansible version 2.8 - AWX 6.0.0
Remote Host: RHEL 6.5 Python 2.6
I also hang on gather_facts, and I wanted to determine whether the problem was with gather_facts or something else. I believe it is something else, and I'm thinking it's `ansible.module_utils.basic`.
Update on mine: because I am using AWX, I have found the problem is with "become". When I set become: false in the playbook, it works perfectly; when I use become: true, it hangs. I believe this is because it is prompting for something that, with AWX, will never make it back.
man -e 1 timeout
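That is, wrap the remote command in coreutils timeout(1) so the process kills itself even when Ansible never would; a sketch with arbitrary durations:

- name: yum clean with a hard one-hour cap (SIGKILL sent 10s after the SIGTERM)
  command: timeout -k 10 3600 yum clean all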
I have the same behavior randomly on any task. Not often, but it's clearly a handicap to have no option to handle a timeout.
Some of the hangs may be related to https://sourceforge.net/p/sshpass/patches/11/. This can occur when running ansible playbooks on a heavily loaded system (i.e. a high fork count) and using sshpass with the control persist feature for ssh (the default for most).
In my case it was the following: hosts where the .bashrc of the user ansible uses to log in had a "sudo" to another user in it. This caused Ansible to hang indefinitely.
This may not be super helpful, but 2.10 will introduce a task timeout feature which may help in the case where this may pop up with troublesome commands.
This may not be super helpful, but 2.10 will introduce a task timeout feature which may help in the case where this may pop up with troublesome commands.
This should help in the case where the .bashrc/.bash_profile/.login has a sudo in it.
Or any other case that causes an indefinite/long process; the problem is that there are too many reasons this can happen, and many of them are not discoverable, much less solvable, remotely from the controller.
The new `timeout` keyword will help mitigate this issue. Though it is not a fix, it at least gives users the option to continue the rest of the play if needed and abandon the execution after what they would consider a reasonable period.
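For reference, a sketch of how the 2.10 keyword reads in a task (the script path here is hypothetical):

- name: Job that gets abandoned instead of hanging the play
  command: /usr/local/bin/nightly-job.sh
  timeout: 600

When the limit is hit, the task fails for that host while the other hosts carry on.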
I agree. This should help out tremendously. Thanks for the persistence with this issue!
I was running into this, but only on two boxes out of 80-ish.
What I found was that facter was hanging. It was consistently hanging (24 instances of it, from 24 attempts on one box) in a non-interruptible disk-wait state, with /proc/partitions as the last file descriptor open. It would only hang on boxes where someone had tried to disable the floppy driver, but had done so incorrectly. It would also only hang in gather_facts if that was set to yes, and only if become was set to yes. (Either being 'no' would not result in a hang.)
Blacklisting the floppy driver solved the problem for me. At least so far. I'm pretty sure that's what was wrong on the two systems which were plaguing me with hangs in facter. Hard to tell 100%, but the diagnostics were the same on both, and they've both been fine since being rectified. However, a reboot has been known to fix this issue in the past, so I'm not 100% convinced, and won't be at ease for a while.
In my case, Ansible was hanging forever (I guess; I only waited for one hour) to execute a command that only takes 2 seconds when executed manually.
The reason: a playbook invoked as `root`, with a task using the `shell` module and becoming a less privileged user.
tasks:
  - name: Non-root task running a quick node.js script
    shell: "yarn my-script"
    args:
      executable: /bin/bash
      chdir: "target-dir"
    become: yes
    become_user: "my-non-root-user"

  - name: This task needs ROOT access, that's why the playbook is invoked with the root user
    shell: "my-executable-shell-script.sh"
    args:
      executable: /bin/bash
      chdir: "target-dir"
I need to do this so the files touched/created by the task are owned by the actual user instead of being owned by `root`
**The workaround:**
- Running the playbook with the user account that should own the files to be created, instead of invoking the playbook as the `root` user, makes the task pass. But, as other tasks in the same playbook (that should be executed as part of the same run) require `root` access, the next tasks fail due to lack of privilege;
**Questions:**
1. Is `"de-escalating"` (sorry, couldn't think of a better word for that) from `root` to a regular user account a bad practice? Why? If so, could Ansible raise a warning and tell users about it?
2. Other [comments on this same thread](#issuecomment-550430394) point to the usage of `become` as the root cause. As `Ansible 2.10` will tackle this issue by adding the `timeout` keyword, would it be possible for the error output of a timed-out task to also point to issues related to [in]correct(?) usage of `become`?
@Fairlight2cx so that is under the generic 'fact gathering device issues' family of hangs; fact gathering tries to list current devices, many of which are skipped as only root is allowed to query certain information.
@jfoliveira
In your case it happened just in that circumstance. But this can still happen under a normal user, with a device or any other blocking call they have access to query, which can be 'broken' or 'stuck' when you do so. My favorite example: `ls /badnfs`.
This is why setting up an automatic timeout just for become tasks would not be advisable... also, there are many cases in which a task is supposed to take a long time, and this would require users to rewrite their playbooks to account for it.
In the end, this type of hang always requires a skilled admin to debug the cause as there are too many possible reasons and they all require different solutions.
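For the 'supposed to take a long time' case, one pre-2.10 way to put a ceiling on a task is async/poll, where async is the maximum runtime in seconds before Ansible gives up on it; a sketch with made-up numbers and a hypothetical script:

- name: Long job with an explicit one-hour ceiling
  command: /usr/local/bin/long-job.sh
  async: 3600
  poll: 30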
Thanks Brian @bcoca! I get the issue with long-running processes, and the `timeout` feature will definitely be helpful.
What I'm not getting, though, is this very playbook/shell command only hanging when executed through Ansible as root with become_user: "my-non-root-user-name" set in my playbook.
Examples:
Works - playbook finishes in a few seconds, as expected, and STDERR of the failing shell command is displayed:
ansible-playbook my-playbook.yml --user=my-non-root-user-name
Hangs "forever" - I waited up to one hour, several times:
ansible-playbook my-playbook.yml --user=root
I'm giving lots of details below. Please let me know if this doesn't fit here, if I should submit another issue for that, and what extra info I can provide in order to help on a fix for it.
- Running `yarn build` manually on the target machine, it fails in 12 (twelve) seconds, as expected, since there is an actual error in my Node.js code that I set on purpose to ensure my playbook can report the STDERR output;
- From the `ps aux` or `top` commands, it seems Ansible-as-root is spawning a new process when the first one fails, and spawning a third process when the second one fails, ..., as there are many running processes executing the same `yarn build` command;
- `ANSIBLE_DEBUG` or `-vvvvvvv` are not telling me anything other than regular SSH connection details;
- Running Ansible as the root user, I see many processes on the target machine with the same shell command.

Question:
Is there any implicit Ansible setting that forces a "keep retrying when the process fails" behavior when executing the playbook as root and becoming another user? I'd love to disable it.
Here is the playbook code:

---
- hosts: all
  gather_facts: no
  become: yes
  become_user: "my-non-root-user-name"
  tasks:
    - name: Non-root task running a quick failing node.js command
      shell: "yarn build"
      args:
        executable: /bin/bash
        chdir: "target-dir"
Environment details that might be helpful:
OS: Ubuntu 18.04 LTS in both Control and Node/Target machines
Ansible - Version:
$ ansible-playbook --version
ansible-playbook 2.9.9
config file = /home/user/playbooks/ansible.cfg
configured module search path = [u'/home/user/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python2.7/dist-packages/ansible
executable location = /usr/bin/ansible-playbook
python version = 2.7.17 (default, Apr 15 2020, 17:20:14) [GCC 7.5.0]
Ansible - Configuration:
$ ansible-config dump --only-changed
ANSIBLE_NOCOWS(/home/user/playbooks/ansible.cfg) = True
ANSIBLE_PIPELINING(/home/user/playbooks/ansible.cfg) = True
ANSIBLE_SSH_ARGS(/home/user/playbooks/ansible.cfg) = -C -o ControlPersist=60s -o PreferredAuthentications=publickey -o ControlMaster=auto -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null
DEFAULT_CALLBACK_WHITELIST(/home/user/playbooks/ansible.cfg) = [u'json', u'timer']
DEFAULT_MANAGED_STR(/home/user/playbooks/ansible.cfg) = Project.com - File generated through http://project.com with
DEFAULT_ROLES_PATH(/home/user/playbooks/ansible.cfg) = [u'/home/user/user/playbooks/roles', u'/etc/ansible/roles', u'/home/user/.ansible/roles']
DEFAULT_STDOUT_CALLBACK(/home/user/playbooks/ansible.cfg) = debug
HOST_KEY_CHECKING(/home/user/playbooks/ansible.cfg) = False
INTERPRETER_PYTHON(/home/user/playbooks/ansible.cfg) = /usr/bin/python3
RETRY_FILES_ENABLED(/home/user/playbooks/ansible.cfg) = False
No, there is no implicit 'process retry'; that is what until/retry is for. The ssh connection does have a 'connection retry', but that would not show on the remote if failing/retrying, nor create new processes.
Added the new `timeout` keyword in 2.10 (#69284), which acts as a mitigation for these conditions. Ansible still cannot know if the remote is hung or just waiting for something to finish, but the playbook author can now set a timeout that, in their context, will create a limit between both cases.
I also saw hang behavior after upgrading to 2.10, so at first I thought it was something with the new version. I downgraded to 2.9.13 and was surprised that this issue persists.
On every run it hangs on a different system and a different task. So I dug a little bit further and did some straces, and I can see on the target system(s) the processes reading on stdin, which is a pipe that goes up to the ssh connection, waiting on data from the master. So I guess it has something to do with the forks; after reducing the forks from 5 to 1, the playbook completes (very slowly 😄) without problems. Maybe this is a race condition or something like that, related to #59642?
With forks > 1, if I kill the ssh session I get different "exceptions":
fatal: [pandora]: UNREACHABLE! => {"changed": false, "msg": "Failed to create temporary directory.In some cases, you may have been able to authenticate and did not have permissions on the target directory. Consider changing the remote tmp path in ansible.cfg to a path rooted in \"/tmp\", for more error information use -vvv. Failed command was: ( umask 77 && mkdir -p \"` echo /tmp/.ansible `\"&& mkdir \"` echo /tmp/.ansible/ansible-tmp-1601029060.3310869-54420-202820423202194 `\" && echo ansible-tmp-1601029060.3310869-54420-202820423202194=\"` echo /tmp/.ansible/ansible-tmp-1601029060.3310869-54420-202820423202194 `\" ), exited with result -15", "unreachable": true}
fatal: [charon]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ", "unreachable": true}
After that, the playbook completes for the remaining hosts.
OS: Arch Linux on controller + targets
Ansible Version:
ansible 2.10.1
config file = /home/daniel/repos/ansible/ansible.cfg
configured module search path = ['/home/daniel/repos/ansible/library', '/home/daniel/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python3.8/site-packages/ansible
executable location = /usr/bin/ansible
python version = 3.8.5 (default, Sep 5 2020, 10:50:12) [GCC 10.2.0]
Ansible - Configuration:
ansible-config dump --only-changed
[DEPRECATION WARNING]: DEFAULT_HASH_BEHAVIOUR option, This feature is fragile and not portable, leading to continual confusion and misuse , use the ``combine`` filter explicitly instead. This feature will be removed in version 2.13.
Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
ANSIBLE_NOCOWS(/home/daniel/repos/ansible/ansible.cfg) = True
ANSIBLE_PIPELINING(/home/daniel/repos/ansible/ansible.cfg) = True
CACHE_PLUGIN(/home/daniel/repos/ansible/ansible.cfg) = jsonfile
CACHE_PLUGIN_CONNECTION(/home/daniel/repos/ansible/ansible.cfg) = cache/
CACHE_PLUGIN_TIMEOUT(/home/daniel/repos/ansible/ansible.cfg) = 900
DEFAULT_CALLBACK_WHITELIST(/home/daniel/repos/ansible/ansible.cfg) = ['profile_tasks']
DEFAULT_FORKS(/home/daniel/repos/ansible/ansible.cfg) = 1
DEFAULT_HASH_BEHAVIOUR(/home/daniel/repos/ansible/ansible.cfg) = merge
DEFAULT_HOST_LIST(/home/daniel/repos/ansible/ansible.cfg) = ['/home/daniel/repos/ansible/hosts']
DEFAULT_LOCAL_TMP(/home/daniel/repos/ansible/ansible.cfg) = /tmp/.ansible/ansible-local-48040jhby95fe
DEFAULT_MODULE_PATH(/home/daniel/repos/ansible/ansible.cfg) = ['/home/daniel/repos/ansible/library', '/home/daniel/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
DEFAULT_NO_TARGET_SYSLOG(/home/daniel/repos/ansible/ansible.cfg) = True
DEFAULT_TIMEOUT(/home/daniel/repos/ansible/ansible.cfg) = 5
DEFAULT_VAULT_PASSWORD_FILE(env: ANSIBLE_VAULT_PASSWORD_FILE) = /home/daniel/.tokens/.ansible_vault_token
INTERPRETER_PYTHON(/home/daniel/repos/ansible/ansible.cfg) = auto_silent
RETRY_FILES_ENABLED(/home/daniel/repos/ansible/ansible.cfg) = False
(yes i know DEFAULT_HASH_BEHAVIOUR is deprecated, but i come from saltstack and use my pillars 1:1 as host/group vars so it hurts to rewrite all the stuff 😱)
I am also experiencing this issue.
In my case the tasks are not long, but the playbook executions are (around 40 minutes): the use case is using ansible to run CI tests.
I don't target a lot of servers, just 3.
IMHO the hangs are not completely random but only pseudo-random. For example, there have been 3 or 4 hangs in a row on the same task.
The actual task checks that a list of packages is installed (debian/apt); they are installed, so it returns without changes on 2 of the 3 hosts.
It is very curious that this concrete task is stopping the execution, and curious to see that on each execution the host where it happens changes, but the task is the same.
I suspect timings (because of the predictability) and parallelisation (because the host changes).
Also, another thing: my playbooks reboot the target machines 2 or 3 times. The failure is 2 or 3 tasks after a reboot, right now.
I hope this information helps to figure out what it could be.
My use case involves speaking to a large number of remote hosts (serialized to 20-25 hosts at a time) that can't be touched during business hours, so my playbook is called from a cron task and uses the slack module to send me updates for which hosts have been updated. The integrated webhook allows me to check on the status of these updates from my phone and allows me to work from home/afk much more easily :)
Something interesting I have noticed when this ssh timeout issue occurs: ansible seemingly spawns a process for each host it speaks to during any particular task. The host that it is stuck on will be listed as an active process on my ansible controller with the inventory hostname appended to it (e.g. `ps aux | grep hostname.test.com`).
In my investigations so far, this process hangs for days at a time, but does not fail or timeout the ansible-playbook itself. I can manually kill the created process for the offending host on my ansible controller, and the playbook will continue as normal (while failing the task for the offending host).
My current plan is to determine an effective way of terminating these frozen processes when they are taking too long, as I have a good feel for how long each serialized run should take, which will then allow the playbook to continue. Since I have the privilege of searching for a particular string found in each instance of my inventory hostnames, I can locate the processes specific to the ansible hosts themselves and determine the pid from that. Another thing worth noting is typically I can attempt the same offending host a day or two later and find that the issue does not occur, so any host this occurs on can be attempted at a later time.
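For what it's worth, a rough sketch of that watchdog as a localhost play; the grep pattern reuses the hostname example above, the two-hour threshold is an assumption to adapt, and a plain kill may be enough before reaching for -9:

- hosts: localhost
  gather_facts: no
  tasks:
    - name: Kill stale ansible worker processes for one host after two hours
      shell: |
        # pgrep -f matches the inventory hostname in each worker's command line
        for pid in $(pgrep -f 'hostname.test.com'); do
          age=$(ps -o etimes= -p "$pid" | tr -d ' ')   # elapsed seconds
          if [ "${age:-0}" -gt 7200 ]; then
            kill -9 "$pid"
          fi
        done
      changed_when: false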
I think I can speak for everyone by saying I appreciate the info and input that everyone has added so far. I hope this information helps you in some way to determine a good work-around for this issue in your use case.
I'm having this issue also, and it seems to come up on Solaris servers more so than our Oracle Linux servers. I've even bumped the timeout way up:
"msg": "Timeout (62s) waiting for privilege escalation prompt: "
It's not consistent either. Sometimes it'll work just fine across all hosts - other times a number will fail. I do have some larger playbooks running and in those I have told it not to gather facts - that seems to help with this a great deal as well. I'm on a new project with Ansible, so I'm kind of looking over this issue again. The environment I'm currently working on this project for isn't a prod environment, so I have a bit more room to 'test'.
I know it's not the playbook. I have one specifically for testing connections:

---
- hosts: all
  become: yes
  tasks:
    - name:
      debug:
        var: hostvars[inventory_hostname].ansible_hostname
It couldn't get more simple, I suppose. Thing is, when I log in using the Ansible account directly and use sudo su -, it works perfectly every time. -c paramiko does seem to mitigate it to a large degree, but not always.
I did happen to notice, though, a couple of entries in the 'su-log' similar to this:
sudo: [ID 183074 local2.notice] ansible : TTY=unknown
In most other cases, I get an actual number for the TTY session: sudo: [ID 183074 local2.notice] ansible : TTY=pts/2
Not all of them were that way. I wasn't tracking the exact time I got the sudo error, but I'm running that playbook in a loop on one of the servers that errored out to see if the 'TTY=unknown' coincides with a sudo failure.
Also - this is Solaris 5.11 11.4.26.75.4 - using sudo version 1.8.31.1-11.4.24.0.1.75.1
Edit/update: the Unknown TTY doesn't seem to appear on the most recent failures I ran into.
`sudo su -` is not really supported by the `sudo` become method.
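The supported shape is to let Ansible drive sudo itself through the become keywords rather than spawning a login shell; a sketch:

- hosts: all
  become: yes
  become_method: sudo
  become_user: root
  tasks:
    - name: Runs as root through the sudo become plugin
      command: id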
ISSUE TYPE
COMPONENT NAME
ansible
ANSIBLE VERSION
ansible 2.3.0 (devel 5fa82f2b4e) last updated 2016/10/31 12:01:58 (GMT +200)
lib/ansible/modules/core: (detached HEAD f9fada7db0) last updated 2016/10/31 12:02:21 (GMT +200)
lib/ansible/modules/extras: (detached HEAD 5701a2e050) last updated 2016/10/31 12:02:41 (GMT +200)
config file = /etc/ansible/ansible.cfg
configured module search path = Default w/o overrides
CONFIGURATION
OS / ENVIRONMENT
RHEL 7.2 (Maipo) Python 2.7.5 (default, Oct 11 2015, 17:47:16)
SUMMARY
Ansible fails to time out when the SSH connection is OK but the command executes for a long (infinite) time. Common errors:
A command execution timeout should be provided, as suggested here (adding a timeout to the subprocess): https://github.com/ansible/ansible/issues/9544
Something like this would be nice: wait for 3600 seconds (or the desired time) for the task to finish; if it gets stuck, send kill -9 to the subprocess.
STEPS TO REPRODUCE
Target server:

$ mv /usr/bin/rpm /usr/bin/rpm.orig
$ vi /usr/bin/rpm

#!/bin/bash
while true; do sleep 5; done

$ chmod a+x /usr/bin/rpm

Control server: Launch ansible and try to remove any package... ---> It gets stuck <----
EXPECTED RESULTS
I expect the task to time out and the host to be marked as failed.
ACTUAL RESULTS
Ansible waits for task finalization for an infinite time.
This impacts several nightly support tasks, because playbooks do not end... (requiring manual intervention).