ParallelSSH / parallel-ssh

Asynchronous parallel SSH client library.
https://parallel-ssh.org
GNU Lesser General Public License v2.1
1.21k stars 149 forks source link

ParallelSSHClient: timeout option with proxy host #76

Closed fsaintma closed 7 years ago

fsaintma commented 7 years ago

Hi,

I have just tried to test with a Python script ParallelSSHClient and run_command with approximately 50 hosts behind a proxy host.

I've realized that if you had one host in your host list which didn't boot correctly but you still establish a blocking TCP connexion the run_command return the output after bit more two minutes (eg. raise exception Connect Failed). So I decided to pass timeout option to the client with 10 seconds but the behaviour is the same and I must wait two minutes. So I pushed my script on the proxy host and launch ParallelSSHClient without proxy_host option and the output is returned after 10 seconds with an exception as expected (eg. timed out) .

Can you confirm me this behaviour and how I should apply a timeout for my hosts list SSH connection when I specify a proxy_host ? Indeed I don't want to wait two minutes in the case of all hosts boot correctly except one.

You can see below my script and the result of the tests with one host.

Thanks in advance for your help

NOTE1 : If I want a timeout of 10 seconds I must specify .10 (with 10 I must wait 40 seconds). NOTE2 : I notice in your documentation that you have a timeout=120 in ParallelSSHClient method and a typo in docstring with "Defaults to 10".

script

#!/usr/bin/env python

from __future__ import print_function
import sys
import time
from pssh import ParallelSSHClient

def main(args):
    start_time = time.time()
    if len(args) > 1:
        client = ParallelSSHClient([args[0]],
                                   user='root',
                                   proxy_host=args[1],
                                   proxy_user=args[2],
                                   timeout=.10,
                                   channel_timeout=.10)
    else:
        client = ParallelSSHClient(args,
                                   user='root',
                                   timeout=.10,
                                   channel_timeout=.10)
    output = client.run_command("uptime", stop_on_errors=False)
    end_time = time.time() - start_time
    print('run_command duration : %d seconds' % end_time)
    print(output)

if __name__ == '__main__':
    main(sys.argv[1:])

with_proxy : script launch on my computer

my_computer:~$ time ssh -o ConnectTimeout=10 -o "ProxyCommand=ssh grenoble.iot-lab.info -W %h:%p" root@node-a8-111
Connection timed out during banner exchange

real    0m10.021s
user    0m0.012s
sys 0m0.000s

my_computer:~$ ./test_pssh.py node-a8-111 grenoble.iot-lab.info <proxy_user>
run_command duration : 127 seconds
{'node-a8-111': 
    host=node-a8-111
    cmd=<Greenlet at 0x7f2db08a39b0>
    channel=None
    stdout=None
    stderr=None
    stdin=None
    exception=("Error connecting to host '%s:%s' - %s", 'node-a8-111', 22, 'Connect failed')
}

without_proxy: script launch on the proxy host

proxy_host:~$ time ssh -o ConnectTimeout=10 root@node-a8-111
ssh: connect to host node-a8-111 port 22: Connection timed out

real    0m10.022s
user    0m0.004s
sys 0m0.004s

proxy_host:~$ ./test_pssh.py node-a8-111
run_command duration : 10 seconds
{'node-a8-111': 
    host=node-a8-111
    cmd=<Greenlet at 0x7f2429667050>
    channel=None
    stdout=None
    stderr=None
    stdin=None
    exception=("Error connecting to host '%s:%s' - %s - retry %s/%s", 'node-a8-111', 22, 'timed out', 3, 3)
}
cladmi commented 7 years ago

@fsaintma and dear Maintainer

I tried to reproduce a host that could have this behavior. I think it is trying to connect to a host that is broken and where the TCP connection does not send TCP SYN ACK.

This can be achieved by setting an iptable that drops tcp packets on a port: http://unix.stackexchange.com/a/330537

You can try with our 'grenoble.iot-lab.info' server on port 22222 for example. (Or any port that is not open, == not 22 and not 2222)

With proxy trough localhost

time ssh -oConnectTimeout=5 -oProxyCommand='ssh localhost -W %h:%p' grenoble.iot-lab.info -p 22222
Connection timed out during banner exchange

real    0m5.012s
user    0m0.000s
sys 0m0.007s

Without proxy

time ssh -oConnectTimeout=5 grenoble.iot-lab.info -p 22222
ssh: connect to host grenoble.iot-lab.info port 22222: Connection timed out

real    0m5.012s
user    0m0.007s
sys 0m0.000s

I did not run ParallelSSH code, I just tried to reproduce the output with openssh-client to make it easily testable.

pkittenis commented 7 years ago

Hi, thanks for raising. Yes, looks like timeout parameter is not passed onto the proxy client. Simple fix fortunately.

Will also update the docstring regarding default timeout setting.

For the other comment, .10 to get 10 seconds, I think it is the retry mechanism that is adding the additional 30sec delay (default for num_retries is 3). If the client is run with retries set to 1 it should give expected timeout for the one connection attempt.

Eg:

from __future__ import print_function
from datetime import datetime
client = ParallelSSHClient(['11.234.34.34'], num_retries=1, timeout=5)

start = datetime.now() 
client.run_command('asdf', stop_on_errors=False)
end = datetime.now()
print(end-start)

Output: 0:00:05.004444

There is also a 5 second delay between connection attempts, so for retries=2 with timeout=5 total time before timeout would be 15sec.

This was not made clear in documentation though so will update accordingly.

pkittenis commented 7 years ago

Hmm, forget what I said, looks like timeout is in fact passed to proxy and works as expected.

With unreachable proxy and timeout set:

from __future__ import print_function

from datetime import datetime
from pssh import ParallelSSHClient

client = ParallelSSHClient(['localhost'], proxy_host='13.123.132.2', 
                           timeout=1, num_retries=1)
now = datetime.now()
print(client.run_command('uname', stop_on_errors=False))
end = datetime.now()
print(end-now)

Output:

<..>
    exception=("Error connecting to host '%s:%s' - %s - retry %s/%s", 
                   'localhost', 22, 'timed out', 1, 1)
<..>
0:00:01.002312

Without unreachable proxy the above completes without errors.

So the additional delay you are seeing is coming from the retry attempts, not proxy timeout not being set.

Documentation update for the retry behaviour has been made.

pkittenis commented 7 years ago

Thanks @cladmi, there is a test for proxy failures already :)

fsaintma commented 7 years ago

Hi @pkittenis and @cladmi

I'm sorry but if I use your script with my unreachable host node-a8-111.grenoble.iot-lab.info I obtain the same result. Indeed in my case the proxy host is reachable but not the host behind the proxy host (eg. blocked TCP connection)

from __future__ import print_function

from datetime import datetime
from pssh import ParallelSSHClient

client = ParallelSSHClient(['node-a8-111.grenoble.iot-lab.info'],
                           proxy_host='grenoble.iot-lab.info',
                           proxy_user='my_login',
                           timeout=1,
                           user='root',
                           num_retries=1)
now = datetime.now()
print(client.run_command('uname', stop_on_errors=False))
end = datetime.now()
print(end-now)

Output

{'node-a8-111.grenoble.iot-lab.info': 
    host=node-a8-111.grenoble.iot-lab.info
    cmd=<Greenlet at 0x7fd0683d39b0>
    channel=None
    stdout=None
    stderr=None
    stdin=None
    exception=("Error connecting to host '%s:%s' - %s", 'node-a8-111.grenoble.iot-lab.info', 22, 'Connect failed')
}
0:02:08.040653

So it's not a proxy failure but a host failure behind a proxy.

Thanks in advance.

pkittenis commented 7 years ago

Ah, right so the behaviour is when proxy is reachable but remote host is not.

In that case, the timeout setting for proxy -> host connection is not configurable on our client side as that is a proxy server side connection made by the SSH server on proxy host. Client side retries are not applicable either, proxy host may be doing that also.

ParallelSSH, and any client for that matter, can only affect its own connections - the proxy initiated connection is not one of them.

That timeout configuration can only be changed on the proxy host itself.

Eg:

In [2]: client = ParallelSSHClient(['11.213.132.3'], proxy_host='localhost', timeout=1, num_retries=1)

In [3]: now = datetime.now()
   ...: print(client.run_command('uname', stop_on_errors=False))
   ...: end = datetime.now()
   ...: print(end-now)
   ...: 
0:02:07.415910

Proxy host is localhost and reachable. Target host does not exist.

Changing localhost SSH settings to lower timeout changes this:

{'11.213.132.3': 
    host=11.213.132.3
    cmd=<Greenlet at 0x7f7423d8e730>
    channel=None
    stdout=None
    stderr=None
    stdin=None
    exception=("Error connecting to host '%s:%s' - %s", '11.213.132.3', 22, 'Connect failed')
}
0:00:01.211800

In short, will need to set an appropriate ssh config on the proxy server for the desired timeout.

cladmi commented 7 years ago

Thank you for your quick answers.

From what I understand, I don't think its a configuration that should be done on the server.

For me what happens in practice is

Tell me if I'm wrong, or if it is different with paramiko as I know more openssh-client.

So, I would think that the timeout configuration should be on this intermediate tunnel. Even before -W option was added to openssh, ProxyCommand was using netcat HOST PORT to create the connection.

So maybe when creating the proxy_channel here:

https://github.com/ParallelSSH/parallel-ssh/blob/master/pssh/ssh_client.py#L147

I may be missing what the intermediate connection is really made of in paramiko, and when it's actually trying to connect to the remote host. Is it a creation or when the proxy connects ? How do you get feedback on this connection ?

I will try tweaking this.

If there is no way to control this using tcp configuration, it would also be possible, but more complex and heavier, to have a local timer/thread/greenlet/process/signal/…, that closes the whole connection if it is not established after timeout. But I would rather find a solution without this.

cladmi commented 7 years ago

Indeed, adding "timeout=self.timeout" to open_channel does raise an exception in SSHClient:

You need a "recent" enough version of paramiko that supports this option. For example 1.15.2 did not and I had to upgrade.

Traceback (most recent call last):
  File "./test_client.py", line 20, in <module>
    num_retries=1)
  File "/home/harter/tmp/test_parallel_ssh/pssh/ssh_client.py", line 127, in __init__
    self._connect_tunnel()
  File "/home/harter/tmp/test_parallel_ssh/pssh/ssh_client.py", line 149, in _connect_tunnel
    timeout=self.timeout)
  File "/home/harter/.local/lib/python2.7/site-packages/paramiko/transport.py", line 841, in open_channel
    raise SSHException('Timeout openning channel.')
paramiko.ssh_exception.SSHException: Timeout openning channel.

However it is not currently handled in ParallelSSHClient:

./test_client.py 
No handlers could be found for logger "pssh"
{}
0:00:01.594286

With this however, the connection time could be 2 x timeout in the end as there are two connections. So maybe the second timeout for connecting to the host should be (timeout - elapsed_time).

cladmi commented 7 years ago

If you want we can do a PR with propositions to fix this. Just tell us if there would be things to respect.

Like if you want more (timeout - elapsed_time), or a new argument like proxy_timeout, or just ignore it as it's already the case when creating the proxy connection. Or constraints on how to present the error.

pkittenis commented 7 years ago

That's not quite what happens but yes, looks like there is now a timeout setting on the channel which can be set by client. It's a relatively small change, will need a bump in minimum paramiko version though.

What happens in practice on SSH protocol tunnel connections is

This is the 'direct-tcpip tunneling' feature of the SSH(v2) protocol

In the case of using the tunnel as a proxy to another SSH host, the tunnel connection is then used by client to initiate a new SSH channel to the remote host via the intermediate proxy.

ProxyCommand is an openssh client feature to use an external program to proxy through, not related here as ParallelSSHClient uses the SSH protocol's native tunneling feature instead.

The tunnel connection is initiated and managed by the SSH server. See for example connections created by:

client = ParallelSSHClient(['11.213.132.3'], proxy_host='localhost', timeout=1, num_retries=1)
client.run_command('blah')

$ netstat -anp | grep  22

Client -> proxy server

tcp        0      0 127.0.0.1:42522         127.0.0.1:22            ESTABLISHED 11115/python    

Proxy server -> remote host

tcp        0      1 10.14.33.xx:55689      11.213.132.3:22         SYN_SENT    -               

As can be seen above, the remote host connection is coming from the SSH server's IP rather than client. It is timing out waiting for a syn/ack reply from remote host which does not exist, as expected. Changing timeout settings on the SSH server does affect this connection's timeout.

However, it is now possible for timeout to be passed on to server when initiating the tunnel connection which is easier and good to have.

Eg with timeout=self.timeout on the tunnel connection like you showed:

now = datetime.now()
print(client.run_command('uname', stop_on_errors=False))
end = datetime.now()
print(end-now)
{}
0:00:01.098822

This will need some additional error handling though as previously any errors would happen when initiating the SSH connection via the tunnel, not when creating tunnel itself. That breaks output, note it is empty above.

Thanks for the info @cladmi, was not aware of that new setting :)

If you'd like to make changes needed would be happy to review, otherwise can wait for next release - it's not a big change so no problem either way.

What is needed is I think:

The first three are in tunnel_timeout branch already, can use that to test if it resolves your issue and/or branch off of.

Existing timeout setting is fine, don't need new one. By handling tunnel timeout on tunnel channel creation the code doesn't try to create a new ssh connection with its own timeout so no 2x timeouts.

fsaintma commented 7 years ago

@pkittenis and @cladmi

Congratulations !!!!!

I have just tested the tunnel_timeout branch and it's OK for me, it resolves my issue.

Script output (timeout=15)

Connecting via SSH proxy grenoble.iot-lab.info:22 -> node-a8-111.grenoble.iot-lab.info:22 {'node-a8-111.grenoble.iot-lab.info': host=node-a8-111.grenoble.iot-lab.info exit_code=None cmd=<Greenlet at 0x7f7195602cd0> channel=None stdout=None stderr=None stdin=None exception=("Error connecting to host '%s:%s' - %s", 'node-a8-111.grenoble.iot-lab.info', 22, 'Timeout openning channel.') } 15.2399671078

Thanks very much.

cladmi commented 7 years ago

Thank you

pkittenis commented 7 years ago

Thank you also for the info :)

fsaintma commented 7 years ago

Thanks very much @pkittenis for your work and reactivity !!! Just another service :) I have just seen that you have published a release candidate version (1.0.1) three days ago with the fix issue but it's not the case on PyPi (eg. 1.0.0). So should you update the release candidate on PyPi ? Thanks in advance.

fsaintma commented 7 years ago

Hi @pkittenis It's ok for Pypi and long life to parallel-ssh !!!

pkittenis commented 7 years ago

No worries, thanks for letting me know. The pypi release from travis failed for some reason.