dacr / jassh

High level scala SSH API for easy and fast operations on remote servers.
Apache License 2.0

Non-deterministic remote shell disconnect #21

Open thehungrysmurf opened 7 years ago

thehungrysmurf commented 7 years ago

I'm seeing sporadic Broken pipe disconnects when running an application that sends consecutive remote shell commands via this library. The disconnects are hard to pin down because they happen at a different step each time. Sometimes a command as simple as cat on the remote server is enough to crash the application. On the remote server I see little more than this:

sshd[16060]: fatal: Write failed: Broken pipe
sshd[16125]: fatal: Write failed: Connection reset by peer

Two questions for you:

Thanks!

dacr commented 7 years ago

I have already noticed very rare network-like issues, but I have not yet determined whether they are related to the network, the OpenSSH server, the jsch library, or my code. I have made some attempts to make the code more resilient with transparent reconnection. In the past I was able to reduce some "session is down" issues this way: "jsch rekey operation disabled => it generates random "session is down" ssh error ! the same for ciphers...".

I'll investigate in more detail as soon as I have some time for it, maybe at the end of December.

If you use calls such as jassh.SSH.once("localhost", "test", "testtest") { ssh => print(ssh.execute("""echo "Hello World from hostname" """)) } then there is no risk of connection leaks. I use this every day, intensively, without any problem.

david.
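The reason a call shaped like SSH.once does not leak connections can be illustrated with a minimal loan-pattern sketch. This is not jassh's actual implementation: FakeSession and the once helper below are hypothetical stand-ins that only show the open, use, always-close sequence.

```scala
object LoanSketch {
  // Hypothetical stand-in for a remote session; NOT part of jassh.
  final class FakeSession {
    var closed: Boolean = false
    def execute(cmd: String): String = s"ran: $cmd"
    def close(): Unit = { closed = true }
  }

  // Opens a session, lends it to `body`, and closes it afterwards,
  // whether `body` returns normally or throws.
  def once[T](open: () => FakeSession)(body: FakeSession => T): T = {
    val session = open()
    try body(session)
    finally session.close()
  }
}
```

Because the close happens in a finally clause, the session is released even when a command fails part-way through, which is the property that matters for an application sending many consecutive commands.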

thehungrysmurf commented 7 years ago

I appreciate the response and suggestions @dacr

After observing this for longer to try to establish a pattern, I see the TCP disconnects in two situations: (1) one that is still mysterious, with no log trail to help investigate; and (2) when a shell command hangs indefinitely. In my case this command is (9 times out of 10) "cat /etc/redhat-release"; why this command, I don't know. I believe your suggestion to use .once might ameliorate this.

Please feel free to close the issue, as it is hard to pin down and will probably need more experimentation to figure out. Thanks for your help!
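For the second situation above, where a command hangs indefinitely, one possible mitigation is to bound how long the caller waits. The sketch below uses only the Scala standard library; TimeoutGuard and runWithTimeout are hypothetical names, not jassh features, and the approach only stops the caller from blocking forever. It does not cancel the hung command or repair the underlying connection.

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object TimeoutGuard {
  // Runs `command` on another thread and waits at most `limitMs` milliseconds.
  // Returns Some(result) on success and None if the limit is exceeded.
  // Exceptions thrown by `command` itself are rethrown to the caller.
  // Note: on timeout the underlying call keeps running; it is not cancelled.
  def runWithTimeout[T](limitMs: Long)(command: => T): Option[T] = {
    val pending = Future(command)
    try Some(Await.result(pending, limitMs.millis))
    catch { case _: TimeoutException => None }
  }
}
```

Assuming an open jassh session named ssh, a call might look like TimeoutGuard.runWithTimeout(5000)(ssh.execute("cat /etc/redhat-release")). A None result could then be treated as a signal to discard the session and open a fresh one rather than reuse it.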