Rahix / tbot

Automation/Testing tool for Embedded Linux Development
https://tbot.tools
GNU General Public License v3.0

TBot hangs indefinitely when the SSH connection breaks after a device crash while running tests via the SSH connector #73

Closed yanivy-nr closed 2 years ago

yanivy-nr commented 2 years ago

Hello,

Sometimes, when running tests on our device, the device crashes and the SSH connection breaks. When this happens, TBot waits indefinitely. We are using the connector.SSHConnector connector and have added the following options to ssh via the ssh_config property: ['ConnectTimeout=5', 'ServerAliveInterval=1', 'ServerAliveCountMax=5']. With these options, SSH detects that the device has failed, but TBot does not detect it and continues to wait for input from SSH.
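For reference, each of these options ends up as a separate `-o` flag on the ssh command line. With `ServerAliveInterval=1` and `ServerAliveCountMax=5`, ssh gives up after roughly five seconds of unanswered keepalives. The following is a minimal sketch of that mapping, not tbot's actual code: `build_ssh_command` is a hypothetical helper, and the fixed `BatchMode` and `StrictHostKeyChecking` flags are simply copied from the command line that appears in the output of this report.

```python
def build_ssh_command(username, hostname, port=22, ssh_config=()):
    """Hypothetical helper: assemble an ssh command line from a list of options.

    The fixed flags mirror the command line shown in this report's output;
    they are an assumption, not taken from tbot's source.
    """
    cmd = [
        "ssh",
        "-o", "BatchMode=yes",
        "-o", "StrictHostKeyChecking=no",
        "-p", str(port),
    ]
    # Every entry of ssh_config becomes its own "-o KEY=VALUE" pair.
    for option in ssh_config:
        cmd += ["-o", option]
    cmd.append(f"{username}@{hostname}")
    return cmd


options = ["ConnectTimeout=5", "ServerAliveInterval=1", "ServerAliveCountMax=5"]
command = build_ssh_command("root", "192.168.5.12", 22, options)
print(" ".join(command))
```

Running the same command by hand outside of TBot is a quick way to confirm that ssh itself exits once the keepalives go unanswered.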

Below is an example of the output. After "exit" is printed, TBot waits indefinitely:

│   ├─[local] ssh -o BatchMode=yes -o StrictHostKeyChecking=no -p 22 -o ConnectTimeout=5 -o ServerAliveInterval=1 -o ServerAliveCountMax=5 root@192.168.5.12
│   ├─[nr-sim-linux] uname -a
│   │    ## 14:54:13.717558    11 dwmac.cc:178] read to A-N registers not supported (0xc4))(will warn only once)
│   │    ## uname -a
│   │    ## Linux abc 5.18.0-g38e0cf23fe26 #1 SMP PREEMPT Thu Aug 11 09:07:41 UTC 2022 aarch64 GNU/Linux
│   ├─[nr-sim-linux-ssh] ls -l
│   │    ## Timeout, server 192.168.5.12 not responding.
│   │    ## exit

Is there a way to detect that the SSH connection was lost and fail the test in that case?

Thanks, Yaniv

Rahix commented 2 years ago

Hm, when ssh terminates due to the connection timing out, I would expect the entire channel over which the session ran to terminate as well. Evidently, this is not working here. I will have to investigate.

In the meantime, can you run the same command but with one more -v flag to show all channel communication? Please post the log here, maybe there is something more in there which could help...
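To illustrate the expected behavior (the channel should close as soon as the underlying process exits), here is a rough, POSIX-only sketch using only the standard library. It is not how tbot implements its channels: `ChannelClosedError`, `read_until`, and `spawn_dying_child` are hypothetical names, and the child process merely imitates an ssh client that prints a timeout message and exits.

```python
import os
import select
import subprocess
import sys


class ChannelClosedError(Exception):
    """Hypothetical stand-in for a 'channel closed' error (not tbot's class)."""


def read_until(proc, marker, poll_interval=0.1):
    """Read proc's stdout until `marker` appears.

    Raises ChannelClosedError instead of blocking forever if the process
    closes its output or exits before the marker is seen.
    """
    fd = proc.stdout.fileno()
    buf = b""
    while marker not in buf:
        ready, _, _ = select.select([fd], [], [], poll_interval)
        if ready:
            chunk = os.read(fd, 4096)
            if chunk == b"":
                # EOF: the other end of the pipe was closed.
                raise ChannelClosedError("channel closed before marker was seen")
            buf += chunk
        elif proc.poll() is not None:
            # No data pending and the process is gone.
            raise ChannelClosedError("process exited before marker was seen")
    return buf


def spawn_dying_child():
    """Start a child that prints a message and exits, like a timed-out ssh."""
    code = "print('Timeout, server not responding.')"
    return subprocess.Popen([sys.executable, "-c", code], stdout=subprocess.PIPE)


if __name__ == "__main__":
    child = spawn_dying_child()
    try:
        read_until(child, b"$ ")  # wait for a shell prompt that never arrives
    except ChannelClosedError as err:
        print(f"detected: {err}")
    finally:
        child.wait()
```

The point of the sketch is that a reader waiting for a prompt has to treat end-of-file and process exit as errors; otherwise it waits for data that can never arrive.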

Rahix commented 2 years ago

I pushed PR #74 which should be a fix for this. Please give it a try and tell me whether it actually helps...

yanivy-nr commented 2 years ago

Hi,

Thank you for the prompt response and for the fix. I can confirm it works now.

I now get a ChannelClosedException when the SSH connection breaks, and the test fails.

Kind regards, Yaniv

venv/lib/python3.8/site-packages/tbot/machine/channel/subprocess.py:97: ChannelClosedException
============================================================ short test summary info ============================================================
FAILED devel/ssh_test.py::test_ssh_break - tbot.machine.channel.channel.ChannelClosedException
======================================================= 1 failed, 4 deselected in 31.93s ========================================================