Rahix / tbot

Automation/Testing tool for Embedded Linux Development
https://tbot.tools
GNU General Public License v3.0

Timeout when waiting for termination of a subprocess is hardcoded #94

Closed: yanivy-nr closed this issue 11 months ago

yanivy-nr commented 1 year ago

Hello,

We are using a VM as a subprocess for testing our embedded system. When the tests complete and tbot tries to shut down the subprocess channel, it sends a TERM signal to the VM process and waits about 1.27 seconds for the VM process to exit.

On some occasions, the VM process takes more than 1.27 seconds to exit, which triggers the exception tbot.error.TbotException("some subprocess(es) did not stop").

It would be great to be able to configure or override the timeout waiting for a subprocess to exit in def close(self) in tbot/machine/channel/subprocess.py.

thanks, Yaniv
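
For readers following along, the request above (a configurable wait after TERM instead of a fixed one) can be sketched in plain Python. This is not tbot's actual close() implementation. The helper name stop_process and its timeout parameter are hypothetical, and the default only mirrors the roughly 1.27 seconds reported above.

```python
import subprocess
import sys


def stop_process(proc: subprocess.Popen, timeout: float = 1.27) -> bool:
    """Send SIGTERM to `proc` and wait up to `timeout` seconds for it to exit.

    Hypothetical helper, not part of tbot. Returns True if the process
    exited within the timeout and False if it is still running.
    """
    proc.terminate()  # sends SIGTERM on POSIX systems
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return True


if __name__ == "__main__":
    # A stand-in for a slow-to-exit VM process; the caller picks the timeout.
    vm = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("stopped in time:", stop_process(vm, timeout=5.0))
```

Returning a boolean instead of raising leaves the policy to the caller, who can raise an error, retry with a longer timeout, or escalate to KILL.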

Rahix commented 1 year ago

We can make this timeout configurable, but I think a better solution would be to actively shut down the VM. This "kill all lingering subprocesses" code is mostly a safeguard and not really meant to be used as part of normal machine teardown.

Maybe you can post a rough sketch of what your machine config looks like for this VM machine?

yanivy-nr commented 1 year ago

Hello,

Thank you for the prompt response.

After further investigation, it looks like the issue is that the ssh process used by the SSHConnector is not always terminated during close() in tbot/machine/channel/subprocess.py. The ssh process is a child process of bash. During the call to close() in subprocess.py, a TERM signal is sent to bash, the parent process, but on some occasions it is not propagated to the child processes.

I have written code that loops through the child processes and sends TERM to each one. If a child is still running after a few seconds, it is sent a KILL signal. This code fixed the issue for us.

I will send a PR in the next few days (I would like to test it thoroughly in our CI prior to sending the PR).

thanks, Yaniv

Mike8 commented 11 months ago

Hi. I faced a similar problem when using an SSH connection to a board. I could solve it with the patch provided in pull request #104. Maybe it is similar or identical to what you had in mind above, @yanivy-nr.