Checkbox is a testing framework used to validate device compatibility with Ubuntu Linux. It’s the testing tool developed for the purposes of the Ubuntu Certification program.
Bug Description
While testing #859, I came across the following issue, likely in the Checkbox controller.
If the controller reconnects to an agent after the current job has finished, the session does not continue to the next job and instead stays stuck on the output of the current job.
To Reproduce
Setup
If needed, here are the steps I followed to set up my device to easily reproduce this issue:
Steps to set up the Checkbox controller and agent, as well as some sample jobs and a test plan
## Checkbox controller
On my laptop, I already have a virtual environment set up for Checkbox. I just switch to your branch:
```
(venv) $ git switch solve-resume-on-remote
```
I use this venv for the Checkbox controller.
## Checkbox agent
For the Checkbox agent, I create an LXC container running Ubuntu 22.04:
```
$ lxc launch images:ubuntu/22.04 jammy
$ lxc shell jammy
```
The rest of the commands are run in the container:
```
# apt install python3.10-venv python3-virtualenv git
# git clone https://github.com/canonical/checkbox.git
# cd checkbox/
# git switch solve-resume-on-remote
```
I follow the [Contrib guide](https://github.com/canonical/checkbox/blob/main/CONTRIBUTING.md#testing) to get Checkbox installed in a venv. In the end, `checkbox-cli` lives in `/root/checkbox/checkbox-ng/venv/bin/checkbox-cli` and the providers are described in `/root/checkbox/checkbox-ng/venv/share/plainbox-providers-1`.
I put the following in `/etc/systemd/system/checkbox-ng.service`:
```
[Unit]
Description=Checkbox Remote Service
Wants=network.target
[Service]
ExecStart=/root/checkbox/checkbox-ng/venv/bin/checkbox-cli run-agent
SyslogIdentifier=checkbox-ng.service
Environment="XDG_CACHE_HOME=/var/cache/"
Environment="PROVIDERPATH=/root/checkbox/checkbox-ng/venv/share/plainbox-providers-1"
Restart=always
RestartSec=1
TimeoutStopSec=30
Type=simple
[Install]
WantedBy=multi-user.target
```
Then I install the checkbox-ng service and start it (`enable --now` both enables and starts the unit):
```
# systemctl daemon-reload
# systemctl enable --now checkbox-ng.service
```
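To confirm the agent is actually up before connecting from the controller, one can check that something listens on port 18871, the port the controller connects to in the log output later in this report. A minimal sketch, assuming iproute2's `ss` is available in the container; `is_listening_on` is a hypothetical helper name, not part of Checkbox:

```shell
# Hypothetical helper (not part of Checkbox): succeed if `ss -ltn`
# output, read on stdin, contains a TCP listener on the given port.
is_listening_on() {
    port="$1"
    # Skip the header line; column 4 is "Local Address:Port",
    # e.g. 0.0.0.0:18871 or [::]:18871. Match the ":PORT" suffix only.
    awk -v p=":$port" '
        NR > 1 && substr($4, length($4) - length(p) + 1) == p { found = 1 }
        END { exit !found }
    '
}
```

In the container, `systemctl is-active checkbox-ng.service` should print `active`, and `ss -ltn | is_listening_on 18871 && echo "agent is listening"` reports whether the agent has opened its port.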
Now, everything is in place. I can start a remote session from the controller by running:
```
(venv) $ checkbox-cli control <IP of my lxc container>
```
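The `control` subcommand takes the agent's address as its first argument, as in the reconnect command `checkbox-cli control 10.146.223.75` used later in this report. With an LXC agent, that address can be looked up from the host. A minimal sketch, assuming the LXD `lxc` client and the container name `jammy` from above; `first_ipv4` is a hypothetical helper name:

```shell
# Hypothetical helper: print the first IPv4 address from
# `lxc list <name> -c 4 --format csv` output read on stdin.
# A cell looks like "10.146.223.75 (eth0)"; with several addresses
# the cell is quoted and spans multiple lines, so quotes are dropped.
first_ipv4() {
    tr -d '"' | awk 'NF { print $1; exit }'
}
```

For example, `AGENT_IP=$(lxc list jammy -c 4 --format csv | first_ipv4)` followed by `checkbox-cli control "$AGENT_IP"` from the controller's venv. If the container has not obtained an address yet, the helper prints nothing.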
## Sample jobs and test plan
In the 22.04 container, I create a new `pieq.pxu` file in `/root/checkbox/providers/base/units/` and put the following in it:
```
unit: job
id: pieq/test
command:
  for i in $(seq 1 30);
  do
    echo "Iteration $i/30..."
    sleep 1
  done
flags: simple noreturn

unit: job
id: pieq/wrapup
command:
  echo "Wrapping up..."
flags: simple

unit: test plan
id: pieq
_name: pieq
include:
  pieq/test
  pieq/wrapup
```
The `pieq/test` job runs for 30 seconds and prints its progress, so it's handy to see what's going on. It has the `noreturn` flag, but you can of course remove this flag if you want to test other use cases.
I need to restart the systemd service, otherwise this test plan will not be visible to Checkbox:
```
# systemctl restart checkbox-ng.service
```
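Creating the unit file and restarting the agent can be scripted. A minimal sketch, assuming the usual PXU conventions (units separated by blank lines, multi-line values indented); `write_pieq_pxu` is a hypothetical helper name, and its default directory is the one used in this report. The quoted `'EOF'` delimiter matters: without it, the shell would expand `$(seq 1 30)` and `$i` while writing the file instead of leaving them for Checkbox to run.

```shell
# Hypothetical helper: write the sample pieq.pxu into a units directory
# (defaults to the base provider's units directory used in this report).
write_pieq_pxu() {
    units_dir="${1:-/root/checkbox/providers/base/units}"
    # Quoted 'EOF': $(seq 1 30) and $i are written literally, not expanded.
    cat > "$units_dir/pieq.pxu" <<'EOF'
unit: job
id: pieq/test
command:
  for i in $(seq 1 30);
  do
    echo "Iteration $i/30..."
    sleep 1
  done
flags: simple noreturn

unit: job
id: pieq/wrapup
command:
  echo "Wrapping up..."
flags: simple

unit: test plan
id: pieq
_name: pieq
include:
  pieq/test
  pieq/wrapup
EOF
}
```

In the container, `write_pieq_pxu && systemctl restart checkbox-ng.service` then does both steps at once.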
## Launcher
In order to simulate a non-interactive test run, I create the following launcher file (`pieq.launcher`):
```
[launcher]
launcher_version = 1
app_id = com.canonical.certification:PR859
stock_reports = text
[test plan]
unit = com.canonical.certification::pieq
forced = yes
[test selection]
forced = yes
[ui]
type = silent
[transport:outfile]
type = stream
stream = stdout
[exporter:text]
unit = com.canonical.plainbox::text
[report:screen]
transport = outfile
exporter = text
```
To run it from the controller side:
```
(venv) $ checkbox-cli control <IP of my lxc container> pieq.launcher
```
Test
Reconnecting to agent after the controller stopped/crashed :x:
One of the issues this should fix is #22, which mentions:
> While testing is ongoing, restart your host computer.
So:
Run Checkbox remote using the launcher, which starts `pieq/test` (which runs for 30 seconds):
`(venv) $ checkbox-cli control <IP of my lxc container> pieq.launcher`
→ The test starts running.
Close the terminal where the controller is running. Wait for 30 seconds, then try reconnecting to the agent:
`(venv) $ checkbox-cli control 10.146.223.75`
$PROVIDERPATH is defined, so following provider sources are ignored ['/usr/local/share/plainbox-providers-1', '/usr/share/plainbox-providers-1', '/home/pieq/.local/share/plainbox-providers-1', '/var/tmp/checkbox-providers-develop']
Connecting to 10.146.223.75:18871. Timeout: 600s
Rejoined session.
In progress: com.canonical.certification::pieq/test (1/2)
Iteration 17/30...
Iteration 18/30...
Iteration 19/30...
Iteration 20/30...
Iteration 21/30...
Iteration 22/30...
Iteration 23/30...
Iteration 24/30...
Iteration 25/30...
Iteration 26/30...
Iteration 27/30...
Iteration 28/30...
Iteration 29/30...
Iteration 30/30...
aaaaaaaaand nothing happens. The session never moves on to the next job (`pieq/wrapup`) and never finishes. This is because the job has already finished running by the time we reconnect to the agent.
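The manual reproduction above can be approximated with a script, which makes it easier to re-run against new revisions of the branch. A rough sketch with several assumptions: closing the terminal is simulated by sending SIGHUP to the backgrounded controller; the delays and the 120-second cap on the reconnect are arbitrary values chosen here, so a hang is reported as a timeout; `reproduce_stuck_resume` and the `CHECKBOX_CLI`, `START_DELAY`, `JOB_SECONDS` and `RECONNECT_TIMEOUT` variables are hypothetical and not Checkbox options.

```shell
# Hypothetical reproduction helper for the "stuck after reconnect" issue.
# Usage: reproduce_stuck_resume <agent-ip> <launcher>
reproduce_stuck_resume() {
    agent_ip="$1"
    launcher="$2"
    cli="${CHECKBOX_CLI:-checkbox-cli}"            # override for dry runs
    start_delay="${START_DELAY:-10}"               # let pieq/test start
    job_seconds="${JOB_SECONDS:-30}"               # the 30 s wait from the report
    reconnect_timeout="${RECONNECT_TIMEOUT:-120}"  # arbitrary cap

    # Start the session in the background, as if in a separate terminal.
    "$cli" control "$agent_ip" "$launcher" &
    controller_pid=$!

    # Simulate closing that terminal: the controller receives SIGHUP.
    sleep "$start_delay"
    kill -HUP "$controller_pid" 2>/dev/null || true
    wait "$controller_pid" 2>/dev/null || true

    # Give the agent time to finish the running job on its own.
    sleep "$job_seconds"

    # Reconnect. With the bug, the session never moves on to
    # pieq/wrapup, so the command is capped with timeout(1).
    if timeout "$reconnect_timeout" "$cli" control "$agent_ip"; then
        echo "Reconnected session finished."
    else
        echo "Reconnected session did not finish (timeout or error)." >&2
        return 1
    fi
}
```

For example, `reproduce_stuck_resume 10.146.223.75 pieq.launcher` from the controller's venv. A non-zero exit status does not by itself distinguish the hang from other failures, so the reconnect output is still worth reading.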
Environment
main