canonical / checkbox

Checkbox
https://checkbox.readthedocs.io

LP1969519: Remote disconnects with no clue why #207

Open beliaev-maksim opened 1 year ago

beliaev-maksim commented 1 year ago

This issue was migrated from https://bugs.launchpad.net/checkbox-ng/+bug/1969519

Summary

| Status | Created on | Heat | Importance | Security related |
| --- | --- | --- | --- | --- |
| New | 2022-04-19 18:28:36 | 6 | Undecided | False |

Description

I run a lot of tests using testflinger, utilizing checkbox remote to run the tests from the agent to the SUT... I see this periodically and need some help understanding what's happening here and how to debug it:

    ============[ Bootstrap com.canonical.certification::device (1/3) ]=============
    ==========[ Bootstrap com.canonical.certification::executable (2/3) ]===========
    =============[ Bootstrap com.canonical.certification::fwts (3/3) ]==============
    Connection lost!
    Service explicitly disconnected you. Possible reason: new remote connected to the service

I noticed, after logging in, that checkbox is still running tests:

    838 ? Ssl 0:56 /usr/bin/python3 /usr/bin/checkbox-cli service
    1096781 ? Ss 0:00 \_ python3 /tmp/nest-bgvqbnev.08c2ccb537221bf98c3458f465623b26566529c0f432d78db84c14d94e64f338/disk_smart.py -b /dev/sdd -s 130 -t 530

though for some reason, it has also created three different sessions and two different "throwaway" sessions, all from a single test run:

    ubuntu@makrutlime:/var/tmp/checkbox-ng/sessions$ ll
    total 28
    drwxrwxrwx 7 root root 4096 Apr 19 15:58 ./
    drwxrwxrwx 3 root root 4096 Apr 19 15:46 ../
    drwxrwxrwx 4 root root 4096 Apr 19 15:53 session_title-2022-04-19T15.46.41.session/
    drwxrwxrwx 4 root root 4096 Apr 19 15:58 session_title-2022-04-19T15.57.57.session/
    drwxrwxrwx 4 root root 4096 Apr 19 18:25 session_title-2022-04-19T15.58.26.session/
    drwxrwxrwx 4 root root 4096 Apr 19 15:46 throwaway-2022-04-19T15.46.44.session/
    drwxrwxrwx 4 root root 4096 Apr 19 15:58 throwaway-2022-04-19T15.58.00.session/
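One way to make the listing easier to reason about is to sort the session directories by the timestamp embedded in their names, which gives a timeline of when each session was created. The following is a minimal sketch, not part of Checkbox: the name pattern is inferred only from the directory names shown in this report, and `parse_session_name` and `session_timeline` are hypothetical helpers.

```python
import re
from datetime import datetime

# ASSUMPTION: pattern inferred from the names in the listing above
# (e.g. "throwaway-2022-04-19T15.46.44.session/"); it is not taken
# from Checkbox documentation or source code.
SESSION_RE = re.compile(
    r"^(?P<kind>.+)-(?P<stamp>\d{4}-\d{2}-\d{2}T\d{2}\.\d{2}\.\d{2})\.session/?$"
)


def parse_session_name(name):
    """Return (kind, creation_time) for a session directory name, or None."""
    match = SESSION_RE.match(name)
    if match is None:
        return None
    stamp = datetime.strptime(match.group("stamp"), "%Y-%m-%dT%H.%M.%S")
    return match.group("kind"), stamp


def session_timeline(names):
    """Return (creation_time, kind) pairs sorted chronologically.

    Names that do not match the assumed pattern (such as "./") are skipped.
    """
    parsed = (parse_session_name(name) for name in names)
    return sorted((stamp, kind) for kind, stamp in (p for p in parsed if p))
```

Feeding it the seven entries from the `ll` output (including `./` and `../`, which are ignored) yields five timestamped entries in creation order, which can then be compared against the time the remote reported the disconnect.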

So, for whatever reason it looks like something happens that severs the checkbox remote connection, and there's no attempt to re-establish it.

Something, I think, is triggering the service to restart itself, and that is the cause of the "Service explicitly disconnected you" message. This doesn't happen all the time, but it does happen often enough to be worth digging into.

Attachments

No attachments

Tags: []

beliaev-maksim commented 1 year ago

This thread was migrated from launchpad.net

https://launchpad.net/~bladernr wrote on 2022-04-19 18:32:01:

I was able to reconnect manually, but ended up with a lot of disconnections... and when I tried again, I was unable to connect.

    bladernr@galactica:~/development/kernels/ubuntu/bionic$ checkbox-cli remote 10.245.130.13
    Connecting to 10.245.130.13:18871. Timeout: 600s
    Rejoined session.
    In progress: com.canonical.certification::disk/smart_sdd (70/89)
    Connection lost!
    connection closed by peer
    Reconnecting... Reconnecting... Reconnecting... Reconnecting...
    ^Cbladernr@galactica:~/development/kernels/ubuntu/bionic$ checkbox-cli remote 10.245.130.13
    Connecting to 10.245.130.13:18871. Timeout: 600s
    ................
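To tell apart "the service is not listening at all" from "the service accepts the connection and then drops it", a plain TCP probe of the service port can be run from the agent alongside the remote. This is only a sketch using the Python standard library; it does not speak the Checkbox remote protocol, and `probe_port` is a hypothetical helper. The `connect` and `sleep` parameters exist only so the function can be exercised without a network.

```python
import socket
import time


def probe_port(host, port, attempts=5, delay=1.0, timeout=3.0,
               connect=socket.create_connection, sleep=time.sleep):
    """Try to open a TCP connection `attempts` times.

    Returns a list of (attempt_number, succeeded, detail) tuples.
    This only checks that something accepts TCP connections on the port;
    it says nothing about the state of the Checkbox session itself.
    """
    results = []
    for attempt in range(1, attempts + 1):
        try:
            conn = connect((host, port), timeout=timeout)
        except OSError as exc:
            # Refused, unreachable, and timed-out connections all land here.
            results.append((attempt, False, str(exc)))
        else:
            conn.close()
            results.append((attempt, True, "connected"))
        if attempt < attempts:
            sleep(delay)
    return results
```

With the address and port from the log above, a call would look like `probe_port("10.245.130.13", 18871)`. A run of refused connections followed by successes would be consistent with the service restarting, though it would not prove it.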

syncronize-issues-to-jira[bot] commented 6 months ago

Thank you for reporting us your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/CHECKBOX-1326.

This message was autogenerated