Open beliaev-maksim opened 1 year ago
This thread was migrated from launchpad.net
I was also able to reconnect manually, but ended up with a lot of disconnections... and when I tried again, I was unable to connect.
bladernr@galactica:~/development/kernels/ubuntu/bionic$ checkbox-cli remote 10.245.130.13
Connecting to 10.245.130.13:18871. Timeout: 600s
Rejoined session.
In progress: com.canonical.certification::disk/smart_sdd (70/89)
Connection lost!
connection closed by peer
Reconnecting...
Reconnecting...
Reconnecting...
Reconnecting...
^Cbladernr@galactica:~/development/kernels/ubuntu/bionic$ checkbox-cli remote 10.245.130.13
Connecting to 10.245.130.13:18871. Timeout: 600s
................
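When the remote hangs on "Connecting to ..." like this, one quick check is whether the service port on the SUT accepts TCP connections at all. The following is only a minimal sketch: the helper name `port_is_open` is hypothetical, and the default port 18871 is taken from the output above. A successful TCP connect only shows that something is listening; it does not prove that the checkbox service will accept the session.

```python
import socket


def port_is_open(host, port=18871, timeout=3.0):
    """Return True if a TCP connection to host:port can be established.

    Hypothetical helper for debugging; 18871 is the port shown in the
    checkbox-cli output in this report.
    """
    try:
        # create_connection resolves the host and applies the timeout
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts
        return False
```

For example, `port_is_open("10.245.130.13")` could be called from the agent right after a failed reconnect to tell a dead or unreachable service apart from one that is listening but not answering.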
Thank you for reporting us your feedback!
The internal ticket has been created: https://warthogs.atlassian.net/browse/CHECKBOX-1326.
This message was autogenerated
This issue was migrated from https://bugs.launchpad.net/checkbox-ng/+bug/1969519
Summary
Description
I run a lot of tests using testflinger, utilizing checkbox remote to run the tests from the agent to the SUT... I see this periodically and need some help understanding what's happening here and how to debug it:
============[ Bootstrap com.canonical.certification::device (1/3) ]=============
==========[ Bootstrap com.canonical.certification::executable (2/3) ]===========
=============[ Bootstrap com.canonical.certification::fwts (3/3) ]==============
Connection lost!
Service explicitly disconnected you. Possible reason: new remote connected to the service
I noticed, after logging in, that checkbox is still running tests:
838 ? Ssl 0:56 /usr/bin/python3 /usr/bin/checkbox-cli service
1096781 ? Ss 0:00 _ python3 /tmp/nest-bgvqbnev.08c2ccb537221bf98c3458f465623b26566529c0f432d78db84c14d94e64f338/disk_smart.py -b /dev/sdd -s 130 -t 530
For some reason, though, it has also created three different sessions and two different "throwaway" sessions, all from a single test run:
ubuntu@makrutlime:/var/tmp/checkbox-ng/sessions$ ll
total 28
drwxrwxrwx 7 root root 4096 Apr 19 15:58 ./
drwxrwxrwx 3 root root 4096 Apr 19 15:46 ../
drwxrwxrwx 4 root root 4096 Apr 19 15:53 session_title-2022-04-19T15.46.41.session/
drwxrwxrwx 4 root root 4096 Apr 19 15:58 session_title-2022-04-19T15.57.57.session/
drwxrwxrwx 4 root root 4096 Apr 19 18:25 session_title-2022-04-19T15.58.26.session/
drwxrwxrwx 4 root root 4096 Apr 19 15:46 throwaway-2022-04-19T15.46.44.session/
drwxrwxrwx 4 root root 4096 Apr 19 15:58 throwaway-2022-04-19T15.58.00.session/
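To line up the session directories with the disconnections, it may help to sort them by the timestamp embedded in their names. The sketch below assumes the naming pattern `<title>-YYYY-MM-DDTHH.MM.SS.session`, which is inferred only from the listing above and is not a documented checkbox format; the function names are hypothetical.

```python
import re
from datetime import datetime

# Pattern inferred from the directory names in the listing above (an assumption)
SESSION_RE = re.compile(
    r"^(?P<title>.+)-(?P<stamp>\d{4}-\d{2}-\d{2}T\d{2}\.\d{2}\.\d{2})\.session/?$"
)


def parse_session_name(name):
    """Return (title, start_time) for a session directory name, or None."""
    match = SESSION_RE.match(name)
    if match is None:
        return None
    started = datetime.strptime(match.group("stamp"), "%Y-%m-%dT%H.%M.%S")
    return match.group("title"), started


def session_timeline(names):
    """Return the parsed sessions sorted by the timestamp in their names."""
    parsed = [p for p in map(parse_session_name, names) if p is not None]
    return sorted(parsed, key=lambda item: item[1])
```

Applied to the five directories above, the ordering alternates between `session_title` and `throwaway`: two sessions start around 15:46 and three more start between 15:57:57 and 15:58:26.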
So, for whatever reason it looks like something happens that severs the checkbox remote connection, and there's no attempt to re-establish it.
Something, I think, is triggering the service to restart itself, and that is the cause of the "Service explicitly disconnected you" message. This doesn't happen all the time, but it does happen often enough to be worth digging into.
Attachments
No attachments
Tags: []