
Checkbox
https://checkbox.readthedocs.io
GNU General Public License v3.0

Checkbox session stuck while running a beacon_eddystone test #1263

Closed stanley31huang closed 3 months ago

stanley31huang commented 4 months ago

Bug Description

While running a beacon eddystone test, the Checkbox session hung for over 15 minutes. I killed the process, so I did not check whether it would have stayed stuck forever.

It seems the Bluetooth adapter was in an abnormal state on the system, which caused eddystone_scanner to crash, but the script is supposed to terminate automatically within 10 minutes.
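For context, a minimal sketch of the kind of overall deadline that "terminate automatically within 10 minutes" implies. Illustrative only: scan_once() and the other names here are hypothetical, not the actual eddystone_scanner.py implementation.

# Hedged sketch of a 10-minute scan budget; not the real Checkbox script.
import time

TIMEOUT_S = 10 * 60                  # assumed overall budget from the description


def scan_once():
    """Placeholder for one beacon scan attempt; True when a beacon is seen."""
    return False


def scan_with_deadline(timeout_s=TIMEOUT_S):
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if scan_once():
            return 0                 # beacon found: pass
        time.sleep(5)                # short pause between attempts
    return 1                         # budget exhausted: fail instead of hanging


if __name__ == "__main__":
    raise SystemExit(scan_with_deadline())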

The hciconfig output:

iotuc@ubuntu:~$ hciconfig hci0
hci0:   Type: Primary  Bus: UART
        BD Address: 00:00:00:00:00:00  ACL MTU: 0:0  SCO MTU: 0:0
        DOWN 
        RX bytes:0 acl:0 sco:0 events:0 errors:0
        TX bytes:4 acl:0 sco:0 commands:1 errors:0

Error logs from a Checkbox session:

==============[ Running job 1 / 1. Estimated time left: 0:00:10 ]===============
[ Test system can get beacon EddyStone URL advertisements on the hci0 adapter ]-
ID: com.canonical.certification::bluetooth4/beacon_eddystone_url_hci0
Category: com.canonical.plainbox::bluetooth
... 8< -------------------------------------------------------------------------
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/snap/checkbox-shiner/329/checkbox-runtime/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/snap/checkbox-shiner/329/checkbox-runtime/lib/python3.8/site-packages/checkbox_support/vendor/beacontools/scanner.py", line 148, in run
    self.hci_version = self.get_hci_version()
  File "/snap/checkbox-shiner/329/checkbox-runtime/lib/python3.8/site-packages/checkbox_support/vendor/beacontools/scanner.py", line 173, in get_hci_version
    resp = self.backend.send_req(self.socket, OGF_INFO_PARAM, OCF_READ_LOCAL_VERSION,
  File "/snap/checkbox-shiner/329/checkbox-runtime/lib/python3.8/site-packages/checkbox_support/vendor/beacontools/backend/linux.py", line 23, in send_req
    return bluez.hci_send_req(socket, group_field, command_field, event, rlen, params, timeout)
_bluetooth.error: (100, 'Network is down')

Error logs from running eddystone_scanner directly:

root@ubuntu:/home/iotuc# python3 /snap/checkbox20/current/lib/python3.8/site-packages/checkbox_support/scripts/eddystone_scanner.py -D hci0
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/snap/checkbox-shiner/329/checkbox-runtime/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/snap/checkbox-shiner/329/checkbox-runtime/lib/python3.8/site-packages/checkbox_support/vendor/beacontools/scanner.py", line 148, in run
    self.hci_version = self.get_hci_version()
  File "/snap/checkbox-shiner/329/checkbox-runtime/lib/python3.8/site-packages/checkbox_support/vendor/beacontools/scanner.py", line 173, in get_hci_version
    resp = self.backend.send_req(self.socket, OGF_INFO_PARAM, OCF_READ_LOCAL_VERSION,
  File "/snap/checkbox-shiner/329/checkbox-runtime/lib/python3.8/site-packages/checkbox_support/vendor/beacontools/backend/linux.py", line 23, in send_req
    return bluez.hci_send_req(socket, group_field, command_field, event, rlen, params, timeout)
_bluetooth.error: (100, 'Network is down')

To Reproduce

I have reproduced this issue with the following steps:

Note: this issue has also been seen in a Checkbox remote session.

Environment

Relevant log output

No response

Additional context

No response

syncronize-issues-to-jira[bot] commented 4 months ago

Thank you for reporting us your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/CHECKBOX-1449.

This message was autogenerated

stanley31huang commented 4 months ago

I found that if we catch other exceptions here, the eddystone_scanner script exits shortly with an error: https://github.com/canonical/checkbox/blob/main/checkbox-support/checkbox_support/vendor/beacontools/scanner.py#L176
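A rough, standalone illustration of that idea (hypothetical names and a simulated backend call; not the vendored scanner.py): once the except clause covers more than the narrow errors currently handled, an adapter that is DOWN becomes a quick, clean failure instead of an unhandled exception in the scanner thread.

# Hedged sketch only -- stand-ins for the vendored beacontools code.
class HCIVersionFallback:
    """Stand-in for the HCI version fallback value."""
    BT_CORE_SPEC_1_0 = 1


def send_req_on_down_adapter():
    """Simulates bluez.hci_send_req() against a DOWN adapter."""
    raise OSError(100, "Network is down")


def get_hci_version():
    try:
        return send_req_on_down_adapter()
    except Exception:
        # Catching the broader exception keeps the scanner thread alive,
        # so the script can report the failure and exit promptly.
        return HCIVersionFallback.BT_CORE_SPEC_1_0


print(get_hci_version())   # prints 1; no crashed thread, no hang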

Hook25 commented 3 months ago

Thank you for your report. I now understand why this happens and why your fix works. The exception is a custom un-picklable exception, so the decorator is unable to propagate it up and crashes silently when trying to push it (since it is doing so in a separate thread). The other process then detects that the child is dead but, finding nothing in the exception queue (because the child died while trying to push to it), hangs waiting for a result. Will push a PR soon to fix this.
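A minimal standalone reproduction of that failure mode (not Checkbox code; the exception class and queue names are made up): the pickling error happens in the queue's background feeder thread, so the child exits without delivering anything and a blocking get() in the parent would wait forever.

import multiprocessing as mp
import queue


class UnpicklableError(Exception):
    """Custom exception carrying an attribute that cannot be pickled."""
    def __init__(self, msg):
        super().__init__(msg)
        self.lock = mp.Lock()        # multiprocessing locks refuse to pickle


def child(errors):
    try:
        raise UnpicklableError("Network is down")
    except Exception as exc:
        # put() hands the object to a feeder thread; pickling fails there,
        # so nothing reaches the parent and the child simply exits.
        errors.put(exc)


if __name__ == "__main__":
    errors = mp.Queue()
    proc = mp.Process(target=child, args=(errors,))
    proc.start()
    proc.join()
    # A plain errors.get() would block forever -- the hang seen in the
    # Checkbox session. The timeout just keeps this demo finite.
    try:
        print("got:", errors.get(timeout=5))
    except queue.Empty:
        print("queue stayed empty: the exception was lost in the child")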