canonical / checkbox

Checkbox
https://checkbox.readthedocs.io
GNU General Public License v3.0

Checkbox fails to generate the HTML and Tarball for a session #248

Open beliaev-maksim opened 1 year ago

beliaev-maksim commented 1 year ago

This issue was migrated from https://bugs.launchpad.net/checkbox-ng/+bug/1992151

Summary

Status: Triaged
Created on: 2022-10-07 08:47:33
Heat: 6
Importance: Undecided
Security related: False

Description

[Summary] After Checkbox finishes, it is supposed to generate an HTML report, an XML file and a tar.xz archive of the test data and logs.

However, after finishing the checkbox-iiotg-classic 20.04 Manual for Server test plan, Checkbox encountered an error and was only able to generate the XML file. Since the session is now closed, the test data was lost and the tests will have to be rerun.

The session was then modified to show as "open" again, and report generation was retried, but it still failed.

[Steps to reproduce] Run Server Manual.

[Expected result] Checkbox should properly generate the 3 test files.

[Actual result] JSON errors prevent the generation of the HTML and tarball.

[Failure rate] 2/2 100%

[Additional information]
CID: 202109-29496
SKU:
system-manufacturer: AAEON
system-product-name: UPN-EHL01
bios-version: UNEHAM11
CPU: Intel Atom(R) x6425RE Processor @ 1.90GHz (4x)
GPU: 00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:4571] (rev 01)
kernel-version: 5.15.0-1016-intel-iotg

[Stage] Issue reported and logs collected at a later stage

Attachments

checkbox-session.tgz
snap_list.log
acpidump.log
sosreport-u-2022-10-07-rcwwztw.tar.gz
Screenshot of checkbox exceptions while generating report
session_title-2022-10-25T11.16.47.session.tar.xz

Tags: ['checkbox', 'ihv-intel', 'test-case']

beliaev-maksim commented 1 year ago

This thread was migrated from launchpad.net

https://launchpad.net/~baconyao wrote on 2022-10-07 08:48:11:

Automatically attached

https://launchpad.net/~baconyao wrote on 2022-10-07 08:48:13:

Automatically attached

https://launchpad.net/~baconyao wrote on 2022-10-07 08:48:26:

Automatically attached

https://launchpad.net/~baconyao wrote on 2022-10-07 08:48:29:

Automatically attached

https://launchpad.net/~huntu207 wrote on 2022-10-27 05:39:02:

The same symptom was observed on a PC project. However, it only failed on client-cert-desktop-22-04-stress and passed on client-cert-desktop-22-04-automated.

CID: 202109-29488
SKU:
Image: dell-bto-jammy-jellyfish-corsola-X42-20221020-12.iso
system-manufacturer: Dell Inc.
bios-version: 0.13.50
CPU: Genuine Intel(R) w9-3495 CPU @ 1.80GHz (112x)
GPU: 0000:ae:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:7422]
kernel-version: 5.14.0-9040-oem

djacobs98 commented 1 year ago

This is still happening. I've had it happen two times in a row on a machine running the IOTG Jammy Desktop Manual test plan.

☑ : Device Check
☑ : Collect information about dpkg version
☑ : Collect information about installed system (lsb-release)
☑ : Collect information about the CPU
☑ : Attaches json dumps of system info tools
☑ : Attach PCI configuration space hex dump
☑ : Resource to detect if dmi data is present
☑ : Attach a copy of /sys/class/dmi/id/
☑ : Collect information about kernel modules
☑ : Collect information about system memory (/proc/meminfo)
☑ : Create resource info for environment variables
☑ : Attaches json dumps of udev_resource.py
☑ : Collect information about installation media (casper)
☑ : Attaches json dumps of installed dkms package information.
☑ : Collect information about the EFI configuration
☑ : Attach the contents of /etc/modprobe.
☑ : Collect information about installed software packages
☑ : Enumerate available system executables
☑ : Attach info block devices and their mount points
☑ : Attach dump of udev database
☑ : Attach detailed sysfs property output from udev
☑ : Attach a copy of /proc/cmdline
☑ : Attaches json dumps of raw dmi devices
☑ : Collect information about hardware devices (DMI)
☑ : Provide links to requirements documents
☑ : Collect information about installed snap packages
☑ : Collect information about hardware devices (udev)
☑ : Collect information about the running kernel
☑ : Check that data for a complete result are present
☑ : System boot-up performance statistics
☒ : Test that the watchdog module can trigger a system reset
☑ : Hardware Manifest
☐ : Detect if the USB DWC3 module is loaded
☐ : Detect if the USB DWC3 drivers are loaded
☐ : Check DUT can be detected as mass storage device
☑ : Cleanup mass storage setup after mass storage device test
☑ : Check if TSN is supported
☑ : Verify Ethernet interface enp1s0 can establish clock sync
☐ : audio/speaker-headphone-plug-detection
☐ : audio/microphone-plug-detection
☐ : audio/list_devices
☐ : audio/playback_headphones
☐ : audio/alsa_record_playback_external
☐ : audio/playback_auto
☐ : audio/alsa_record_playback_internal
☒ : audio/channels
☐ : audio/external-linein
☐ : audio/external-lineout
☐ : bluetooth/detect-output
☐ : bluetooth/audio-a2dp
☐ : bluetooth4/HOGP-mouse
☐ : bluetooth4/HOGP-keyboard
☐ : bluetooth/audio_record_playback
☐ : Storage insert detection on Thunderbolt 3 port
☐ : Storage test on Thunderbolt 3
☐ : Storage removal detection on Thunderbolt 3 port
☑ : monitor/1_powersaving_PCI_ID_0x46d1
☐ : power-management/light_sensor
☐ : monitor/1_dim_brightness_PCI_ID_0x46d1
☐ : Creates display resource info from xrandr output
☐ : monitor/1_displayport_PCI_ID_0x46d1
☐ : audio/1_playback_displayport_PCI_ID_0x46d1
☑ : Display connected via DisplayPort using an USB Type-C port for Intel Corporation PCI ID 0x46d1
☐ : audio/1_playback_type-c_displayport_PCI_ID_0x46d1
☑ : Display connected via HDMI using an USB Type-C port for Intel Corporation PCI ID 0x46d1
☐ : audio/1_playback_type-c_hdmi_PCI_ID_0x46d1
☐ : Display connected via VGA using an USB Type-C port for Intel Corporation PCI ID 0x46d1
☐ : monitor/1_dvi_PCI_ID_0x46d1
☐ : monitor/1_hdmi_PCI_ID_0x46d1
☐ : audio/1_playback_hdmi_PCI_ID_0x46d1
☐ : Display connected via Thunderbolt 3 for Intel Corporation PCI ID 0x46d1
☐ : audio/1_playback_thunderbolt3_PCI_ID_0x46d1
☐ : Daisy-chain testing for Thunderbolt 3 storage and display device
☐ : monitor/1_vga_PCI_ID_0x46d1
☐ : monitor/1_multi-head_PCI_ID_0x46d1
☐ : miscellanea/chvt
☑ : Test maximum supported resolution for Intel Corporation PCI ID 0x46d1
☐ : Test that glxgears works for Intel Corporation PCI ID 0x46d1
☐ : Test that glxgears works on fullscreen for Intel Corporation PCI ID 0x46d1
☑ : Test rotation for Intel Corporation PCI ID 0x46d1
☐ : Test that video can be displayed with Intel Corporation PCI ID 0x46d1
☒ : Test that VESA drivers are not in use
☐ : Test resolution cycling for Intel Corporation PCI ID 0x46d1
☐ : input/accelerometer
☑ : input/keyboard
☐ : Gathers information about each disk detected
☐ : disk/hdd-parking
☐ : Check if at least one fingerprint reader is detected
☐ : Enroll a fingerprint
☐ : Fingerprint negative match
☐ : Fingerprint positive match
☐ : Fingerprint unlock screen
☐ : Remove existing fingerprint signatures
☐ : Test that insertion of an SDHC card is detected
☐ : Test reading & writing to a SDHC Card
☐ : Test that removal of an SDHC card is detected
☑ : Ensure hotplugging works on port enp1s0
☑ : Ensure hotplugging works on port enp0s30f4
☑ : Network Information of device 1 (enp1s0)
☑ : Network Information of device 2 (enp0s30f4)
☐ : touchpad/basic
☐ : touchpad/palm-rejection
☐ : touchpad/continuous-move
☐ : touchpad/singletouch-selection
☐ : touchpad/drag-and-drop
☐ : touchpad/multitouch-rightclick
☐ : touchpad/multitouch
☐ : touchscreen/drag-n-drop
☐ : Check touchscreen pinch gesture for zoom
☐ : Check touchscreen pinch gesture for rotate
☑ : Display USB devices attached to SUT
☑ : usb/HID
☑ : USB 2.0 storage device insertion detected
☑ : USB 2.0 storage device read & write works
☑ : USB 2.0 storage device removal detected
☑ : Collect information about supported types of USB
☑ : USB 3.0 storage device insertion detected
☑ : USB 3.0 storage device read & write works
☑ : USB 3.0 storage device removal detected
☑ : USB HID work on USB Type-C port using a "USB Type-C to Type-A" adapter
☑ : usb-c/c-to-a-adapter/insert
☑ : usb-c/c-to-a-adapter/storage-automated
☑ : usb-c/c-to-a-adapter/remove
☐ : USB HID work on USB Type-C port
☑ : USB 3.0 storage device insertion detected on USB Type-C port
☑ : USB 3.0 storage device read & write works on USB Type-C port
☑ : USB 3.0 storage device removal detected on USB Type-C port
☑ : Check if USB Type-C to Ethernet adapter works
☑ : Creates resource info for RTC
☑ : Create resource info for supported sleep states
☑ : Automated test of suspend function
☐ : power-management/light_sensor after suspend (S3)
☐ : Daisy-chain testing for Thunderbolt 3 storage and display device after suspend (S3)
☐ : power-management/lid
☐ : power-management/lid_close
☐ : power-management/lid_open
☑ : Test display function after suspend for Intel Corporation PCI ID 0x46d1
☐ : Test that glxgears works for Intel Corporation PCI ID 0x46d1 after suspend
☑ : Test rotation for Intel Corporation PCI ID 0x46d1 after suspend
☐ : Test that video can be displayed after suspend with Intel Corporation PCI ID 0x46d1
☐ : suspend/1_cycle_resolutions_after_suspend_PCI_ID_0x46d1_graphics
☐ : suspend/1_xrandr_screens_after_suspend.tar.gz_auto
☑ : monitor/1_powersaving_PCI_ID_0x46d1 after suspend (S3)
☐ : monitor/1_dim_brightness_PCI_ID_0x46d1 after suspend (S3)
☐ : monitor/1_displayport_PCI_ID_0x46d1 after suspend (S3)
☐ : audio/1_playback_displayport_PCI_ID_0x46d1 after suspend (S3)
☑ : Display connected via DisplayPort using an USB Type-C port for Intel Corporation PCI ID 0x46d1 after suspend (S3)
☐ : audio/1_playback_type-c_displayport_PCI_ID_0x46d1 after suspend (S3)
☑ : Display connected via HDMI using an USB Type-C port for Intel Corporation PCI ID 0x46d1 after suspend (S3)
☐ : audio/1_playback_type-c_hdmi_PCI_ID_0x46d1 after suspend (S3)
☐ : Display connected via VGA using an USB Type-C port for Intel Corporation PCI ID 0x46d1 after suspend (S3)
☐ : monitor/1_dvi_PCI_ID_0x46d1 after suspend (S3)
☐ : monitor/1_hdmi_PCI_ID_0x46d1 after suspend (S3)
☐ : audio/1_playback_hdmi_PCI_ID_0x46d1 after suspend (S3)
☐ : Display connected via Thunderbolt 3 for Intel Corporation PCI ID 0x46d1 after suspend (S3)
☐ : audio/1_playback_thunderbolt3_PCI_ID_0x46d1 after suspend (S3)
☐ : monitor/1_vga_PCI_ID_0x46d1 after suspend (S3)
☐ : monitor/1_multi-head_PCI_ID_0x46d1 after suspend (S3)
☐ : suspend/oops_after_suspend
☒ : suspend/oops_results_after_suspend.log
☐ : suspend/speaker-headphone-plug-detection-after-suspend
☐ : suspend/microphone-plug-detection-after-suspend
☐ : suspend/playback_headphones-after-suspend
☐ : suspend/alsa_record_playback_external-after-suspend
☐ : bluetooth/audio-a2dp after suspend (S3)
☐ : bluetooth4/HOGP-mouse after suspend (S3)
☐ : bluetooth4/HOGP-keyboard after suspend (S3)
☐ : bluetooth/audio_record_playback after suspend (S3)
☑ : suspend/pointing-after-suspend_USB_Optical_Mouse_MOUSE_1
☑ : Check post suspend button functionality for USB Optical Mouse
☐ : suspend/sdhc-insert-after-suspend
☐ : suspend/sdhc-storage-after-suspend
☐ : suspend/sdhc-remove-after-suspend
☐ : Displays discovered optical drives
☐ : touchpad/basic-after-suspend
☐ : touchpad/detected-as-mouse
☐ : touchpad/detected-as-mouse-after-suspend
☐ : touchpad/palm-rejection-after-suspend
☐ : touchpad/continuous-move-after-suspend
☐ : touchpad/singletouch-selection-after-suspend
☐ : touchpad/drag-and-drop-after-suspend
☐ : touchpad/multitouch-rightclick-after-suspend
☐ : touchpad/multitouch-after-suspend
☐ : touchscreen/drag-n-drop after suspend (S3)
☐ : Check touchscreen pinch gesture for zoom after suspend (S3)
☐ : Check touchscreen pinch gesture for rotate after suspend (S3)
☑ : suspend/usb_insert_after_suspend
☑ : suspend/usb_storage_automated_after_suspend
☑ : suspend/usb_remove_after_suspend
☑ : suspend/usb3_insert_after_suspend
☑ : suspend/usb3_storage_automated_after_suspend
☑ : suspend/usb3_remove_after_suspend
☑ : USB HID work on USB Type-C port using a "USB Type-C to Type-A" adapter after suspend (S3)
☑ : usb-c/c-to-a-adapter/insert after suspend (S3)
☑ : usb-c/c-to-a-adapter/storage-automated after suspend (S3)
☑ : usb-c/c-to-a-adapter/remove after suspend (S3)
☐ : USB HID work on USB Type-C port after suspend (S3)
☑ : USB 3.0 storage device insertion detected on USB Type-C port after suspend (S3)
☑ : USB 3.0 storage device read & write works on USB Type-C port after suspend (S3)
☑ : USB 3.0 storage device removal detected on USB Type-C port after suspend (S3)
☑ : Check if USB Type-C to Ethernet adapter works after suspend (S3)

ERROR:plainbox.bug:Undeclared exception JSONDecodeError raised from export_to_transport
ERROR:checkbox-ng.launcher.stages:Problem with a '2_html_file' report using 'com.canonical.plainbox::html' exporter sent to '/home/u/.local/share/checkbox-ng/submission_2023-01-09T08.24.50.448559.html' transport.
file:///home/u/.local/share/checkbox-ng/submission_2023-01-09T08.24.50.448559.junit.xml
ERROR:plainbox.bug:Undeclared exception JSONDecodeError raised from export_to_transport
ERROR:checkbox-ng.launcher.stages:Problem with a '2_tar_file' report using 'com.canonical.plainbox::tar' exporter sent to '/home/u/.local/share/checkbox-ng/submission_2023-01-09T08.24.50.448559.tar.xz' transport.

pieqq commented 1 year ago

Investigation

Doug reported the following sessions:

checkbox-sessions.zip

The session that exhibits problems is session_title-2023-01-09T06.53.11.session.

After a lot of debugging and comparisons, I found the problem.

Checkbox relies on a few jobs to generate submissions (HTML reports, tar.xz archives, etc.). Such jobs include system_info_json, raw_devices_dmi_json, modprobe_json, etc. They are attachment jobs that output information into files stored in the session's io-logs directory.

In this session, those files are present (e.g. ./session_title-2023-01-09T06.53.11.session/io-logs/com.canonical.certification__raw_devices_dmi_json.*), but most of them are empty!

/var/tmp/checkbox-ng/sessions/session_title-2023-01-09T06.53.11.session/io-logs$ ll *json*
-rw-rw-r-- 1 u u   314  一   9 14:53 com.canonical.certification__dkms_info_json.record.gz
-rw-rw-r-- 1 u u   182  一   9 14:53 com.canonical.certification__dkms_info_json.stderr
-rw-rw-r-- 1 u u     3  一   9 14:53 com.canonical.certification__dkms_info_json.stdout
-rw-rw-r-- 1 u u     0  一   9 14:53 com.canonical.certification__lspci_standard_config_json.record.gz
-rw-rw-r-- 1 u u     0  一   9 14:53 com.canonical.certification__lspci_standard_config_json.stderr
-rw-rw-r-- 1 u u     0  一   9 14:53 com.canonical.certification__lspci_standard_config_json.stdout
-rw-rw-r-- 1 u u     0  一   9 14:53 com.canonical.certification__modprobe_json.record.gz
-rw-rw-r-- 1 u u     0  一   9 14:53 com.canonical.certification__modprobe_json.stderr
-rw-rw-r-- 1 u u     0  一   9 14:53 com.canonical.certification__modprobe_json.stdout
-rw-rw-r-- 1 u u     0  一   9 14:53 com.canonical.certification__raw_devices_dmi_json.record.gz
-rw-rw-r-- 1 u u     0  一   9 14:53 com.canonical.certification__raw_devices_dmi_json.stderr
-rw-rw-r-- 1 u u     0  一   9 14:53 com.canonical.certification__raw_devices_dmi_json.stdout
-rw-rw-r-- 1 u u     0  一   9 14:53 com.canonical.certification__system_info_json.record.gz
-rw-rw-r-- 1 u u     0  一   9 14:53 com.canonical.certification__system_info_json.stderr
-rw-rw-r-- 1 u u     0  一   9 14:53 com.canonical.certification__system_info_json.stdout
-rw-rw-r-- 1 u u  4584  一   9 14:53 com.canonical.certification__udev_json.record.gz
-rw-rw-r-- 1 u u     0  一   9 14:53 com.canonical.certification__udev_json.stderr
-rw-rw-r-- 1 u u 28988  一   9 14:53 com.canonical.certification__udev_json.stdout

This is really weird, because it means the jobs have been run by Checkbox, but something happened at that time that led to nothing being recorded in the session.
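To check whether other sessions are affected in the same way, one option is to scan a session's io-logs directory for JSON attachment outputs that do not parse. The sketch below assumes the `<namespace>__<job_id>.stdout` naming visible in the listing above and that the relevant job ids end in `_json`; `find_invalid_json_attachments` is a hypothetical helper, not part of Checkbox.

```python
import json
from pathlib import Path


def find_invalid_json_attachments(io_logs_dir):
    """Return names of *_json.stdout files that are empty or not valid JSON.

    Hypothetical helper. Assumes attachment output is saved as
    <namespace>__<job_id>.stdout in the session's io-logs directory.
    """
    invalid = []
    for path in sorted(Path(io_logs_dir).glob("*_json.stdout")):
        text = path.read_text(encoding="utf-8", errors="replace")
        try:
            json.loads(text)  # an empty file raises JSONDecodeError as well
        except json.JSONDecodeError:
            invalid.append(path.name)
    return invalid


if __name__ == "__main__":
    import sys

    # Usage: python3 find_invalid.py <session>/io-logs
    for name in find_invalid_json_attachments(sys.argv[1]):
        print(name)
```

Run against the io-logs directory of session_title-2023-01-09T06.53.11.session, it should flag at least the four zero-byte outputs (lspci_standard_config_json, modprobe_json, raw_devices_dmi_json and system_info_json).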

When preparing the submission(s), Checkbox uses templates to generate an HTML report, a JSON file, etc.

Because those attachment jobs have been run, the templates try to retrieve the data they supposedly collected, only to fail: either that data is not valid JSON (in the case of the checkbox.html HTML template; see "Additional info" below for more information), or the empty output produces an invalid JSON document (in the case of the checkbox.json JSON template).
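The second failure mode can be reproduced outside Checkbox. In this sketch, `render_fragment` is a hypothetical stand-in for the checkbox.json template line that pastes the attachment text verbatim after a key; when the attachment is empty, the key is left without a value and the document no longer parses.

```python
import json


def render_fragment(attachment_text):
    # Hypothetical stand-in for the template line
    #   "raw-devices-dmi": {{ raw_devices_dmi_json | indent(4, false) | safe }}
    # The attachment text is inserted as-is, with no validation.
    return '{"raw-devices-dmi": ' + attachment_text + "}"


def is_valid_json(document):
    try:
        json.loads(document)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(render_fragment('{"bios-version": "UNEHAM11"}')))  # True
print(is_valid_json(render_fragment("")))  # False: '{"raw-devices-dmi": }'
```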

Side note: reset the session status

Whenever Checkbox completes a test run, it generates the submissions and marks the session as submitted. Once in this state, the session is ignored by subsequent runs of Checkbox, so the user can no longer retry generating the submissions.

Here are the steps to work around this:

  1. Go to the session you want to retry: cd /var/tmp/checkbox-ng/sessions/<session_to_retry>
  2. Edit the gzipped session file to change the submitted status to incomplete: cp session session.gz && gunzip session.gz && sed -i 's/\[\"submitted\"\]/\[\"incomplete\"\]/g' session && gzip session && mv session.gz session
  3. Rerun Checkbox. It should ask you what you want to do with this session. Select "resume", then finish the test run to try to generate the submission files again.
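If `sed`/`gunzip` are not convenient, step 2 can also be done from Python. This is a sketch that assumes, like the `sed` command above, that the status appears as the literal text `["submitted"]` in the decompressed session file; `reset_submitted_session` and the `session.bak` backup are hypothetical additions, not part of Checkbox.

```python
import gzip
from pathlib import Path


def reset_submitted_session(session_dir):
    """Replace ["submitted"] with ["incomplete"] in the gzipped 'session' file.

    Hypothetical helper mirroring the sed command in step 2.
    Returns the number of replacements made (0 means nothing was changed).
    """
    session_file = Path(session_dir) / "session"
    compressed = session_file.read_bytes()
    data = gzip.decompress(compressed)
    old, new = b'["submitted"]', b'["incomplete"]'
    count = data.count(old)
    if count:
        # Keep the untouched original next to it in case the edit goes wrong.
        session_file.with_name("session.bak").write_bytes(compressed)
        session_file.write_bytes(gzip.compress(data.replace(old, new)))
    return count
```

For example, `reset_submitted_session("/var/tmp/checkbox-ng/sessions/<session_to_retry>")`, then continue with step 3.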

Additional info

When checkbox-cli finalizes the session, it tries to generate different reports, including an HTML report. This is generated using a Jinja2 template. One of the parts of the HTML report is a "System Information" section that includes data from the system_info_json job that should be present in every session. This job is an attachment job that provides information about the device under test like so:

{
    "System": [
        "Kernel: 5.15.0-1021-intel-iotg x86_64",
        "Distro: Ubuntu 22.04.1 LTS (Jammy Jellyfish)"
    ],
    "Machine": [
        "Type: Laptop System: Intel ",
        "product: Alder Lake Client Platform v: 0.1 ",
        "UEFI: Intel v: ADLNFWI1.R00.3307.A00.2207250823 date: 07/25/2022 "
    ],
    "Memory": [
        "7.54 GiB"
    ],
    "Audio": [
        "Intel",
        "Intel"
    ],
    "Graphics": [
        "Intel"
    ],
    "Network": [
        "Intel",
        "Intel Ethernet I225-IT"
    ],
    "Drives": [
        "DA6128 116.48 GiB "
    ]
}

For this session, though, the attachment job output is empty. When the exporter tries to parse it as JSON, it fails: json.loads() needs a valid JSON document, and an empty string is not one, whereas an empty object ({}) is:

>>> import json
>>> from collections import OrderedDict
>>> json.loads("")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
>>> json.loads("{}")
{}
>>> json.loads("{}", object_pairs_hook=OrderedDict)
OrderedDict()
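One way to make the HTML side more forgiving would be to treat an empty or malformed attachment as "no data" instead of letting JSONDecodeError escape from export_to_transport. A minimal sketch, assuming the exporter could simply omit the "System Information" section when the helper returns `None`; `load_attachment_json` is hypothetical.

```python
import json
from collections import OrderedDict


def load_attachment_json(text):
    """Parse attachment text as JSON; return None if it is empty or invalid.

    Hypothetical helper: callers can skip a report section on None instead
    of aborting the whole export.
    """
    if not text or not text.strip():
        return None  # empty output, as seen in this session
    try:
        return json.loads(text, object_pairs_hook=OrderedDict)
    except json.JSONDecodeError:
        return None  # non-empty but malformed output
```

The trade-off is the one raised under "Solution": the report gets generated, but silently without that section, so the missing data would probably need to be flagged somewhere.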

Solution

The problem is that Checkbox does not export the files as expected by the user.

The root cause is still unknown. Checkbox seems to run fine on this device, since other sessions do include the JSON attachment job results, and they are not empty.

We can reinforce the checks in the HTML and JSON templates to avoid generating invalid documents.

For instance, in checkbox.json,

{%- if ns ~ 'raw_devices_dmi_json' in state.job_state_map and state.job_state_map[ns ~ 'raw_devices_dmi_json'].result.outcome == 'pass' %},
{%- set raw_devices_dmi_json = state.job_state_map[ns ~ 'raw_devices_dmi_json'].result.io_log_as_text_attachment %}
    "raw-devices-dmi": {{ raw_devices_dmi_json | indent(4, false) | safe }}
{%- endif %}

could become

{%- if ns ~ 'raw_devices_dmi_json' in state.job_state_map and state.job_state_map[ns ~ 'raw_devices_dmi_json'].result.outcome == 'pass' and state.job_state_map[ns ~ 'raw_devices_dmi_json'].result.io_log_as_text_attachment %},
{%- set raw_devices_dmi_json = state.job_state_map[ns ~ 'raw_devices_dmi_json'].result.io_log_as_text_attachment %}
    "raw-devices-dmi": {{ raw_devices_dmi_json | indent(4, false) | safe }}
{%- endif %}
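In plain Python, the guarded template boils down to the check below. This is a simplified model, not Checkbox code: `job_state_map` is reduced to a dict of `{"outcome": ..., "text": ...}` entries keyed by full job id, `ns` is a namespace prefix string (assumed here to look like `com.canonical.certification::`), and the `fields` mapping from job id to output key is supplied by the caller. Like the template change, it only skips empty output; non-empty but malformed text would still raise JSONDecodeError.

```python
import json


def build_attachments_section(job_state_map, ns, fields):
    """Collect attachment data the way the guarded template would.

    Simplified, hypothetical model: job_state_map maps full job ids to
    dicts with "outcome" and "text"; fields maps job ids to output keys.
    """
    section = {}
    for job_id, key in fields.items():
        state = job_state_map.get(ns + job_id)
        # Same three conditions as the {%- if ... %} guard:
        # job present, outcome 'pass', and non-empty attachment text.
        if state is None or state["outcome"] != "pass" or not state["text"]:
            continue
        section[key] = json.loads(state["text"])
    return section
```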

However, this means we would generate documents that lack potentially important information (system info, raw device DMI data, etc.). I'm not sure what the impact would be down the line (for instance, when the submission archive is uploaded to C3).

@yphus what is your take on this?

djacobs98 commented 1 year ago

@pieqq - I do not remember there being any unusual errors or issues while running the tests. I (F)inish the test run and it starts to generate the result files.

It is not consistent either. When I reran the same test set, the first 3 runs failed but the 4th run managed to properly generate the tarball. I would rather have a consistent failure - easier to debug.

yphus commented 1 year ago

Fixing this bug properly implies improving the reliability of the transport steps. Let's have a story in Jira and link this bug to it.