Memory stress test is crashing without any debug logs for the issue.

cvelloth commented 3 months ago

Bug Description

In the Desktop Preload Certification Test Suite for 24.04, the stress test memory/memory_stress_ng fails with the outcome "crashed" and the comment "terminated". Neither the tar files in the logs nor the HTML log files contain any debug data that would explain the failure. How can we debug the cause of the failure?

To Reproduce

Install Checkbox via the following commands:

$ sudo snap install checkbox22
$ sudo snap install checkbox --classic

Run the "memory/memory_stress_ng" job via the following steps:

$ checkbox.checkbox-cli
# Choose the "Desktop Preload Certification Tests for 24.04" test plan
# Select the "Stress test of system memory" case in the "Stress Tests" set
# Press "T" to start testing

Environment

OS: 24.04 Ubuntu Desktop
Checkbox Type: Snap
Checkbox version: 4.1.0.dev25

Relevant log output

stress/memory_stress_ng crashed blocker     terminated

Additional context

No response

syncronize-issues-to-jira[bot] commented 3 months ago

Thank you for reporting us your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/CHECKBOX-1522.

This message was autogenerated

cvelloth commented 3 months ago

submission.json

pieqq commented 2 months ago

Hello @cvelloth ! Sorry for the late reply, I missed the issues you opened.

The submission.json you attached shows this:

        {
            "id": "stress/memory_stress_ng",
            "full_id": "com.canonical.certification::stress/memory_stress_ng",
            "name": "Stress test of system memory",
            "certification_status": "blocker",
            "category": "Stress tests",
            "category_id": "com.canonical.plainbox::stress",
            "status": "fail",
            "outcome": "crash",
            "comments": "terminated",
            "io_log": "",
            "type": "test",
            "project": "certification",
            "duration": 4574.1000854969025,
            "plugin": "shell",
            "template_id": null
        }

which, as you said, means the test case crashed, and as a result no I/O was logged (although the test ran for about 1 h 16 min, as the duration key shows).
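
To spot such entries without reading the whole file, a short script can list every job whose outcome is "crash" together with its duration. This is only a sketch and not part of Checkbox: the helper names are made up for illustration, and it assumes the entries shown above sit in a top-level "results" list inside submission.json.

```python
import json
from pathlib import Path


def format_duration(seconds: float) -> str:
    """Render a duration in seconds as hours, minutes and seconds."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h{minutes:02d}m{secs:02d}s"


def crashed_jobs(submission: dict) -> list[dict]:
    """Return the result entries whose outcome is "crash".

    Assumption: the entries live in a top-level "results" list.
    """
    return [
        entry
        for entry in submission.get("results", [])
        if entry.get("outcome") == "crash"
    ]


def summarize(path: str) -> None:
    """Print one line per crashed job found in the given submission file."""
    submission = json.loads(Path(path).read_text())
    for job in crashed_jobs(submission):
        duration = format_duration(job.get("duration", 0))
        has_log = "yes" if job.get("io_log") else "no"
        print(f"{job['id']}: ran for {duration}, I/O logged: {has_log}")


# Example call, assuming submission.json is in the current directory:
# summarize("submission.json")
```

For the entry above, the duration of 4574.1 seconds works out to 1h16m14s.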

In order to keep investigating this issue, here are a few things you can do:

Use Checkbox remote instead of Checkbox in local mode: by installing Checkbox on both the device under test (DUT) and your own computer, you can control the former from the latter. Please have a look at our tutorial page about remote testing (https://canonical-checkbox.readthedocs-hosted.com/en/stable/tutorial/using-checkbox/remote.html), as well as the explanation page about Checkbox remote (https://canonical-checkbox.readthedocs-hosted.com/en/stable/explanation/remote.html). In a nutshell, once Checkbox is installed on both devices, you can run checkbox.checkbox-cli control <IP of the DUT>. You will be able to select the test plan and run it from your own machine, but it will be executed on the DUT. This way, you can see what's going on with testing even when the DUT is suspended, and if the test case crashes, you will at least see the last I/O sent from the DUT.
If you haven't re-installed the DUT, you can find the raw logs that Checkbox uses to generate the report in /var/tmp/checkbox-ng/sessions. Please create an archive of the session and attach it to this issue.
If you've already wiped the system, please re-run the stress tests using Checkbox remote. If the problem happens again, go to /var/tmp/checkbox-ng/sessions/ on the DUT, create an archive of that directory's contents, and attach it here so we can have a look.
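
One way to create that archive is a small script like the one below. It is a sketch only: the function name and the output file name are made up for illustration, and reading the sessions directory may need elevated privileges depending on how Checkbox was run.

```python
import tarfile
from pathlib import Path


def archive_sessions(sessions_dir: str, output_path: str) -> list[str]:
    """Pack sessions_dir into a gzip-compressed tarball.

    Returns the member names stored in the archive so the caller can
    check that the session files were actually included.
    """
    source = Path(sessions_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"{source} is not a directory")

    with tarfile.open(output_path, "w:gz") as tar:
        # arcname keeps paths inside the archive relative ("sessions/...")
        tar.add(source, arcname=source.name)

    with tarfile.open(output_path, "r:gz") as tar:
        return tar.getnames()


# Example call on the DUT (the output file name is arbitrary):
# archive_sessions("/var/tmp/checkbox-ng/sessions", "checkbox-sessions.tar.gz")
```

The resulting .tar.gz file can then be attached to the issue.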

Thanks and sorry again for the late reply!

cvelloth commented 2 months ago

Hi Pierre,

Thanks for your input. With remote testing, this test is passing without issue. Will close this issue.

Thanks and Regards, Chandana

From: Pierre Equoy. Sent: Tuesday, August 27, 2024 1:04 PM. To: canonical/checkbox. Cc: Velloth, Chandana; Mention. Subject: Re: [canonical/checkbox] Memory stress test is crashing without any debug logs for the issue. (Issue #1391)

cvelloth commented 2 months ago

The test is passing when testing with remote as explained above. Closing this issue.

cvelloth commented 4 weeks ago

Seeing this issue again.

Sahanaaks commented 3 weeks ago

I think this test case has had some recent changes. Previously it passed with Checkbox remote, but now it fails with the error "Job rebooted the machine or the Checkbox agent. Resuming the session and marking it as crashed.", even though no such reboot actually happens.
