canonical / checkbox

Checkbox is a testing framework used to validate device compatibility with Ubuntu Linux. It’s the testing tool developed for the purposes of the Ubuntu Certification program.
https://checkbox.readthedocs.io
GNU General Public License v3.0

Checkbox-ng remote takes WAY too long to run on Jammy #54

Open beliaev-maksim opened 1 year ago

beliaev-maksim commented 1 year ago

This issue was migrated from https://bugs.launchpad.net/checkbox-ng/+bug/1980288

Summary

Status: Triaged
Created on: 2022-06-29 22:07:54
Heat: 6
Importance: Critical
Security related: False

Description

I've spent weeks now doing repeated testing to get the data needed to highlight this.

When running SRU tests against Jammy, it takes upwards of 2 days per run, whereas for any other Ubuntu release the test takes only a few hours.

This is not strictly an OS issue, as running the SRU tests locally using Jammy takes significantly less time than running it via checkbox remote (as done with testflinger). Because of this, I'm practically unable to do SRU testing on Jammy, as each SRU cycle potentially means testing 5 different kernels on each machine.

I have a spreadsheet with collated timing data for every current LTS kernel using the same SRU tests. I can share that if necessary, but the tl;dr of that spreadsheet is:

Release       Avg Time
Bionic        11:23:49
Bionic HWE     9:45:47
Focal          9:37:33
Focal HWE     10:46:38
Jammy TF      48:17:45
Jammy Local   17:23:34

As you can see, Bionic and Focal run to completion in less than 12 hours on average.

Jammy TF is Jammy run via Testflinger using checkbox-remote. In EVERY instance, it hits the global timeout because of something specific to checkbox-remote and Jammy.

Jammy Local covers instances where I deployed Jammy, then logged into the machines directly and ran the same SRU suite by hand in a screen session. That average DOES take longer on Jammy, so a component of this is related to Jammy itself and I'll start tracking that down later... but for now, the fact that checkbox-remote exacerbates this to the point where testing never completes is a huge problem.

I logged times across 27 machines to get this data. I didn't get times for some systems due to other issues with the machines and MAAS during deployment, but those are all deployment issues, not problems once the machines were deployed and running tests.

Attachments

doubletusk-jammy-egx-2022-08-19_1209.log

Tags: []

beliaev-maksim commented 1 year ago

This thread was migrated from launchpad.net

https://launchpad.net/~pieq wrote on 2022-07-22 15:24:39:

Sorry for leaving this bug unaddressed!

Could you provide (or point to) some submissions made with Jammy using remote? I would like to check the duration of the jobs to see if this is something to do with some specific jobs, or if it's something related to the network, etc.

Archives of ongoing Checkbox sessions (the stuff in /var/tmp/checkbox-ng/sessions/) should help as well, since they contain the same information about duration of jobs and current job being run.
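A hedged example of how such an archive could be collected on the DUT (the path is the one quoted above; adjust if the sessions live elsewhere on your install):

```
# Bundle the ongoing Checkbox sessions so they can be attached to the bug
sudo tar czf checkbox-sessions-$(hostname)-$(date +%F).tar.gz /var/tmp/checkbox-ng/sessions/
```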

https://launchpad.net/~bladernr wrote on 2022-08-23 14:39:27:

"Could you provide (or point to) some submissions made with Jammy using remote?" So far no, because none have completed, tehy get to the 48 hour timeout and end, the remote session disappears, the containers are reaped and that's it.So I never actually have a chance to get any logs. I can try to manually do one and see if I can capture logs that way.

You can at least see on my spreadsheet where each test run hung... it's always a long-term test like a stress test. https://docs.google.com/spreadsheets/d/1I26l91opzc0QuZWGlCbE4ZT63qMXv08k-eOywTsGOBI/

I'll see if I can get something useful otherwise... but really this should be pretty easy to replicate... just use remote to run the cert suite

One thing you can see right now, though, is that this ONLY affects running checkbox-remote against a Jammy host. This did not happen when running against Focal with the Jammy kernel (HWE): https://certification.canonical.com/hardware/202205-30291/submission/277513/

https://launchpad.net/~bladernr wrote on 2022-08-23 14:47:59:

Another thing you can see is this. A shorter testplan against Jammy:

https://certification.canonical.com/hardware/202101-28672/submission/277443

Looking at the submission, checkbox says the total test time was 4 hours 21 minutes, but the actual execution time (the amount of time from when testflinger launched until the test completed) was 10 hours and 35 minutes:



Cleanup complete 10:34:59

Attached is a log created when I run testflinger (passing the console dump through tee)

https://launchpad.net/~bladernr wrote on 2022-09-27 15:50:34:

I should also note this only happens on the full SRU run, where the stress tests happen. I have also tested Jammy with the Nvidia EGX provider I use for that effort, and testing works fine.

And as noted, this works OK with Focal HWE (5.15 on Focal), so I don't think it's a kernel thing; rather, something may have changed in a library on Jammy.

bladernr commented 1 year ago

FYI this should be very, very easy to reproduce.

Testing locally, install canonical-certification-server, launch a screen session and run:

```
/usr/bin/time -f %E test-regression
```

Then test using checkbox remote locally over localhost:

```
/usr/bin/time -f %E checkbox-cli remote 127.0.0.1 /usr/bin/test-regression
```

When running Focal the times should be pretty similar.

When running Jammy, the "remote over localhost" run should be orders of magnitude longer than the "run directly without remote" one.

I have several machines STILL running this test using the "remote over localhost" scenario that have been going for nearly three full days. I am going to restart them and see if I can run a much smaller test suite just to see if I can see a faster way to get accurate times...
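For convenience, a minimal sketch of the comparison described above as a single script (it assumes `test-regression` and `checkbox-cli` are on the PATH, as in the commands quoted; the `timing.log` file name is made up):

```bash
#!/bin/bash
# Run the same suite locally and via checkbox remote over localhost,
# appending wall-clock times to timing.log for later comparison.
/usr/bin/time -f "local:  %E" -o timing.log -a test-regression
/usr/bin/time -f "remote: %E" -o timing.log -a checkbox-cli remote 127.0.0.1 /usr/bin/test-regression
cat timing.log
```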

pieqq commented 1 year ago

Did you manage to do a full run using checkbox remote and push the results to C3? The best would be, using the same device:

- one run in local mode
- one run using Checkbox remote

both uploaded to C3.

If you could also grab their respective journal logs, and any additional logs (like the testflinger you mentioned), that would be great.

So far, I have 2 guesses:

  1. something is going wrong for a few jobs; in that case, we should see a clear difference in these jobs' execution times between the two submissions (local and remote); see the sketch after this list
  2. something is wrong somewhere else; in that case, there would be no major difference in the total execution time between the two runs (because the reported execution time is the aggregate of each job's command execution time).
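
To check guess 1, one could compare per-job durations between the local and remote submissions. A rough sketch, assuming each submission tarball contains a `submission.json` whose `results` entries carry `id` and `duration` fields (the field names and tarball names are assumptions, not verified against the current schema), and that `jq` is installed:

```bash
# Print the 20 longest-running jobs from each submission for a side-by-side look
for sub in submission_local.tar.xz submission_remote.tar.xz; do
  echo "== $sub =="
  tar -xOf "$sub" submission.json \
    | jq -r '.results[] | "\(.duration)\t\(.id)"' \
    | sort -rn | head -20
done
```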
bladernr commented 1 year ago

Still working on it. I tried using `checkbox-cli remote 127.0.0.1 test-regression` and that ran for a week before I killed it. (The same test ran for 8-12 hours at most when run locally.)

I thought I could shorten it a bit by just running the memory stress test so I created the attached launcher to just run the memory stress test and that's where I am now. The launcher does do automatic submission at the end and I run it like this:

```
time -f %E ./timing-test-launcher                                  # for local runs
time -f %E checkbox-cli remote 127.0.0.1 ./timing-test-launcher    # for remote runs
```

I noticed that there seems to be NO difference between running against localhost vs a network card, and since this is all being run locally on the machine, that rules out any legitimate network issues between the SUT and the testflinger agent that normally runs checkbox remote tests.

Looking at the logs (syslog and journal) I noticed that checkbox-ng.service is being killed and restarted a few times by OOMKiller. That said, I suspect that's because some test cases deliberately trigger OOMKiller, and when it happens checkbox-ng.service just restarts and keeps doing its thing.

That said, I killed the test on one machine and modified the checkbox service with OOMScoreAdjust=-1000, which is supposed to keep it from being reaped (at least Google says that), and restarted the test just to see what happens. The other machines are still running, though (and have been running since last week). Those I'm going to let run to completion to see what they actually do.
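For reference, the kind of override described above would typically be a systemd drop-in along these lines (a sketch only; the unit name is the deb-package one used in this thread, and `OOMPolicy=continue` is an alternative knob, not something tried here):

```
# /etc/systemd/system/checkbox-ng.service.d/override.conf
[Service]
# Make the checkbox-ng process as unattractive as possible to the kernel OOM
# killer (child processes inherit the score unless they change it themselves)
OOMScoreAdjust=-1000
# Alternative (systemd >= 243): keep the unit running even if one of its
# processes (e.g. a stress-ng worker) is OOM-killed
#OOMPolicy=continue
```

followed by `systemctl daemon-reload` and a restart of the service.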

But I suspect the reaping of checkbox-ng.service is a red herring here; I just include it for completeness' sake.

I have two other machines that are now running the remote test but using Focal+HWE to see if this is a kernel thing, or a Jammy+ userspace thing.

So, in summary:

- Focal+HWE Local: runs as expected
- Focal+HWE Remote: still to do
- Jammy GA Local: runs as expected, completes memory stress in 3-4 hours
- Jammy GA Remote: in progress; already ran for several days on a test case that should have finished in a few hours

Using the attached launcher, this is triggerable 100% of the time for me on every system I've tried, and all of this points to your first guess. Note that, of course, GitHub insists that every file have one of its limited number of extensions, even for files that don't need or want one, so remove the .txt before using: timing-test-launcher.txt

pieqq commented 1 year ago

Have you ever seen a Jammy session complete at all when using Checkbox remote?

When you say

That said, I suspect that's because some test cases deliberately trigger OOMKiller, and when it happens checkbox-ng.service just restarts and keeps doing its thing.

do you actually have a view on what Checkbox is doing? (I mean, are you monitoring the run using a remote device?) If so, what is it showing?

I'm starting to wonder if the red herring you're mentioning could be the actual culprit.

For months, the QA team had trouble running memory stress tests on Jammy because of how the OOM killer is configured there. One of the QA engineers recently proposed a PR (#297) that was merged in the last release; it is the outcome of a long discussion about this problem.
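For anyone reproducing this, a few (hedged) commands to see whether systemd-oomd is involved on a given SUT and what pressure limits apply to the Checkbox unit (the unit name below is the deb one; for the snap it would be snap.checkbox.service.service, as shown later in the thread):

```
systemctl is-active systemd-oomd   # typically present on Jammy, absent on stock Focal
oomctl dump                        # show the cgroups systemd-oomd is monitoring and their limits
systemctl show -p ManagedOOMMemoryPressure,ManagedOOMMemoryPressureLimit,ManagedOOMSwap checkbox-ng.service
```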

Have you tried running everything but the memory stress tests?

bladernr commented 1 year ago

Honestly, I'm not sure if the checkbox service being OOM-killed is the issue (it could well be), only because, very unscientifically, the instances in the journal/logs where I can see the service being killed and restarted seem to correspond with specific test cases that expect to trigger the OOM killer, and if the reaping itself were the problem I would expect to see those messages far more than just 4 or 5 times. That said, I have also tried using OOMScoreAdjust, but the service is still being reaped, so that isn't a fix. Maybe on that one machine I'll next have to try what the QA team suggested (I was loosely following that thread after commenting on it a bit).

FWIW, I'm at 16 restarts now on this one machine where I'm watching the logs:


```
Mar  7 17:30:21 barbos kernel: [76196.638862] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=checkbox-ng.service,mems_allowed=0,global_oom,task_memcg=/system.slice/checkbox-ng.service,task=stress-ng-mlock,pid=44164,uid=0
Mar  7 17:30:22 barbos kernel: [76196.896971] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=checkbox-ng.service,mems_allowed=0,global_oom,task_memcg=/system.slice/checkbox-ng.service,task=stress-ng-mlock,pid=43937,uid=0
Mar  7 17:30:23 barbos kernel: [76197.849642] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=checkbox-ng.service,mems_allowed=0,global_oom,task_memcg=/system.slice/checkbox-ng.service,task=stress-ng-mlock,pid=44060,uid=0
Mar  7 17:30:23 barbos systemd[1]: checkbox-ng.service: A process of this unit has been killed by the OOM killer.
Mar  7 17:30:44 barbos systemd[1]: checkbox-ng.service: Failed with result 'oom-kill'.
Mar  7 17:30:44 barbos systemd[1]: checkbox-ng.service: Consumed 4d 10h 23min 14.295s CPU time.
Mar  7 17:30:44 barbos systemd[1]: checkbox-ng.service: Scheduled restart job, restart counter is at 16.
Mar  7 17:30:44 barbos systemd[1]: checkbox-ng.service: Consumed 4d 10h 23min 14.295s CPU time.
Mar  7 17:30:46 barbos checkbox-ng.service[44171]: normal_user not supplied via config(s).
Mar  7 17:30:46 barbos checkbox-ng.service[44171]: Using `ubuntu` user
```

I had also thought to maybe try creating a testplan that runs everything BUT the stress tests to see what happens too.
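For reference, a minimal sketch of such a launcher, assuming current launcher syntax (the test plan id is the 20.04 one that appears later in this thread, and the exclude patterns are only examples):

```
#!/usr/bin/env checkbox-cli
[launcher]
launcher_version = 1
stock_reports = text, submission_files

[test plan]
unit = com.canonical.certification::20.04-server-full
forced = yes

[test selection]
forced = yes
# skip anything that looks like a stress test
exclude = .*stress.* .*memory_stress_ng.*

[ui]
type = silent
```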

As for monitoring: on the machine I'm getting logs from, I'm SSH'd in and running the test in a screen session, and in a different terminal I'm also SSH'd in, just doing a `tail -f /var/log/syslog | grep checkbox-ng.service` to see where the restarts are happening.
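
An equivalent view straight from the journal (just an alternative to grepping syslog) would be:

```
journalctl -u checkbox-ng.service -f    # follow the service's own messages and restarts
journalctl -k -f | grep -i oom          # follow kernel OOM-killer activity
```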
bladernr commented 1 year ago

OK, so I had Focal HWE (5.15) running on a machine and finally just killed the test run. The total time to that point was 170 hours; the local run of the same test case was only 2 hours. So this is related to the kernel and remote. Unfortunately, the fix for the OOMD config doesn't work here anyway, as the files it modifies don't exist on Focal (not on my test machine, at least). I can try it sometime on a Jammy machine to see what happens, but as we still have to test Focal, it's not a viable solution even if it does help on Jammy systems.

So now, to summarise, this bug prevents me from using checkbox remote on anything with 5.15.

pieqq commented 1 year ago

I ran the following test:

From the remote controller, I can see the usual session running, launching resource jobs, etc.

Then, it starts the memory stress-ng job. Each stressor goes, one by one. It takes a while. After around 3+ hours, the bigheap stressor triggers an OOM kill which takes down the checkbox service on the DUT. As I was expecting, the session is lost and I am prompted with a new "Select Test Plan" screen. This is why the QA team made a fix.

I'm now going to try on a 20.04 machine.

Logs

As seen from the remote controller ``` ------------------------[ Stress test of system memory ]------------------------ ID: com.canonical.certification::memory/memory_stress_ng Category: Memory tests -------------------------------------------------------------------------------- stress-ng: info: [46693] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [46693] dispatching hogs: 4 bsearch stress-ng: info: [46693] successful run completed in 300.01s (5 mins, 0.01 secs) stress-ng: info: [47744] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [47744] dispatching hogs: 4 context stress-ng: info: [47744] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [48801] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [48801] dispatching hogs: 4 hsearch stress-ng: info: [48801] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [49858] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [49858] dispatching hogs: 4 lsearch stress-ng: info: [49858] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [50917] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [50917] dispatching hogs: 4 matrix stress-ng: info: [50917] successful run completed in 300.01s (5 mins, 0.01 secs) stress-ng: info: [51969] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [51969] dispatching hogs: 4 memcpy stress-ng: info: [51969] successful run completed in 300.02s (5 mins, 0.02 secs) stress-ng: info: [53024] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [53024] dispatching hogs: 4 null stress-ng: info: [53024] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [54079] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [54079] dispatching hogs: 4 pipe stress-ng: info: [54079] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [55135] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [55135] dispatching hogs: 4 qsort stress-ng: info: [55135] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [56185] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [56185] dispatching hogs: 4 str stress-ng: info: [56185] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [57241] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [57241] dispatching hogs: 4 stream stress-ng: info: [57243] stream: stressor loosely based on a variant of the STREAM benchmark code stress-ng: info: [57243] stream: do NOT submit any of these results to the STREAM benchmark results stress-ng: info: [57243] stream: Using CPU cache size of 4096K stress-ng: info: [57246] stream: memory rate: 2633.95 MB read/sec, 1755.97 MB write/sec, 230.16 Mflop/sec (instance 3) stress-ng: info: [57245] stream: memory rate: 2927.63 MB read/sec, 1951.76 MB write/sec, 255.82 Mflop/sec (instance 2) stress-ng: info: [57243] stream: memory rate: 3498.10 MB read/sec, 2332.07 MB write/sec, 305.67 Mflop/sec (instance 0) stress-ng: info: [57244] stream: memory rate: 3204.82 MB read/sec, 2136.55 MB write/sec, 280.04 Mflop/sec (instance 1) stress-ng: info: [57241] successful run completed in 300.01s (5 mins, 0.01 secs) stress-ng: info: [58291] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [58291] dispatching hogs: 4 
tsearch stress-ng: info: [58291] successful run completed in 300.07s (5 mins, 0.07 secs) stress-ng: info: [59347] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [59347] dispatching hogs: 4 vm-rw stress-ng: info: [59347] successful run completed in 300.10s (5 mins, 0.10 secs) stress-ng: info: [60401] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [60401] dispatching hogs: 4 wcs stress-ng: info: [60401] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [61455] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [61455] dispatching hogs: 4 zero stress-ng: info: [61455] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [62510] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [62510] dispatching hogs: 4 mlock stress-ng: info: [62510] successful run completed in 300.76s (5 mins, 0.76 secs) stress-ng: info: [63570] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [63570] dispatching hogs: 4 mmapfork stress-ng: info: [63570] successful run completed in 301.04s (5 mins, 1.04 secs) stress-ng: info: [106806] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [106806] dispatching hogs: 4 mmapmany stress-ng: info: [106806] successful run completed in 300.44s (5 mins, 0.44 secs) stress-ng: info: [107862] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [107862] dispatching hogs: 4 mremap stress-ng: warn: [107862] metrics-check: all bogo-op counters are zero, data may be incorrect stress-ng: info: [107862] successful run completed in 300.03s (5 mins, 0.03 secs) stress-ng: info: [112534] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [112534] dispatching hogs: 4 shm-sysv stress-ng: info: [112534] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [217433] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [217433] dispatching hogs: 4 vm-splice stress-ng: info: [217433] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [218483] setting to a 453 second (7 mins, 33.00 secs) run per stressor stress-ng: info: [218483] dispatching hogs: 4 malloc stress-ng: info: [218483] successful run completed in 453.06s (7 mins, 33.06 secs) stress-ng: info: [220171] setting to a 453 second (7 mins, 33.00 secs) run per stressor stress-ng: info: [220171] dispatching hogs: 4 mincore stress-ng: info: [220171] successful run completed in 453.00s (7 mins, 33.00 secs) stress-ng: info: [221755] setting to a 453 second (7 mins, 33.00 secs) run per stressor stress-ng: info: [221755] dispatching hogs: 4 vm stress-ng: info: [221755] successful run completed in 453.01s (7 mins, 33.01 secs) stress-ng: info: [223422] setting to a 453 second (7 mins, 33.00 secs) run per stressor stress-ng: info: [223422] dispatching hogs: 4 mmap stress-ng: info: [223422] successful run completed in 453.09s (7 mins, 33.09 secs) stress-ng: info: [225020] setting to a 453 second (7 mins, 33.00 secs) run per stressor stress-ng: info: [225020] dispatching hogs: 8 stack stress-ng: info: [225020] successful run completed in 453.03s (7 mins, 33.03 secs) stress-ng: info: [226624] setting to a 453 second (7 mins, 33.00 secs) run per stressor stress-ng: info: [226624] dispatching hogs: 8 bigheap Connection lost! [Errno 104] Connection reset by peer Reconnecting... ```
`journalctl` from the DUT when the OOM is triggered ``` Mar 16 14:18:35 Inspiron-7370 stress-ng[223422]: invoked with 'stress-ng --aggressive --verify --timeout 453.7224578857422 --mmap 0' by user 0 Mar 16 14:18:35 Inspiron-7370 stress-ng[223422]: system: 'Inspiron-7370' Linux 5.19.0-35-generic #36-Ubuntu SMP PREEMPT_DYNAMIC Fri Feb 3 18:36:56 UTC 2023 x86_64 Mar 16 14:18:35 Inspiron-7370 stress-ng[223422]: memory (MB): total 15741.18, free 3618.57, shared 484.82, buffer 259.41, swap 2048.00, free swap 1862.25 Mar 16 14:18:46 Inspiron-7370 systemd[1]: fprintd.service: Deactivated successfully. Mar 16 14:23:33 Inspiron-7370 update-notifier[2402]: gtk_widget_get_scale_factor: assertion 'GTK_IS_WIDGET (widget)' failed Mar 16 14:26:08 Inspiron-7370 stress-ng[225020]: invoked with 'stress-ng --aggressive --verify --timeout 453.7224578857422 --stack 8' by user 0 Mar 16 14:26:08 Inspiron-7370 stress-ng[225020]: system: 'Inspiron-7370' Linux 5.19.0-35-generic #36-Ubuntu SMP PREEMPT_DYNAMIC Fri Feb 3 18:36:56 UTC 2023 x86_64 Mar 16 14:26:08 Inspiron-7370 stress-ng[225020]: memory (MB): total 15741.18, free 2203.07, shared 474.40, buffer 259.53, swap 2048.00, free swap 464.93 Mar 16 14:30:01 Inspiron-7370 CRON[225857]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 16 14:30:01 Inspiron-7370 CRON[225858]: (root) CMD ([ -x /etc/init.d/anacron ] && if [ ! -d /run/systemd/system ]; then /usr/sbin/invoke-rc.d anacron start >/dev/null; fi) Mar 16 14:30:01 Inspiron-7370 CRON[225857]: pam_unix(cron:session): session closed for user root Mar 16 14:31:10 Inspiron-7370 systemd[1]: Started Run anacron jobs. Mar 16 14:31:10 Inspiron-7370 anacron[226098]: Anacron 2.3 started on 2023-03-16 Mar 16 14:31:10 Inspiron-7370 anacron[226098]: Normal exit (0 jobs run) Mar 16 14:31:10 Inspiron-7370 systemd[1]: anacron.service: Deactivated successfully. Mar 16 14:33:41 Inspiron-7370 stress-ng[226624]: invoked with 'stress-ng --aggressive --verify --timeout 453.7224578857422 --bigheap 8' by user 0 Mar 16 14:33:41 Inspiron-7370 stress-ng[226624]: system: 'Inspiron-7370' Linux 5.19.0-35-generic #36-Ubuntu SMP PREEMPT_DYNAMIC Fri Feb 3 18:36:56 UTC 2023 x86_64 Mar 16 14:33:41 Inspiron-7370 stress-ng[226624]: memory (MB): total 15741.18, free 2245.50, shared 474.41, buffer 259.66, swap 2048.00, free swap 464.93 Mar 16 14:34:01 Inspiron-7370 kernel: power-profiles- invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0 Mar 16 14:34:01 Inspiron-7370 kernel: CPU: 2 PID: 618 Comm: power-profiles- Not tainted 5.19.0-35-generic #36-Ubuntu Mar 16 14:34:01 Inspiron-7370 kernel: Hardware name: Dell Inc. 
Inspiron 7370/, BIOS 1.25.0 07/13/2022 Mar 16 14:34:01 Inspiron-7370 kernel: Call Trace: Mar 16 14:34:01 Inspiron-7370 kernel: Mar 16 14:34:01 Inspiron-7370 kernel: show_stack+0x4e/0x61 Mar 16 14:34:01 Inspiron-7370 kernel: dump_stack_lvl+0x4a/0x6f Mar 16 14:34:01 Inspiron-7370 kernel: dump_stack+0x10/0x18 Mar 16 14:34:01 Inspiron-7370 kernel: dump_header+0x53/0x246 Mar 16 14:34:01 Inspiron-7370 kernel: oom_kill_process.cold+0xb/0x10 Mar 16 14:34:01 Inspiron-7370 kernel: out_of_memory+0x101/0x2f0 Mar 16 14:34:01 Inspiron-7370 kernel: __alloc_pages_may_oom+0x112/0x1e0 Mar 16 14:34:01 Inspiron-7370 kernel: __alloc_pages_slowpath.constprop.0+0x4cf/0xa30 Mar 16 14:34:01 Inspiron-7370 kernel: __alloc_pages+0x31d/0x350 Mar 16 14:34:01 Inspiron-7370 kernel: alloc_pages+0x90/0x1c0 Mar 16 14:34:01 Inspiron-7370 kernel: folio_alloc+0x1d/0x60 Mar 16 14:34:01 Inspiron-7370 kernel: filemap_alloc_folio+0x8e/0xb0 Mar 16 14:34:01 Inspiron-7370 kernel: __filemap_get_folio+0x1c7/0x3c0 Mar 16 14:34:01 Inspiron-7370 kernel: filemap_fault+0x144/0x910 Mar 16 14:34:01 Inspiron-7370 kernel: __do_fault+0x39/0x120 Mar 16 14:34:01 Inspiron-7370 kernel: do_read_fault+0xf5/0x170 Mar 16 14:34:01 Inspiron-7370 kernel: do_fault+0xa6/0x300 Mar 16 14:34:01 Inspiron-7370 kernel: handle_pte_fault+0x117/0x240 Mar 16 14:34:01 Inspiron-7370 kernel: __handle_mm_fault+0x693/0x740 Mar 16 14:34:01 Inspiron-7370 kernel: handle_mm_fault+0xba/0x2a0 Mar 16 14:34:01 Inspiron-7370 kernel: do_user_addr_fault+0x1c1/0x680 Mar 16 14:34:01 Inspiron-7370 kernel: exc_page_fault+0x80/0x1b0 Mar 16 14:34:01 Inspiron-7370 kernel: asm_exc_page_fault+0x27/0x30 Mar 16 14:34:01 Inspiron-7370 kernel: RIP: 0033:0x7fb3b1146560 Mar 16 14:34:01 Inspiron-7370 kernel: Code: Unable to access opcode bytes at RIP 0x7fb3b1146536. Mar 16 14:34:01 Inspiron-7370 kernel: RSP: 002b:00007fffbe33ade8 EFLAGS: 00010246 Mar 16 14:34:01 Inspiron-7370 kernel: RAX: 0000000000000007 RBX: 00007fb3a0009a70 RCX: 0000000000000000 Mar 16 14:34:01 Inspiron-7370 kernel: RDX: 0000000000000003 RSI: 0000000000000000 RDI: 00007fb3a0009a88 Mar 16 14:34:01 Inspiron-7370 kernel: RBP: 000055bc25cc7b60 R08: 0000000000000006 R09: 0000000000000008 Mar 16 14:34:01 Inspiron-7370 kernel: R10: 0000000000000021 R11: 0000000000000020 R12: 000055bc25cc7b00 Mar 16 14:34:01 Inspiron-7370 kernel: R13: 000055bc25c9a690 R14: 00007fb3b1216210 R15: 0000000000000002 Mar 16 14:34:01 Inspiron-7370 kernel: Mar 16 14:34:01 Inspiron-7370 kernel: Mem-Info: Mar 16 14:34:01 Inspiron-7370 kernel: active_anon:1098994 inactive_anon:2769212 isolated_anon:94 active_file:70 inactive_file:177 isolated_file:100 unevictable:104 dirty:0 writeback:0 slab_reclaimable:55588 slab_unreclaimable:38062 mapped:2653 shmem:28533 pagetables:12989 bounce:0 kernel_misc_reclaimable:0 free:33365 free_pcp:359 free_cma:0 Mar 16 14:34:01 Inspiron-7370 kernel: Node 0 active_anon:4395976kB inactive_anon:11076848kB active_file:280kB inactive_file:708kB unevictable:416kB isolated(anon):376kB isolated(file):400kB mapped:10612kB dirty:0kB writeback:0kB shmem:114132kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:9504kB pagetables:51956kB all_unreclaimable? 
yes Mar 16 14:34:01 Inspiron-7370 kernel: Node 0 DMA free:13312kB boost:0kB min:64kB low:80kB high:96kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15984kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB Mar 16 14:34:01 Inspiron-7370 kernel: lowmem_reserve[]: 0 2402 15652 15652 15652 Mar 16 14:34:01 Inspiron-7370 kernel: Node 0 DMA32 free:63224kB boost:0kB min:10360kB low:12948kB high:15536kB reserved_highatomic:0KB active_anon:1081328kB inactive_anon:1375608kB active_file:0kB inactive_file:24kB unevictable:0kB writepending:0kB present:2593460kB managed:2527300kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB Mar 16 14:34:01 Inspiron-7370 kernel: lowmem_reserve[]: 0 0 13250 13250 13250 Mar 16 14:34:01 Inspiron-7370 kernel: Node 0 Normal free:56924kB boost:0kB min:57152kB low:71440kB high:85728kB reserved_highatomic:0KB active_anon:3313952kB inactive_anon:9701320kB active_file:292kB inactive_file:708kB unevictable:416kB writepending:0kB present:13901824kB managed:13576308kB mlocked:288kB bounce:0kB free_pcp:1436kB local_pcp:480kB free_cma:0kB Mar 16 14:34:01 Inspiron-7370 kernel: lowmem_reserve[]: 0 0 0 0 0 Mar 16 14:34:01 Inspiron-7370 kernel: Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 2*2048kB (UM) 2*4096kB (M) = 13312kB Mar 16 14:34:01 Inspiron-7370 kernel: Node 0 DMA32: 188*4kB (UME) 124*8kB (UME) 45*16kB (UME) 15*32kB (UME) 12*64kB (UM) 3*128kB (ME) 29*256kB (UME) 26*512kB (UME) 20*1024kB (UE) 7*2048kB (U) 1*4096kB (U) = 63744kB Mar 16 14:34:01 Inspiron-7370 kernel: Node 0 Normal: 1478*4kB (UME) 2169*8kB (UME) 1034*16kB (UME) 374*32kB (UME) 94*64kB (UME) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 57792kB Mar 16 14:34:01 Inspiron-7370 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB Mar 16 14:34:01 Inspiron-7370 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB Mar 16 14:34:01 Inspiron-7370 kernel: 78054 total pagecache pages Mar 16 14:34:01 Inspiron-7370 kernel: 49191 pages in swap cache Mar 16 14:34:01 Inspiron-7370 kernel: Swap cache stats: add 5186285, delete 5139503, find 149982/151132 Mar 16 14:34:01 Inspiron-7370 kernel: Free swap = 0kB Mar 16 14:34:01 Inspiron-7370 kernel: Total swap = 2097148kB Mar 16 14:34:01 Inspiron-7370 kernel: 4127817 pages RAM Mar 16 14:34:01 Inspiron-7370 kernel: 0 pages HighMem/MovableOnly Mar 16 14:34:01 Inspiron-7370 kernel: 98075 pages reserved Mar 16 14:34:01 Inspiron-7370 kernel: 0 pages hwpoisoned Mar 16 14:34:01 Inspiron-7370 kernel: Tasks state (memory values in pages): Mar 16 14:34:01 Inspiron-7370 kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name Mar 16 14:34:01 Inspiron-7370 kernel: [ 265] 0 265 19640 275 167936 159 -250 systemd-journal Mar 16 14:34:01 Inspiron-7370 kernel: [ 294] 0 294 6946 515 73728 261 -1000 systemd-udevd Mar 16 14:34:01 Inspiron-7370 kernel: [ 481] 108 481 4001 159 69632 68 -900 systemd-oomd Mar 16 14:34:01 Inspiron-7370 kernel: [ 482] 104 482 4802 179 73728 177 0 systemd-resolve Mar 16 14:34:01 Inspiron-7370 kernel: [ 486] 101 486 22374 167 73728 68 0 systemd-timesyn Mar 16 14:34:01 Inspiron-7370 kernel: [ 575] 0 575 62082 139 98304 101 0 accounts-daemon Mar 16 14:34:01 Inspiron-7370 kernel: [ 582] 114 582 2092 71 57344 20 0 avahi-daemon Mar 16 14:34:01 Inspiron-7370 kernel: [ 583] 0 583 2671 103 57344 44 0 bluetoothd Mar 16 
14:34:01 Inspiron-7370 kernel: [ 585] 0 585 4582 53 65536 10 0 cron Mar 16 14:34:01 Inspiron-7370 kernel: [ 587] 102 587 3007 566 57344 149 -900 dbus-daemon Mar 16 14:34:01 Inspiron-7370 kernel: [ 610] 0 610 20681 87 57344 16 0 irqbalance Mar 16 14:34:01 Inspiron-7370 kernel: [ 617] 0 617 62939 673 110592 327 0 polkitd Mar 16 14:34:01 Inspiron-7370 kernel: [ 618] 0 618 62103 167 98304 44 0 power-profiles- Mar 16 14:34:01 Inspiron-7370 kernel: [ 620] 103 620 55559 83 81920 131 0 rsyslogd Mar 16 14:34:01 Inspiron-7370 kernel: [ 646] 0 646 61216 119 98304 36 0 switcheroo-cont Mar 16 14:34:01 Inspiron-7370 kernel: [ 647] 0 647 12335 254 86016 78 0 systemd-logind Mar 16 14:34:01 Inspiron-7370 kernel: [ 648] 0 648 71365 165 126976 73 0 thermald Mar 16 14:34:01 Inspiron-7370 kernel: [ 650] 0 650 98075 512 126976 296 0 udisksd Mar 16 14:34:01 Inspiron-7370 kernel: [ 665] 114 665 2044 57 49152 34 0 avahi-daemon Mar 16 14:34:01 Inspiron-7370 kernel: [ 692] 0 692 79272 330 118784 138 0 ModemManager Mar 16 14:34:01 Inspiron-7370 kernel: [ 693] 0 693 67418 641 143360 229 0 NetworkManager Mar 16 14:34:01 Inspiron-7370 kernel: [ 705] 0 705 4338 327 69632 120 0 wpa_supplicant Mar 16 14:34:01 Inspiron-7370 kernel: [ 764] 0 764 30178 1438 131072 602 0 unattended-upgr Mar 16 14:34:01 Inspiron-7370 kernel: [ 790] 0 790 62394 129 102400 176 0 gdm3 Mar 16 14:34:01 Inspiron-7370 kernel: [ 795] 0 795 61383 224 102400 46 0 upowerd Mar 16 14:34:01 Inspiron-7370 kernel: [ 919] 116 919 5685 53 57344 7 0 rtkit-daemon Mar 16 14:34:01 Inspiron-7370 kernel: [ 1180] 113 1180 3116 34 69632 84 0 kerneloops Mar 16 14:34:01 Inspiron-7370 kernel: [ 1185] 113 1185 3116 88 65536 32 0 kerneloops Mar 16 14:34:01 Inspiron-7370 kernel: [ 1203] 0 1203 76889 523 172032 204 0 packagekitd Mar 16 14:34:01 Inspiron-7370 kernel: [ 1211] 124 1211 64272 1260 118784 469 0 colord Mar 16 14:34:01 Inspiron-7370 kernel: [ 1426] 0 1426 81181 438 110592 27 0 gdm-session-wor Mar 16 14:34:01 Inspiron-7370 kernel: [ 1443] 1000 1443 5127 885 77824 140 100 systemd Mar 16 14:34:01 Inspiron-7370 kernel: [ 1444] 1000 1444 42357 178 90112 871 100 (sd-pam) Mar 16 14:34:01 Inspiron-7370 kernel: [ 1450] 1000 1450 40433 1994 126976 523 200 pipewire Mar 16 14:34:01 Inspiron-7370 kernel: [ 1453] 1000 1453 66882 1017 118784 254 200 wireplumber Mar 16 14:34:01 Inspiron-7370 kernel: [ 1454] 1000 1454 35995 9197 249856 3579 200 pipewire-pulse Mar 16 14:34:01 Inspiron-7370 kernel: [ 1461] 1000 1461 3405 1198 65536 72 200 dbus-daemon Mar 16 14:34:01 Inspiron-7370 kernel: [ 1462] 1000 1462 62809 211 106496 55 200 gnome-keyring-d Mar 16 14:34:01 Inspiron-7370 kernel: [ 1474] 1000 1474 62267 82 110592 159 200 gvfsd Mar 16 14:34:01 Inspiron-7370 kernel: [ 1476] 1000 1476 136343 166 139264 32 200 xdg-document-po Mar 16 14:34:01 Inspiron-7370 kernel: [ 1480] 1000 1480 95076 17 94208 139 200 gvfsd-fuse Mar 16 14:34:01 Inspiron-7370 kernel: [ 1483] 1000 1483 61132 83 94208 21 200 xdg-permission- Mar 16 14:34:01 Inspiron-7370 kernel: [ 1496] 1000 1496 660 22 40960 0 200 fusermount3 Mar 16 14:34:01 Inspiron-7370 kernel: [ 1514] 1000 1514 42692 0 81920 124 0 gdm-wayland-ses Mar 16 14:34:01 Inspiron-7370 kernel: [ 1517] 1000 1517 58105 431 147456 0 0 gnome-session-b Mar 16 14:34:01 Inspiron-7370 kernel: [ 1560] 1000 1560 158975 1762 294912 459 200 tracker-miner-f Mar 16 14:34:01 Inspiron-7370 kernel: [ 1562] 1000 1562 22068 130 77824 24 200 gcr-ssh-agent Mar 16 14:34:01 Inspiron-7370 kernel: [ 1565] 1000 1565 25062 97 77824 16 200 gnome-session-c Mar 16 14:34:01 Inspiron-7370 
kernel: [ 1568] 1000 1568 1893 125 57344 30 200 ssh-agent Mar 16 14:34:01 Inspiron-7370 kernel: [ 1580] 1000 1580 166963 484 204800 254 200 gnome-session-b Mar 16 14:34:01 Inspiron-7370 kernel: [ 1596] 1000 1596 81098 271 114688 113 200 gvfs-udisks2-vo Mar 16 14:34:01 Inspiron-7370 kernel: [ 1618] 1000 1618 77246 93 98304 116 200 at-spi-bus-laun Mar 16 14:34:01 Inspiron-7370 kernel: [ 1619] 1000 1619 1184883 43231 1699840 141 200 gnome-shell Mar 16 14:34:01 Inspiron-7370 kernel: [ 1624] 1000 1624 61253 15 90112 114 200 gvfs-goa-volume Mar 16 14:34:01 Inspiron-7370 kernel: [ 1627] 1000 1627 2328 128 57344 0 200 dbus-daemon Mar 16 14:34:01 Inspiron-7370 kernel: [ 1631] 1000 1631 140769 0 258048 1679 200 goa-daemon Mar 16 14:34:01 Inspiron-7370 kernel: [ 1645] 1000 1645 99439 0 118784 297 200 goa-identity-se Mar 16 14:34:01 Inspiron-7370 kernel: [ 1662] 1000 1662 80941 185 110592 110 200 gvfs-afc-volume Mar 16 14:34:01 Inspiron-7370 kernel: [ 1672] 1000 1672 61484 116 94208 61 200 gvfs-gphoto2-vo Mar 16 14:34:01 Inspiron-7370 kernel: [ 1678] 1000 1678 61212 113 94208 35 200 gvfs-mtp-volume Mar 16 14:34:01 Inspiron-7370 kernel: [ 1767] 1000 1767 158043 498 163840 179 200 xdg-desktop-por Mar 16 14:34:01 Inspiron-7370 kernel: [ 1771] 1000 1771 133213 1566 204800 439 200 xdg-desktop-por Mar 16 14:34:01 Inspiron-7370 kernel: [ 1777] 1000 1777 145004 0 172032 679 200 gnome-shell-cal Mar 16 14:34:01 Inspiron-7370 kernel: [ 1783] 1000 1783 294333 1534 335872 416 200 evolution-sourc Mar 16 14:34:01 Inspiron-7370 kernel: [ 1793] 1000 1793 341372 652 286720 214 200 evolution-calen Mar 16 14:34:01 Inspiron-7370 kernel: [ 1808] 1000 1808 272369 659 282624 203 200 evolution-addre Mar 16 14:34:01 Inspiron-7370 kernel: [ 1822] 1000 1822 80821 303 114688 4 200 gvfsd-trash Mar 16 14:34:01 Inspiron-7370 kernel: [ 1837] 1000 1837 40513 176 73728 0 200 at-spi2-registr Mar 16 14:34:01 Inspiron-7370 kernel: [ 1839] 1000 1839 649061 349 233472 833 200 gjs Mar 16 14:34:01 Inspiron-7370 kernel: [ 1850] 1000 1850 684 0 45056 27 200 sh Mar 16 14:34:01 Inspiron-7370 kernel: [ 1852] 1000 1852 79701 118 102400 30 200 gsd-a11y-settin Mar 16 14:34:01 Inspiron-7370 kernel: [ 1853] 1000 1853 80873 805 110592 412 200 ibus-daemon Mar 16 14:34:01 Inspiron-7370 kernel: [ 1854] 1000 1854 87730 1444 167936 3 200 gsd-color Mar 16 14:34:01 Inspiron-7370 kernel: [ 1855] 1000 1855 91429 265 147456 62 200 gsd-datetime Mar 16 14:34:01 Inspiron-7370 kernel: [ 1858] 1000 1858 80070 172 114688 48 200 gsd-housekeepin Mar 16 14:34:01 Inspiron-7370 kernel: [ 1861] 1000 1861 87450 1322 172032 1 200 gsd-keyboard Mar 16 14:34:01 Inspiron-7370 kernel: [ 1863] 1000 1863 167391 1624 208896 46 200 gsd-media-keys Mar 16 14:34:01 Inspiron-7370 kernel: [ 1865] 1000 1865 150694 1601 204800 16 200 gsd-power Mar 16 14:34:01 Inspiron-7370 kernel: [ 1868] 1000 1868 64560 193 118784 171 200 gsd-print-notif Mar 16 14:34:01 Inspiron-7370 kernel: [ 1870] 1000 1870 116560 122 122880 32 200 gsd-rfkill Mar 16 14:34:01 Inspiron-7370 kernel: [ 1872] 1000 1872 61166 87 90112 22 200 gsd-screensaver Mar 16 14:34:01 Inspiron-7370 kernel: [ 1877] 1000 1877 118604 330 139264 84 200 gsd-sharing Mar 16 14:34:01 Inspiron-7370 kernel: [ 1887] 1000 1887 98608 182 118784 44 200 gsd-smartcard Mar 16 14:34:01 Inspiron-7370 kernel: [ 1888] 1000 1888 81913 213 118784 55 200 gsd-sound Mar 16 14:34:01 Inspiron-7370 kernel: [ 1891] 1000 1891 87600 1334 167936 1 200 gsd-wacom Mar 16 14:34:01 Inspiron-7370 kernel: [ 1919] 1000 1919 57920 232 81920 19 200 gsd-disk-utilit Mar 16 14:34:01 
Inspiron-7370 kernel: [ 1928] 1000 1928 200174 3016 352256 559 200 evolution-alarm Mar 16 14:34:01 Inspiron-7370 kernel: [ 1942] 1000 1942 42962 37 86016 116 200 ibus-memconf Mar 16 14:34:01 Inspiron-7370 kernel: [ 1946] 1000 1946 89240 2760 184320 138 200 ibus-extension- Mar 16 14:34:01 Inspiron-7370 kernel: [ 1967] 1000 1967 61407 159 98304 0 200 ibus-portal Mar 16 14:34:01 Inspiron-7370 kernel: [ 1985] 1000 1985 87703 479 155648 0 200 gsd-printer Mar 16 14:34:01 Inspiron-7370 kernel: [ 1989] 1000 1989 39096 131 65536 19 200 dconf-service Mar 16 14:34:01 Inspiron-7370 kernel: [ 2012] 1000 2012 42962 154 86016 0 200 ibus-engine-sim Mar 16 14:34:01 Inspiron-7370 kernel: [ 2030] 1000 2030 106664 1835 176128 116 200 xdg-desktop-por Mar 16 14:34:01 Inspiron-7370 kernel: [ 2045] 1000 2045 663381 1259 225280 20 200 gjs Mar 16 14:34:01 Inspiron-7370 kernel: [ 2110] 1000 2110 42862 130 86016 24 200 gvfsd-metadata Mar 16 14:34:01 Inspiron-7370 kernel: [ 2204] 1000 2204 217174 3304 331776 13 200 gnome-calendar Mar 16 14:34:01 Inspiron-7370 kernel: [ 2261] 1000 2261 151381 8322 483328 6 200 Xwayland Mar 16 14:34:01 Inspiron-7370 kernel: [ 2282] 1000 2282 225795 3161 479232 1719 200 gsd-xsettings Mar 16 14:34:01 Inspiron-7370 kernel: [ 2334] 1000 2334 69285 1932 155648 0 200 ibus-x11 Mar 16 14:34:01 Inspiron-7370 kernel: [ 2402] 1000 2402 125614 2007 196608 68 200 update-notifier Mar 16 14:34:01 Inspiron-7370 kernel: [ 2606] 1000 2606 80855 221 106496 2 200 gvfsd-network Mar 16 14:34:01 Inspiron-7370 kernel: [ 2625] 1000 2625 81291 246 118784 0 200 gvfsd-dnssd Mar 16 14:34:01 Inspiron-7370 kernel: [ 5142] 1000 5142 19508 172 131072 262 200 snapd-desktop-i Mar 16 14:34:01 Inspiron-7370 kernel: [ 5341] 1000 5341 95750 1745 200704 314 200 snapd-desktop-i Mar 16 14:34:01 Inspiron-7370 kernel: [ 23028] 0 23028 311068 3363 319488 1238 -900 snapd Mar 16 14:34:01 Inspiron-7370 kernel: [ 31030] 1000 31030 180048 2535 200704 1 200 snap Mar 16 14:34:01 Inspiron-7370 kernel: [ 32316] 1000 32316 751117 17959 1290240 2876 200 gcompris-qt Mar 16 14:34:01 Inspiron-7370 kernel: [ 32644] 0 32644 11592 384 106496 129 0 cupsd Mar 16 14:34:01 Inspiron-7370 kernel: [ 32672] 0 32672 43787 323 106496 104 0 cups-browsed Mar 16 14:34:01 Inspiron-7370 kernel: [ 32935] 1000 32935 9617 1124 98304 672 200 gnome-terminal Mar 16 14:34:01 Inspiron-7370 kernel: [ 32938] 1000 32938 98330 1890 192512 128 200 gnome-terminal. 
Mar 16 14:34:01 Inspiron-7370 kernel: [ 32943] 1000 32943 144056 5411 299008 657 200 gnome-terminal- Mar 16 14:34:01 Inspiron-7370 kernel: [ 32961] 1000 32961 5070 111 61440 348 200 bash Mar 16 14:34:01 Inspiron-7370 kernel: [ 33659] 0 33659 133947 663 331776 22439 0 python3 Mar 16 14:34:01 Inspiron-7370 kernel: [ 34800] 1000 34800 4970 336 65536 37 200 bash Mar 16 14:34:01 Inspiron-7370 kernel: [ 35715] 1000 35715 18349 1479 151552 51 200 vim Mar 16 14:34:01 Inspiron-7370 kernel: [ 35720] 1000 35720 5032 347 61440 60 200 bash Mar 16 14:34:01 Inspiron-7370 kernel: [ 37308] 0 37308 3713 26 69632 299 -1000 sshd Mar 16 14:34:01 Inspiron-7370 kernel: [ 37312] 0 37312 4464 269 73728 215 0 sshd Mar 16 14:34:01 Inspiron-7370 kernel: [ 37391] 1000 37391 4504 158 73728 385 0 sshd Mar 16 14:34:01 Inspiron-7370 kernel: [ 37392] 1000 37392 4987 346 57344 38 0 bash Mar 16 14:34:01 Inspiron-7370 kernel: [ 37706] 1000 37706 5578 210 65536 3 0 sudo Mar 16 14:34:01 Inspiron-7370 kernel: [ 37707] 1000 37707 5578 202 57344 9 0 sudo Mar 16 14:34:01 Inspiron-7370 kernel: [ 37708] 0 37708 5001 388 53248 2 0 bash Mar 16 14:34:01 Inspiron-7370 kernel: [ 39416] 0 39416 4465 412 69632 69 0 sshd Mar 16 14:34:01 Inspiron-7370 kernel: [ 39454] 1000 39454 4505 284 69632 260 0 sshd Mar 16 14:34:01 Inspiron-7370 kernel: [ 39455] 1000 39455 4987 227 57344 151 0 bash Mar 16 14:34:01 Inspiron-7370 kernel: [ 39465] 1000 39465 5586 183 61440 23 0 sudo Mar 16 14:34:01 Inspiron-7370 kernel: [ 39466] 1000 39466 5586 184 49152 28 0 sudo Mar 16 14:34:01 Inspiron-7370 kernel: [ 39467] 0 39467 5067 410 53248 34 0 bash Mar 16 14:34:01 Inspiron-7370 kernel: [ 39488] 0 39488 4642 266 53248 33 0 watch Mar 16 14:34:01 Inspiron-7370 kernel: [ 42387] 0 42387 4465 415 77824 66 0 sshd Mar 16 14:34:01 Inspiron-7370 kernel: [ 42436] 1000 42436 4505 287 77824 258 0 sshd Mar 16 14:34:01 Inspiron-7370 kernel: [ 42437] 1000 42437 4987 293 61440 85 0 bash Mar 16 14:34:01 Inspiron-7370 kernel: [ 42461] 1000 42461 5587 182 65536 25 0 sudo Mar 16 14:34:01 Inspiron-7370 kernel: [ 42469] 1000 42469 5587 193 53248 20 0 sudo Mar 16 14:34:01 Inspiron-7370 kernel: [ 42470] 0 42470 5001 324 69632 36 0 bash Mar 16 14:34:01 Inspiron-7370 kernel: [ 42496] 0 42496 18962 241 159744 6 0 journalctl Mar 16 14:34:01 Inspiron-7370 kernel: [ 43168] 0 43168 4465 434 81920 47 0 sshd Mar 16 14:34:01 Inspiron-7370 kernel: [ 43217] 1000 43217 4505 369 81920 199 0 sshd Mar 16 14:34:01 Inspiron-7370 kernel: [ 43218] 1000 43218 4987 384 61440 0 0 bash Mar 16 14:34:01 Inspiron-7370 kernel: [ 46679] 0 46679 3930 3 73728 207 0 systemd-inhibit Mar 16 14:34:01 Inspiron-7370 kernel: [ 46683] 0 46683 9134 1 94208 2672 0 python3 Mar 16 14:34:01 Inspiron-7370 kernel: [ 47245] 1000 47245 5534 278 65536 0 0 top Mar 16 14:34:01 Inspiron-7370 kernel: [ 218754] 0 218754 4465 437 69632 44 0 sshd Mar 16 14:34:01 Inspiron-7370 kernel: [ 218841] 1000 218841 4505 313 69632 232 0 sshd Mar 16 14:34:01 Inspiron-7370 kernel: [ 218842] 1000 218842 4987 279 57344 105 0 bash Mar 16 14:34:01 Inspiron-7370 kernel: [ 219027] 1000 219027 5579 167 69632 31 0 sudo Mar 16 14:34:01 Inspiron-7370 kernel: [ 219028] 1000 219028 5579 166 57344 37 0 sudo Mar 16 14:34:01 Inspiron-7370 kernel: [ 219029] 0 219029 17818 186 143360 36 0 journalctl Mar 16 14:34:01 Inspiron-7370 kernel: [ 226624] 0 226624 6558 24 65536 27 -1000 stress-ng Mar 16 14:34:01 Inspiron-7370 kernel: [ 226625] 0 226625 6558 13 57344 30 -1000 stress-ng-bighe Mar 16 14:34:01 Inspiron-7370 kernel: [ 226626] 0 226626 6558 13 57344 30 -1000 
stress-ng-bighe Mar 16 14:34:01 Inspiron-7370 kernel: [ 226627] 0 226627 6558 13 57344 30 -1000 stress-ng-bighe Mar 16 14:34:01 Inspiron-7370 kernel: [ 226628] 0 226628 6558 13 57344 30 -1000 stress-ng-bighe Mar 16 14:34:01 Inspiron-7370 kernel: [ 226629] 0 226629 6558 13 57344 30 -1000 stress-ng-bighe Mar 16 14:34:01 Inspiron-7370 kernel: [ 226630] 0 226630 6558 13 57344 30 -1000 stress-ng-bighe Mar 16 14:34:01 Inspiron-7370 kernel: [ 226631] 0 226631 6558 13 57344 30 -1000 stress-ng-bighe Mar 16 14:34:01 Inspiron-7370 kernel: [ 226632] 0 226632 493023 475577 3964928 10929 1000 stress-ng-bighe Mar 16 14:34:01 Inspiron-7370 kernel: [ 226633] 0 226633 6558 13 57344 30 -1000 stress-ng-bighe Mar 16 14:34:01 Inspiron-7370 kernel: [ 226634] 0 226634 510767 504051 4108288 195 1000 stress-ng-bighe Mar 16 14:34:01 Inspiron-7370 kernel: [ 226635] 0 226635 536111 467684 4313088 61909 1000 stress-ng-bighe Mar 16 14:34:01 Inspiron-7370 kernel: [ 226636] 0 226636 792991 627087 6369280 159386 1000 stress-ng-bighe Mar 16 14:34:01 Inspiron-7370 kernel: [ 226637] 0 226637 490111 483309 3948544 269 1000 stress-ng-bighe Mar 16 14:34:01 Inspiron-7370 kernel: [ 226638] 0 226638 399775 307418 3215360 85840 1000 stress-ng-bighe Mar 16 14:34:01 Inspiron-7370 kernel: [ 226639] 0 226639 6558 18 45056 25 -1000 stress-ng-ignit Mar 16 14:34:01 Inspiron-7370 kernel: [ 226640] 0 226640 393327 314975 3166208 71835 1000 stress-ng-bighe Mar 16 14:34:01 Inspiron-7370 kernel: [ 226644] 0 226644 454575 448018 3657728 39 1000 stress-ng-bighe Mar 16 14:34:01 Inspiron-7370 kernel: [ 226672] 0 226672 1543 38 45056 0 0 free Mar 16 14:34:01 Inspiron-7370 kernel: [ 226673] 0 226673 4465 28 57344 0 0 grep Mar 16 14:34:01 Inspiron-7370 kernel: [ 226674] 0 226674 928 23 40960 0 0 awk Mar 16 14:34:01 Inspiron-7370 kernel: [ 226675] 0 226675 4642 263 40960 36 0 watch Mar 16 14:34:01 Inspiron-7370 kernel: [ 226676] 0 226676 684 24 40960 2 0 sh Mar 16 14:34:01 Inspiron-7370 kernel: [ 226677] 0 226677 505 15 32768 0 0 systemctl Mar 16 14:34:01 Inspiron-7370 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/snap.checkbox.service.service,task=stress-ng-bighe,pid=226636,uid=0 Mar 16 14:34:01 Inspiron-7370 kernel: Out of memory: Killed process 226636 (stress-ng-bighe) total-vm:3171964kB, anon-rss:2508344kB, file-rss:0kB, shmem-rss:4kB, UID:0 pgtables:6220kB oom_score_adj:1000 Mar 16 14:34:00 Inspiron-7370 systemd[1]: snap.checkbox.service.service: A process of this unit has been killed by the OOM killer. Mar 16 14:34:01 Inspiron-7370 stress-ng[226628]: memory (MB): total 15741.18, free 11428.08, shared 110.69, buffer 2.46, swap 2048.00, free swap 1510.82 Mar 16 14:34:01 Inspiron-7370 systemd[1]: snap.checkbox.service.service: Failed with result 'oom-kill'. Mar 16 14:34:01 Inspiron-7370 systemd[1]: snap.checkbox.service.service: Consumed 6h 44min 35.908s CPU time. Mar 16 14:34:01 Inspiron-7370 systemd[1]: snap.checkbox.service.service: Scheduled restart job, restart counter is at 1. Mar 16 14:34:01 Inspiron-7370 systemd[1]: Stopped Service for snap application checkbox.service. Mar 16 14:34:01 Inspiron-7370 systemd[1]: snap.checkbox.service.service: Consumed 6h 44min 35.908s CPU time. Mar 16 14:34:01 Inspiron-7370 systemd[1]: Started Service for snap application checkbox.service. 
Mar 16 14:34:12 Inspiron-7370 checkbox.service[226680]: $PROVIDERPATH is defined, so following provider sources are ignored ['/root/.local/share/plainbox-providers-1', '/var/tmp/checkbox-providers-develop'] ```
pieqq commented 1 year ago

I used Checkbox remote to connect to a Focal laptop with kernel 5.15.0-67-generic and ran the com.canonical.certification::20.04-server-full test plan, selecting only memory/memory_stress_ng:

During the test, I can see this with systemctl status snap.checkbox.service.service:

```
● snap.checkbox.service.service - Service for snap application checkbox.service
     Loaded: loaded (/etc/systemd/system/snap.checkbox.service.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2023-03-16 14:58:18 CST; 16min ago
   Main PID: 80976 (python3)
      Tasks: 19 (limit: 14068)
     Memory: 3.6G
     CGroup: /system.slice/snap.checkbox.service.service
             ├─80976 python3 /snap/checkbox20/current/bin/checkbox-cli service
             ├─84689 systemd-inhibit stress_ng_test.py memory
             ├─84693 python3 /tmp/nest-fkzeb7t7.e1013b014ee4ce382e5a1291c27c2890a7088a1741917d4aa1a85269eb1399fe/stress_ng_test.py memory
             ├─85172 stress-ng --aggressive --verify --timeout 300 --context 0
             ├─85173 stress-ng-context [run]
             ├─85174 stress-ng-context [run]
             ├─85175 stress-ng-context [run]
             ├─85176 stress-ng-context [run]
             ├─85177 stress-ng-context [run]
             ├─85178 stress-ng-context [run]
             ├─85179 stress-ng-context [run]
             ├─85180 stress-ng-context [run]
             └─85181 stress-ng-ignite [periodic]
```

At some point, the laptop became really unresponsive and I thought it was dead, but when I came back later, the test had finished. So things look OK on Focal using Checkbox remote. Focal does not use systemd-oomd...
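A quick way to confirm that difference on a given machine (a hedged check; exact output varies by release):

```
systemctl is-active systemd-oomd 2>/dev/null || echo "systemd-oomd not present/active"
```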

Checkbox remote output ``` (...) ----------------------------[ Running job 46 / 46 ]----------------------------- ------------------------[ Stress test of system memory ]------------------------ ID: com.canonical.certification::memory/memory_stress_ng Category: Memory tests -------------------------------------------------------------------------------- stress-ng: info: [84697] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [84697] dispatching hogs: 8 bsearch stress-ng: info: [84697] successful run completed in 300.01s (5 mins, 0.01 secs) stress-ng: info: [85172] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [85172] dispatching hogs: 8 context stress-ng: info: [85172] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [85635] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [85635] dispatching hogs: 8 hsearch stress-ng: info: [85635] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [86101] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [86101] dispatching hogs: 8 lsearch stress-ng: info: [86101] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [86562] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [86562] dispatching hogs: 8 matrix stress-ng: info: [86562] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [87032] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [87032] dispatching hogs: 8 memcpy stress-ng: info: [87032] successful run completed in 300.02s (5 mins, 0.02 secs) stress-ng: info: [87496] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [87496] dispatching hogs: 8 null stress-ng: info: [87496] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [87960] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [87960] dispatching hogs: 8 pipe stress-ng: info: [87960] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [88429] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [88429] dispatching hogs: 8 qsort stress-ng: info: [88429] successful run completed in 300.01s (5 mins, 0.01 secs) stress-ng: info: [88900] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [88900] dispatching hogs: 8 str stress-ng: info: [88900] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [89360] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [89360] dispatching hogs: 8 stream stress-ng: info: [89362] stream: stressor loosely based on a variant of the STREAM benchmark code stress-ng: info: [89362] stream: do NOT submit any of these results to the STREAM benchmark results stress-ng: info: [89362] stream: Using CPU cache size of 6144K stress-ng: info: [89366] stream: memory rate: 1602.68 MB read/sec, 1068.45 MB write/sec, 140.04 Mflop/sec (instance 4) stress-ng: info: [89363] stream: memory rate: 2061.49 MB read/sec, 1374.32 MB write/sec, 180.14 Mflop/sec (instance 1) stress-ng: info: [89369] stream: memory rate: 1157.45 MB read/sec, 771.63 MB write/sec, 101.14 Mflop/sec (instance 7) stress-ng: info: [89362] stream: memory rate: 2222.40 MB read/sec, 1481.60 MB write/sec, 194.20 Mflop/sec (instance 0) stress-ng: info: [89368] stream: memory rate: 1297.85 MB read/sec, 865.24 MB write/sec, 113.41 Mflop/sec (instance 6) stress-ng: info: 
[89365] stream: memory rate: 1772.53 MB read/sec, 1181.69 MB write/sec, 154.89 Mflop/sec (instance 3) stress-ng: info: [89364] stream: memory rate: 1922.56 MB read/sec, 1281.71 MB write/sec, 168.00 Mflop/sec (instance 2) stress-ng: info: [89367] stream: memory rate: 1452.17 MB read/sec, 968.11 MB write/sec, 126.89 Mflop/sec (instance 5) stress-ng: info: [89360] successful run completed in 300.11s (5 mins, 0.11 secs) stress-ng: info: [89822] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [89822] dispatching hogs: 8 tsearch stress-ng: info: [89822] successful run completed in 300.13s (5 mins, 0.13 secs) stress-ng: info: [90284] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [90284] dispatching hogs: 8 vm-rw stress-ng: info: [90284] successful run completed in 300.01s (5 mins, 0.01 secs) stress-ng: info: [90751] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [90751] dispatching hogs: 8 wcs stress-ng: info: [90751] successful run completed in 300.01s (5 mins, 0.01 secs) stress-ng: info: [91216] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [91216] dispatching hogs: 8 zero stress-ng: info: [91216] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [91680] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [91680] dispatching hogs: 8 mlock stress-ng: info: [91680] successful run completed in 300.67s (5 mins, 0.67 secs) stress-ng: info: [92149] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [92149] dispatching hogs: 8 mmapfork stress-ng: info: [92149] successful run completed in 300.51s (5 mins, 0.51 secs) stress-ng: info: [181538] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [181538] dispatching hogs: 8 mmapmany stress-ng: info: [181538] successful run completed in 300.55s (5 mins, 0.55 secs) stress-ng: info: [182005] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [182005] dispatching hogs: 8 mremap stress-ng: warn: [182005] metrics-check: all bogo-op counters are zero, data may be incorrect stress-ng: info: [182005] successful run completed in 300.16s (5 mins, 0.16 secs) stress-ng: info: [191135] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [191135] dispatching hogs: 8 shm-sysv stress-ng: info: [191135] successful run completed in 300.01s (5 mins, 0.01 secs) stress-ng: info: [464834] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [464834] dispatching hogs: 8 vm-splice stress-ng: info: [464834] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [465297] setting to a 415 second (6 mins, 55.00 secs) run per stressor stress-ng: info: [465297] dispatching hogs: 8 malloc stress-ng: info: [465297] successful run completed in 415.05s (6 mins, 55.05 secs) stress-ng: info: [465934] setting to a 415 second (6 mins, 55.00 secs) run per stressor stress-ng: info: [465934] dispatching hogs: 8 mincore stress-ng: info: [465934] successful run completed in 415.00s (6 mins, 55.00 secs) stress-ng: info: [466569] setting to a 415 second (6 mins, 55.00 secs) run per stressor stress-ng: info: [466569] dispatching hogs: 8 vm stress-ng: info: [466569] successful run completed in 415.18s (6 mins, 55.18 secs) stress-ng: info: [467221] setting to a 415 second (6 mins, 55.00 secs) run per stressor stress-ng: info: [467221] dispatching hogs: 8 mmap stress-ng: info: [467221] 
successful run completed in 415.65s (6 mins, 55.65 secs) stress-ng: info: [467869] setting to a 415 second (6 mins, 55.00 secs) run per stressor stress-ng: info: [467869] dispatching hogs: 8 stack stress-ng: info: [467869] successful run completed in 415.02s (6 mins, 55.02 secs) stress-ng: info: [468511] setting to a 415 second (6 mins, 55.00 secs) run per stressor stress-ng: info: [468511] dispatching hogs: 8 bigheap Reconnecting... Reconnecting... (...) Reconnecting... Connection lost! connection closed by peer Rejoined session. In progress: com.canonical.certification::memory/memory_stress_ng (46/46) stress-ng: info: [468739] setting to a 415 second (6 mins, 55.00 secs) run per stressor stress-ng: info: [468739] dispatching hogs: 8 brk Reconnecting... Reconnecting... (...) Reconnecting... Connection lost! connection closed by peer Rejoined session. In progress: com.canonical.certification::memory/memory_stress_ng (46/46) stress-ng: info: [468739] successful run completed in 10592.10s (2 hours, 56 mins, 32.10 secs) Minimum swap space is set to 0 GiB Total memory is 11.6 GiB Constant run time is 300 seconds per stressor Variable run time is 416 seconds per stressor Number of NUMA nodes is 1 Estimated total run time is 133 minutes 16 Mar 07:07: Running stress-ng bsearch stressor for 300 seconds... 16 Mar 07:12: Running stress-ng context stressor for 300 seconds... 16 Mar 07:17: Running stress-ng hsearch stressor for 300 seconds... 16 Mar 07:22: Running stress-ng lsearch stressor for 300 seconds... 16 Mar 07:27: Running stress-ng matrix stressor for 300 seconds... 16 Mar 07:32: Running stress-ng memcpy stressor for 300 seconds... 16 Mar 07:37: Running stress-ng null stressor for 300 seconds... 16 Mar 07:42: Running stress-ng pipe stressor for 300 seconds... 16 Mar 07:47: Running stress-ng qsort stressor for 300 seconds... 16 Mar 07:52: Running stress-ng str stressor for 300 seconds... 16 Mar 07:57: Running stress-ng stream stressor for 300 seconds... 16 Mar 08:02: Running stress-ng tsearch stressor for 300 seconds... 16 Mar 08:07: Running stress-ng vm-rw stressor for 300 seconds... 16 Mar 08:12: Running stress-ng wcs stressor for 300 seconds... 16 Mar 08:17: Running stress-ng zero stressor for 300 seconds... 16 Mar 08:22: Running stress-ng mlock stressor for 300 seconds... 16 Mar 08:27: Running stress-ng mmapfork stressor for 300 seconds... 16 Mar 08:32: Running stress-ng mmapmany stressor for 300 seconds... 16 Mar 08:37: Running stress-ng mremap stressor for 300 seconds... 16 Mar 08:42: Running stress-ng shm-sysv stressor for 300 seconds... 16 Mar 08:47: Running stress-ng vm-splice stressor for 300 seconds... 16 Mar 08:52: Running stress-ng malloc stressor for 416 seconds... 16 Mar 08:59: Running stress-ng mincore stressor for 416 seconds... 16 Mar 09:06: Running stress-ng vm stressor for 416 seconds... 16 Mar 09:13: Running stress-ng mmap stressor for 416 seconds... 16 Mar 09:20: Running stress-ng stack stressor for 416 seconds... 16 Mar 09:27: Running stress-ng bigheap stressor for 416 seconds... ** stress-ng timed out and was forcefully terminated 16 Mar 11:03: Running stress-ng brk stressor for 416 seconds... ** stress-ng timed out and was forcefully terminated retval is 1 ************************************************************** ** stress-ng test failed! ************************************************************** stress_ng_test.py failed with exit status 1. 
-------------------------------------------------------------------------------- Outcome: job failed ==================================[ Results ]=================================== 32.0kB [00:00, 4.56MB/s, file=python://stdout] job passed : Enumerate available system executables job failed : Run FWTS Server Cert selected tests. job passed : Collect information about installed software packages job passed : miscellanea/apport-directory job passed : Collect information about the CPU job failed : Gather BMC identification info job passed : Collect information about installed system (lsb-release) job passed : Test that system is not a pre-release version job passed : Attempt to identify CPU family (x86/amd64 only) job passed : Resource to detect if dmi data is present job passed : Test DMI data for CPUs job passed : Check the MD5 sums of installed Debian packages job failed : Test DMI identification data (servers) job passed : Test that system booted in EFI mode job failed : Test that system booted from the network job passed : Gather info on the SUT's make and model job failed : Verify MAAS version used to deploy the SUT job failed : Test IPMI in-band communications job passed : Test that kernel is not tainted job passed : Verify BMC user called 'maas' was successfully created job passed : Test that system supports booting into firmware setup utility job failed : Test that system booted with Secure Boot active job cannot be started: Generate baseline sosreport job cannot be started: Attach the baseline sosreport file job passed : Collect information about the running kernel job passed : Attach dump of udev database job passed : Attach info block devices and their mount points job passed : Collect information about hardware devices (udev) job passed : Attach PCI configuration space hex dump job passed : Attach the contents of /etc/modprobe.* job passed : Attach a copy of /sys/class/dmi/id/* job passed : Provide links to requirements documents job passed : Collect information about installation media (casper) job passed : Collect information about system memory (/proc/meminfo) job passed : Attaches json dumps of udev_resource.py job passed : Collect information about installed snap packages job passed : Collect information about kernel modules job passed : Attaches json dumps of raw dmi devices job passed : Collect information about dpkg version job passed : Attach a copy of /proc/cmdline job passed : Collect information about hardware devices (DMI) job passed : Attach detailed sysfs property output from udev job passed : Create resource info for environment variables job passed : Collect information about the EFI configuration job passed : Attaches json dumps of installed dkms package information. job passed : Attaches json dumps of system info tools job passed : Check that data for a complete result are present job failed : Stress test of system memory 1.84MB [00:00, 7.65MB/s, file=/home/pieq/.local/share/checkbox-ng/submission_2023-03-17T05.50.27.354524.html] file:///home/pieq/.local/share/checkbox-ng/submission_2023-03-17T05.50.27.354524.html 432kB [00:00, 6.19MB/s, file=/home/pieq/.local/share/checkbox-ng/submission_2023-03-17T05.50.27.354524.junit.xml] file:///home/pieq/.local/share/checkbox-ng/submission_2023-03-17T05.50.27.354524.junit.xml 368kB [00:00, 6.04MB/s, file=/home/pieq/.local/share/checkbox-ng/submission_2023-03-17T05.50.27.354524.tar.xz] file:///home/pieq/.local/share/checkbox-ng/submission_2023-03-17T05.50.27.354524.tar.xz ```
pieqq commented 1 year ago

I've also run the same test as in a previous comment on the same laptop, but this time in local mode. checkbox-cli was killed after 150 minutes of running memory/memory_stress_ng:

Output of checkbox-cli local running `memory/memory_stress_ng` ``` -------------[ Running job 46 / 46. Estimated time left: unknown ]-------------- ------------------------[ Stress test of system memory ]------------------------ ID: com.canonical.certification::memory/memory_stress_ng Category: com.canonical.plainbox::memory ... 8< ------------------------------------------------------------------------- stress-ng: info: [235282] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [235282] dispatching hogs: 4 bsearch stress-ng: info: [235282] successful run completed in 300.02s (5 mins, 0.02 secs) stress-ng: info: [235304] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [235304] dispatching hogs: 4 context stress-ng: info: [235304] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [235321] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [235321] dispatching hogs: 4 hsearch stress-ng: info: [235321] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [235329] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [235329] dispatching hogs: 4 lsearch stress-ng: info: [235329] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [235339] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [235339] dispatching hogs: 4 matrix stress-ng: info: [235339] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [235350] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [235350] dispatching hogs: 4 memcpy stress-ng: info: [235350] successful run completed in 300.02s (5 mins, 0.02 secs) stress-ng: info: [235358] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [235358] dispatching hogs: 4 null stress-ng: info: [235358] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [235367] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [235367] dispatching hogs: 4 pipe stress-ng: info: [235367] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [235384] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [235384] dispatching hogs: 4 qsort stress-ng: info: [235384] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [235393] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [235393] dispatching hogs: 4 str stress-ng: info: [235393] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [235401] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [235401] dispatching hogs: 4 stream stress-ng: info: [235403] stream: stressor loosely based on a variant of the STREAM benchmark code stress-ng: info: [235403] stream: do NOT submit any of these results to the STREAM benchmark results stress-ng: info: [235403] stream: Using CPU cache size of 4096K stress-ng: info: [235404] stream: memory rate: 4010.91 MB read/sec, 2673.94 MB write/sec, 350.48 Mflop/sec (instance 1) stress-ng: info: [235405] stream: memory rate: 3367.22 MB read/sec, 2244.82 MB write/sec, 294.23 Mflop/sec (instance 2) stress-ng: info: [235406] stream: memory rate: 2819.58 MB read/sec, 1879.72 MB write/sec, 246.38 Mflop/sec (instance 3) stress-ng: info: [235403] stream: memory rate: 4645.56 MB read/sec, 3097.04 MB write/sec, 405.93 Mflop/sec (instance 0) stress-ng: info: [235401] successful run 
completed in 300.10s (5 mins, 0.10 secs) stress-ng: info: [235410] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [235410] dispatching hogs: 4 tsearch stress-ng: info: [235410] successful run completed in 300.04s (5 mins, 0.04 secs) stress-ng: info: [235419] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [235419] dispatching hogs: 4 vm-rw stress-ng: info: [235419] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [235432] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [235432] dispatching hogs: 4 wcs stress-ng: info: [235432] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [235440] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [235440] dispatching hogs: 4 zero stress-ng: info: [235440] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [235450] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [235450] dispatching hogs: 4 mlock stress-ng: info: [235450] successful run completed in 305.48s (5 mins, 5.48 secs) stress-ng: info: [235569] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [235569] dispatching hogs: 4 mmapfork stress-ng: info: [235569] successful run completed in 301.52s (5 mins, 1.52 secs) stress-ng: info: [249672] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [249672] dispatching hogs: 4 mmapmany stress-ng: info: [249672] successful run completed in 300.40s (5 mins, 0.40 secs) stress-ng: info: [249684] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [249684] dispatching hogs: 4 mremap stress-ng: warn: [249684] metrics-check: all bogo-op counters are zero, data may be incorrect stress-ng: info: [249684] successful run completed in 300.02s (5 mins, 0.02 secs) stress-ng: info: [253484] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [253484] dispatching hogs: 4 shm-sysv stress-ng: info: [253484] successful run completed in 300.01s (5 mins, 0.01 secs) stress-ng: info: [376187] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [376187] dispatching hogs: 4 vm-splice stress-ng: info: [376187] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [376197] setting to a 453 second (7 mins, 33.00 secs) run per stressor stress-ng: info: [376197] dispatching hogs: 4 malloc stress-ng: info: [376197] successful run completed in 453.05s (7 mins, 33.05 secs) stress-ng: info: [376211] setting to a 453 second (7 mins, 33.00 secs) run per stressor stress-ng: info: [376211] dispatching hogs: 4 mincore stress-ng: info: [376211] successful run completed in 453.00s (7 mins, 33.00 secs) stress-ng: info: [376232] setting to a 453 second (7 mins, 33.00 secs) run per stressor stress-ng: info: [376232] dispatching hogs: 4 vm stress-ng: info: [376232] successful run completed in 453.01s (7 mins, 33.01 secs) stress-ng: info: [376249] setting to a 453 second (7 mins, 33.00 secs) run per stressor stress-ng: info: [376249] dispatching hogs: 4 mmap stress-ng: info: [376249] successful run completed in 453.07s (7 mins, 33.07 secs) stress-ng: info: [376269] setting to a 453 second (7 mins, 33.00 secs) run per stressor stress-ng: info: [376269] dispatching hogs: 8 stack stress-ng: info: [376269] successful run completed in 453.03s (7 mins, 33.03 secs) stress-ng: info: [376357] setting to a 453 second (7 mins, 33.00 secs) run per 
stressor stress-ng: info: [376357] dispatching hogs: 8 bigheap Killed real 150m35.050s user 3m42.652s sys 2m43.482s ```
pieqq commented 1 year ago

Running 22.04 Server Full without memory/memory_stress_ng on the same device as in this comment, using the same version of checkbox remote (2.1), I can complete the run. The submission is available here.

During the run, the cgroup shown in `systemctl status snap.checkbox.service.service` was the same:

     CGroup: /system.slice/snap.checkbox.service.service
             └─226680 python3 /snap/checkbox22/current/bin/checkbox-cli service

So it probably has to do with:

It might be worth trying the solution introduced by the QA team in #297 in the server test plan to see whether that improves the situation. @bladernr if you have time to do that today, please go ahead; otherwise I'll try it on Monday.

Edit: I've launched the 22.04 Desktop Stress Test Plan, which includes the fix made in #297, on the same device. We'll see in a few hours what happens...

pieqq commented 1 year ago

Argh, of course the session hit #22 because there are some poweroff and reboot stress tests, so I lost the session.

Restarting the same without any poweroff/reboot stress tests.

pieqq commented 1 year ago

Unfortunately, I also hit #22 when running memory_stress_ng, while the bigheap stressor was in use:

Session dropping after running the bigheap stressor in memory_stress_ng job ``` ----------------------------[ Running job 37 / 42 ]----------------------------- ------------------------[ Stress test of system memory ]------------------------ ID: com.canonical.certification::memory/memory_stress_ng Category: Memory tests -------------------------------------------------------------------------------- stress-ng: info: [6125] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [6125] dispatching hogs: 4 bsearch stress-ng: info: [6125] successful run completed in 300.02s (5 mins, 0.02 secs) stress-ng: info: [6134] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [6134] dispatching hogs: 4 context stress-ng: info: [6134] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [6142] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [6142] dispatching hogs: 4 hsearch stress-ng: info: [6142] successful run completed in 300.01s (5 mins, 0.01 secs) stress-ng: info: [6150] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [6150] dispatching hogs: 4 lsearch stress-ng: info: [6150] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [6159] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [6159] dispatching hogs: 4 matrix stress-ng: info: [6159] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [6168] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [6168] dispatching hogs: 4 memcpy stress-ng: info: [6168] successful run completed in 300.02s (5 mins, 0.02 secs) stress-ng: info: [6180] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [6180] dispatching hogs: 4 null stress-ng: info: [6180] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [6188] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [6188] dispatching hogs: 4 pipe stress-ng: info: [6188] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [6201] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [6201] dispatching hogs: 4 qsort stress-ng: info: [6201] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [6215] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [6215] dispatching hogs: 4 str stress-ng: info: [6215] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [6223] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [6223] dispatching hogs: 4 stream stress-ng: info: [6225] stream: stressor loosely based on a variant of the STREAM benchmark code stress-ng: info: [6225] stream: do NOT submit any of these results to the STREAM benchmark results stress-ng: info: [6225] stream: Using CPU cache size of 4096K stress-ng: info: [6227] stream: memory rate: 3346.30 MB read/sec, 2230.87 MB write/sec, 292.40 Mflop/sec (instance 2) stress-ng: info: [6226] stream: memory rate: 3828.40 MB read/sec, 2552.27 MB write/sec, 334.53 Mflop/sec (instance 1) stress-ng: info: [6228] stream: memory rate: 2911.28 MB read/sec, 1940.86 MB write/sec, 254.39 Mflop/sec (instance 3) stress-ng: info: [6225] stream: memory rate: 4361.00 MB read/sec, 2907.33 MB write/sec, 381.07 Mflop/sec (instance 0) stress-ng: info: [6223] successful run completed in 300.01s (5 mins, 0.01 secs) stress-ng: info: [6233] setting to a 300 second 
(5 mins, 0.00 secs) run per stressor stress-ng: info: [6233] dispatching hogs: 4 tsearch stress-ng: info: [6233] successful run completed in 300.08s (5 mins, 0.08 secs) stress-ng: info: [6240] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [6240] dispatching hogs: 4 vm-rw stress-ng: info: [6240] successful run completed in 300.01s (5 mins, 0.01 secs) stress-ng: info: [6268] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [6268] dispatching hogs: 4 wcs stress-ng: info: [6268] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [6281] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [6281] dispatching hogs: 4 zero stress-ng: info: [6281] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [6392] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [6392] dispatching hogs: 4 mlock stress-ng: info: [6392] successful run completed in 300.61s (5 mins, 0.61 secs) stress-ng: info: [6407] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [6407] dispatching hogs: 4 mmapfork stress-ng: info: [6407] successful run completed in 302.16s (5 mins, 2.16 secs) stress-ng: info: [20624] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [20624] dispatching hogs: 4 mmapmany stress-ng: info: [20624] successful run completed in 300.71s (5 mins, 0.71 secs) stress-ng: info: [20640] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [20640] dispatching hogs: 4 mremap stress-ng: warn: [20640] metrics-check: all bogo-op counters are zero, data may be incorrect stress-ng: info: [20640] successful run completed in 300.02s (5 mins, 0.02 secs) stress-ng: info: [24304] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [24304] dispatching hogs: 4 shm-sysv stress-ng: info: [24304] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [138673] setting to a 300 second (5 mins, 0.00 secs) run per stressor stress-ng: info: [138673] dispatching hogs: 4 vm-splice stress-ng: info: [138673] successful run completed in 300.00s (5 mins, 0.00 secs) stress-ng: info: [138685] setting to a 453 second (7 mins, 33.00 secs) run per stressor stress-ng: info: [138685] dispatching hogs: 4 malloc stress-ng: info: [138685] successful run completed in 453.06s (7 mins, 33.06 secs) stress-ng: info: [138699] setting to a 453 second (7 mins, 33.00 secs) run per stressor stress-ng: info: [138699] dispatching hogs: 4 mincore stress-ng: info: [138699] successful run completed in 453.00s (7 mins, 33.00 secs) stress-ng: info: [138710] setting to a 453 second (7 mins, 33.00 secs) run per stressor stress-ng: info: [138710] dispatching hogs: 4 vm stress-ng: info: [138710] successful run completed in 453.01s (7 mins, 33.01 secs) stress-ng: info: [138724] setting to a 453 second (7 mins, 33.00 secs) run per stressor stress-ng: info: [138724] dispatching hogs: 4 mmap stress-ng: info: [138724] successful run completed in 453.12s (7 mins, 33.12 secs) stress-ng: info: [138738] setting to a 453 second (7 mins, 33.00 secs) run per stressor stress-ng: info: [138738] dispatching hogs: 8 stack stress-ng: info: [138738] successful run completed in 453.03s (7 mins, 33.03 secs) stress-ng: info: [138759] setting to a 453 second (7 mins, 33.00 secs) run per stressor stress-ng: info: [138759] dispatching hogs: 8 bigheap Connection lost! [Errno 104] Connection reset by peer Reconnecting... 
Reconnecting... ```
bladernr commented 1 year ago

Just FYI, in case it was missed, this is the launcher I'm using:

https://github.com/canonical/checkbox/files/10902924/timing-test-launcher.txt

Also, I think it's stress-ng in general; I'm just using memory-stress as an arbitrary choice (it was the shortest-running one on systems with a smaller footprint, since its runtime depends on the amount of RAM installed rather than on the number of CPU cores). FWIW, here's one of the small systems (a small, older server):

    ubuntu@mayapple:~$ ps -p 2520196 -o pid,etime,cmd
        PID     ELAPSED CMD
    2520196 13-03:57:01 /usr/bin/time -f %E checkbox-cli remote 127.0.0.1 ./timing-test-launcher

GAH... It is definitely because of the OOM killer. I finally noticed the pattern in the syslog entries:

    Mar 19 00:24:16 mayapple systemd[1]: checkbox-ng.service: Consumed 13h 21min 11.802s CPU time.
    Mar 19 00:24:16 mayapple systemd[1]: system.slice: A process of this unit has been killed by the OOM killer.
    Mar 19 00:24:17 mayapple systemd[1]: checkbox-ng.service: Scheduled restart job, restart counter is at 218.
    Mar 19 00:24:17 mayapple systemd[1]: Stopped Checkbox Remote Service.
    Mar 19 00:24:17 mayapple systemd[1]: checkbox-ng.service: Consumed 13h 21min 11.802s CPU time.
    Mar 19 00:24:18 mayapple systemd[1]: Started Checkbox Remote Service.
    Mar 19 00:24:24 mayapple checkbox-ng.service[2622755]: normal_user not supplied via config(s).
    Mar 19 00:24:24 mayapple checkbox-ng.service[2622755]: Using ubuntu user
    Mar 19 00:24:32 mayapple stress-ng: invoked with 'stress-ng --aggressive --verify --timeout 300 --bsearch 0' by user 0 'root'
    Mar 19 00:24:32 mayapple stress-ng: system: 'mayapple' Linux 5.15.0-67-generic #74-Ubuntu SMP Wed Feb 22 14:14:39 UTC 2023 x86_64
    Mar 19 00:24:32 mayapple stress-ng: memory (MB): total 5930.07, free 5301.74, shared 3.06, buffer 10.57, swap 4096.00, free swap 3945.07

Then just an hour or so later:

    Mar 19 01:40:00 mayapple systemd[1]: checkbox-ng.service: A process of this unit has been killed by the OOM killer.
    Mar 19 01:40:00 mayapple systemd[1]: checkbox-ng.service: Killing process 2622755 (checkbox-cli) with signal SIGKILL.
    Mar 19 01:40:00 mayapple systemd[1]: checkbox-ng.service: Failed with result 'oom-kill'.
    Mar 19 01:40:00 mayapple systemd[1]: checkbox-ng.service: Consumed 13h 21min 19.980s CPU time.
    Mar 19 01:40:00 mayapple systemd[1]: system.slice: A process of this unit has been killed by the OOM killer.
    Mar 19 01:40:02 mayapple systemd[1]: checkbox-ng.service: Scheduled restart job, restart counter is at 219.
    Mar 19 01:40:02 mayapple systemd[1]: Stopped Checkbox Remote Service.
    Mar 19 01:40:02 mayapple systemd[1]: checkbox-ng.service: Consumed 13h 21min 19.980s CPU time.
    Mar 19 01:40:03 mayapple systemd[1]: Started Checkbox Remote Service.
    Mar 19 01:40:10 mayapple checkbox-ng.service[2623293]: normal_user not supplied via config(s).
    Mar 19 01:40:10 mayapple checkbox-ng.service[2623293]: Using ubuntu user
    Mar 19 01:40:18 mayapple stress-ng: invoked with 'stress-ng --aggressive --verify --timeout 300 --bsearch 0' by user 0 'root'
    Mar 19 01:40:18 mayapple stress-ng: system: 'mayapple' Linux 5.15.0-67-generic #74-Ubuntu SMP Wed Feb 22 14:14:39 UTC 2023 x86_64
    Mar 19 01:40:18 mayapple stress-ng: memory (MB): total 5930.07, free 5287.66, shared 3.03, buffer 10.55, swap 4096.00, free swap 3946.59

and on and on... SO... the OOM killer is killing Checkbox, which restarts and starts running the stress test again from scratch since it's not stateful. THAT is why this is going on and on: it's stuck in an unending loop.
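
For anyone trying to reproduce this, here is a minimal sketch of how to spot the same pattern, assuming the deb-installed `checkbox-ng.service` unit name shown in the logs above (for the snap the unit would be `snap.checkbox.service.service` instead):

```
# Count OOM kills and scheduled restarts for the Checkbox agent
journalctl -u checkbox-ng.service --no-pager | grep -icE 'oom|scheduled restart'

# Or watch the restart counter climb while the stress job is "running"
journalctl -u checkbox-ng.service -f | grep -iE 'oom|restart counter'
```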

So ideas:

I think this may also deserve filing bugs against systemd-oomd, perhaps...

The first one is probably the path of least resistance...
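
As a purely illustrative sketch of what that path could look like on a deb install (the drop-in path and values are assumptions on my part, not a tested fix, and the snap uses a different unit name):

```
# Hypothetical drop-in: /etc/systemd/system/checkbox-ng.service.d/10-oom.conf
[Service]
# Keep the unit running when the kernel OOM killer reaps one of its children
# (e.g. a stress-ng hog) instead of stopping and restarting the whole agent.
OOMPolicy=continue
# Make the Checkbox agent itself a less attractive OOM victim
# (children inherit this value, so use with care during memory stress).
OOMScoreAdjust=-500
# Explicitly opt this unit out of systemd-oomd killing on images that run it
# (22.04 desktop); 'auto' is already the default for system services, shown
# here only to make the intent explicit.
ManagedOOMMemoryPressure=auto
ManagedOOMSwap=auto
```

Applying something like this would need a `systemctl daemon-reload` plus a service restart, and whether it actually breaks the restart loop would still have to be verified on one of the affected machines.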

bladernr commented 1 year ago

Also, I hate that it has taken me this long to finally see that pattern... my eyes have been going blurry every time I look at the logs and output from these runs (that and they take so very long that I get lost in context switching...).

pieqq commented 1 year ago

> Figure out how to fix the systemd-oomd crap on server

@bladernr just to confirm, the logs in your comment come from a 20.04 server running checkbox remote?

The fix made by the QA team a few weeks ago would not cover 20.04, as you mentioned earlier, but it might be worth trying it with 22.04.

Come to think of it, the stress tests should have a soft dependency on stress/store_and_change_oomd_config: their job definitions should use `after: stress/store_and_change_oomd_config` so that job runs first, and then proceed regardless of its outcome (see the sketch below). This way the jobs would still work on 20.04.
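
A minimal sketch of what that could look like in a job definition; the command line and most fields here are illustrative, only the `after:` line is the point (unlike `depends:`, `after:` only orders execution and does not require the referenced job to pass):

```
id: memory/memory_stress_ng
category_id: com.canonical.plainbox::memory
plugin: shell
user: root
# Soft dependency: run the oomd-config job first, but proceed regardless of
# whether it passed, so the job still works on 20.04 where the oomd config
# files don't exist.
after: stress/store_and_change_oomd_config
command: stress_ng_test.py memory  # hypothetical invocation
```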

pieqq commented 1 year ago

So, to summarize:

zongminl commented 1 year ago

I'm not sure if the stress/store_and_change_oomd_config job would work on a server image; from what I can recall, /usr/lib/systemd/system/user@.service.d/10-oomd-user-service-defaults.conf does not exist in the Jammy server image.

One experiment you could try is to run the stress-ng-automated test plan and share the result. The test plan was changed recently to break the stress-ng jobs down from class-based jobs into per-stressor jobs, and with that change we've already seen an improvement in preventing Checkbox from being killed by the kernel oom-killer on low-profile IoT devices (the kernel oom-killer, not the systemd-oomd from the desktop image, since we run on server and core images). A hedged launcher sketch for that experiment is included below.
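
If it helps, a sketch of a launcher for that experiment, modeled on the remote invocation already used in this thread; the test plan id is an assumption and may differ depending on the provider version installed:

```
# stress-ng-launcher (hypothetical file name)
[launcher]
launcher_version = 1
stock_reports = text

[test plan]
unit = com.canonical.certification::stress-ng-automated
forced = yes

[test selection]
forced = yes
```

This could then be run the same way as the existing timing launcher, e.g. `checkbox-cli remote <DUT-ip> ./stress-ng-launcher`.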

bladernr commented 1 year ago

FYI, on server we already run stressor-based jobs, and that has been the case since the beginning. The script that runs the stress-ng jobs for memory, CPU, and storage runs the stressors either individually/sequentially (memory and storage) or in parallel (CPU stress). We don't really call the "class" stressor groups anyway.

I did find time to try the code in those jobs that modify systemd-oomd, and I can confirm that the necessary files are not present on servers, on both 20.04 and 22.04. I'm currently running the following on 4 different servers:

- Prunus: trying 20.04 + HWE to verify this happens there too
- mayapple: 22.04 GA, modifying vm.oom_kill_allocating_task to see if changing it to non-zero helps here (a sketch follows after this list)
- makrutlime: 20.04 GA, because I need to see what the config differences are between 5.4 and 5.15 (maybe the vm.oom* options are different?)
- gurley: 22.04, going to try a suggestion from the Server team to see if installing systemd-oomd on server and then running those oomd modification jobs has any effect
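
For reference, a minimal sketch of the mayapple experiment; the exact value and whether to persist it are assumptions:

```
# Ask the kernel to kill the task that triggered the OOM (usually the
# stress-ng hog) instead of scanning for the "best" victim:
sudo sysctl vm.oom_kill_allocating_task=1

# Persist across reboots (assumed drop-in path):
echo 'vm.oom_kill_allocating_task = 1' | sudo tee /etc/sysctl.d/99-oom-allocating-task.conf
```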

syncronize-issues-to-jira[bot] commented 10 months ago

Thank you for reporting us your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/CHECKBOX-1112.

This message was autogenerated