Closed by mreed8855 1 month ago
Initially, the mlock, mremap, shm-sysv, vm-splice, numa, and malloc stressors failed:
03 Sep 07:34: Running stress-ng mlock stressor for 300 seconds... ** stress-ng timed out and was forcefully terminated
03 Sep 11:50: Running stress-ng mremap stressor for 300 seconds... ** stress-ng timed out and was forcefully terminated
03 Sep 12:00: Running stress-ng shm-sysv stressor for 300 seconds... ** stress-ng timed out and was forcefully terminated
03 Sep 12:10: Running stress-ng vm-splice stressor for 300 seconds... ** stress-ng timed out and was forcefully terminated
03 Sep 12:20: Running stress-ng numa stressor for 300 seconds... ** stress-ng timed out and was forcefully terminated
03 Sep 12:30: Running stress-ng malloc stressor for 9115 seconds... ** stress-ng timed out and was forcefully terminated
However, after doubling, tripling, and quadrupling the 300-second timeout, malloc is the only stressor that still has an issue.
After increasing the timeout:
02 Sep 12:30: Running stress-ng malloc stressor for 9115 seconds...
** stress-ng exited with code 3
stress-ng: info: [964793] setting to a 2 hours, 31 mins, 54 secs run per stressor
stress-ng: info: [964793] dispatching hogs: 512 malloc
stress-ng: info: [965806] malloc: failed to create counter lock. skipping stressor
stress-ng: info: [965809] malloc: failed to create counter lock. skipping stressor
stress-ng: info: [965811] malloc: failed to create counter lock. skipping stressor
stress-ng: info: [965810] malloc: failed to create counter lock. skipping stressor
stress-ng: info: [965812] malloc: failed to create counter lock. skipping stressor
stress-ng: warn: [964793] malloc: [965809] aborted early, out of system resources
stress-ng: warn: [964793] malloc: [965810] aborted early, out of system resources
stress-ng: warn: [964793] malloc: [965811] aborted early, out of system resources
stress-ng: warn: [964793] malloc: [965812] aborted early, out of system resources
stress-ng: info: [964793] skipped: 4: malloc (4)
stress-ng: info: [964793] passed: 507: malloc (507)
stress-ng: info: [964793] failed: 0
stress-ng: info: [964793] metrics untrustworthy: 0
stress-ng: info: [964793] successful run completed in 2 hours, 31 mins, 54.52 secs
stress_ng_test.txt Initial Stress-ng memory test run
stress_ng_test-4.txt Stress-ng memory test run with increased base timeout
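For triage, the summary lines in the log above can be tallied automatically. The following is a minimal sketch, not part of stress-ng or checkbox: the function name `summarize` and the `SAMPLE` string (an abridged excerpt of the log above) are hypothetical, and the regular expression only assumes the message wording seen in this thread.

```python
import re

# Matches summary lines such as "skipped: 4: malloc (4)" or "failed: 0".
# It does not match "failed to create counter lock" because no colon
# directly follows "failed" there.
SUMMARY_RE = re.compile(r"\b(skipped|passed|failed|metrics untrustworthy): (\d+)")
LOCK_FAIL = "failed to create counter lock"

# Abridged excerpt of the log in this thread (hypothetical test input).
SAMPLE = """\
stress-ng: info: [965806] malloc: failed to create counter lock. skipping stressor
stress-ng: info: [965809] malloc: failed to create counter lock. skipping stressor
stress-ng: info: [964793] skipped: 4: malloc (4)
stress-ng: info: [964793] passed: 507: malloc (507)
stress-ng: info: [964793] failed: 0
stress-ng: info: [964793] metrics untrustworthy: 0
"""


def summarize(log_text: str) -> dict:
    """Return the summary counts plus the number of counter-lock failures."""
    counts = {name: int(num) for name, num in SUMMARY_RE.findall(log_text)}
    counts["counter_lock_failures"] = log_text.count(LOCK_FAIL)
    return counts


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Because the pattern does not depend on line breaks, the same function should also work on logs where the messages have been joined onto a single line.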
Which version of stress-ng is being used? Use stress-ng -V to show the version.
Here is the package version from the submission file. I am waiting on the output of that command.
stress-ng 0.18.01-0~202407131132~ubuntu22.04.1
I'd recommend using the latest version; I've fixed a few bugs with vm size measuring in the last 6 months. I've got more recent versions in my PPA: ppa:colin-king/stress-ng
See https://launchpad.net/~colin-king/+archive/ubuntu/stress-ng
The "malloc: failed to create counter lock. skipping stressor" message appears because there are many instances of this stressor and each one creates a counter lock. Older versions of stress-ng use a page per lock, and that allocation may fail as new stressor instances are created. The latest version of stress-ng creates a shared page for all the locks, so there is an upper limit of 512 active locks (this is probably too low; I probably need to provide at least 4K available concurrent locks).
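To illustrate why a fixed limit of 512 lock slots can be too tight for 512 stressor instances, here is a toy model of a fixed-capacity lock pool. It is only a sketch under stated assumptions, not stress-ng's implementation: `LockPool`, `failed_instances`, and the `other_locks` parameter (slots already taken by something else before the stressor instances start) are hypothetical, and the value 8 used below is arbitrary.

```python
class LockPool:
    """Toy fixed-capacity pool of lock slots (not stress-ng's actual code)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.in_use = 0

    def acquire(self) -> bool:
        """Claim one slot; return False when the pool is exhausted."""
        if self.in_use >= self.capacity:
            return False
        self.in_use += 1
        return True


def failed_instances(instances: int, capacity: int, other_locks: int = 0) -> int:
    """Count stressor instances that cannot obtain a counter lock.

    other_locks is a hypothetical number of slots consumed by other
    users of the pool before the stressor instances are created.
    """
    pool = LockPool(capacity)
    for _ in range(other_locks):
        pool.acquire()
    return sum(0 if pool.acquire() else 1 for _ in range(instances))


if __name__ == "__main__":
    # Pool sized exactly to the instance count: any extra lock user
    # pushes some instances over the limit.
    print(failed_instances(512, 512, other_locks=8))
    # Pool sized to twice the instance count: the same load fits.
    print(failed_instances(512, 2 * 512, other_locks=8))
```

In this model, a pool sized exactly to the number of instances leaves no headroom for any other lock user, whereas doubling the capacity absorbs the extra demand.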
I've pushed a fix to bump the number of concurrent locks to 2 x max number of stressor instances:
commit 95062984882b5fcec84e541e686222da9b6a20a6 (HEAD -> master, origin/master, origin/HEAD)
Author: Colin Ian King <colin.i.king@gmail.com>
Date: Thu Oct 3 18:50:02 2024 +0100
core-lock: increase number of concurrent locks to 2 * STRESS_PROCS_MAX
Thanks for the feedback, I will have them try the latest version.
Did using a newer version resolve the issue?
With the new version this issue is still being seen.
stress-ng: Installed: 0.18.05-1~j0
25 Oct 05:45: Running stress-ng malloc stressor for 7222 seconds...
** stress-ng exited with code 3
stress-ng: info: [2351326] setting to a 2 hours, 22 secs run per stressor
stress-ng: info: [2351326] dispatching hogs: 512 malloc
stress-ng: info: [2352338] malloc: failed to create counter lock. skipping stressor
stress-ng: info: [2352340] malloc: failed to create counter lock. skipping stressor
stress-ng: info: [2352342] malloc: failed to create counter lock. skipping stressor
stress-ng: info: [2352344] malloc: failed to create counter lock. skipping stressor
stress-ng: info: [2352345] malloc: failed to create counter lock. skipping stressor
stress-ng: warn: [2351326] malloc: [2352340] aborted early, out of system resources
stress-ng: warn: [2351326] malloc: [2352342] aborted early, out of system resources
stress-ng: warn: [2351326] malloc: [2352344] aborted early, out of system resources
stress-ng: warn: [2351326] malloc: [2352345] aborted early, out of system resources
stress-ng: info: [2351326] skipped: 4: malloc (4)
stress-ng: info: [2351326] passed: 507: malloc (507)
stress-ng: info: [2351326] failed: 0
stress-ng: info: [2351326] metrics untrustworthy: 0
stress-ng: info: [2351326] successful run completed in 2 hours, 22.25 secs
I'll be releasing V0.18.06 next week. It will incorporate the following fix, which will fully address this issue:
commit 95062984882b5fcec84e541e686222da9b6a20a6
Author: Colin Ian King <colin.i.king@gmail.com>
Date: Thu Oct 3 18:50:02 2024 +0100
core-lock: increase number of concurrent locks to 2 * STRESS_PROCS_MAX
This has been fixed in stress-ng release V0.18.06
I have a system with a large amount of memory that is failing the stress-ng memory test. It passed when the amount of memory for the test was reduced, but we typically like to stress the system with the maximum amount available. Once the memory was added back in, the same failures occurred. We tried increasing the base timeout, and all of the stressors passed except malloc. I am unsure whether this is an actual bug where the system resources cannot keep up or whether we are being too aggressive with the test case.
CPU: AMD EPYC 9754 128-Core Processor (Bergamo)
Mem: 881 GB
22.04.4, 5.15 kernel
Steps to Reproduce
sudo add-apt-repository ppa:checkbox-dev/stable
sudo apt install canonical-certification-server
sudo /usr/lib/checkbox-provider-base/bin/stress_ng_test.py memory