This is the stress-ng upstream project git repository. stress-ng will stress test a computer system in various selectable ways. It was designed to exercise various physical subsystems of a computer as well as the various operating system kernel interfaces.
Hi Colin,
while testing the V0.15.02 update, I noticed that the pci stressor fails on ARM64 / PowerPC with Bionic (4.15.0-202-generic), Focal (5.4.0-138.155), Jammy (5.15.0-59.65) and Kinetic (5.19.0-30.31).
Test log on a Cavium ThunderX2 ARM64 box with B-4.15:
$ sudo ./stress-ng -v -t 5 --pci 4 --pci-ops 3000 --ignite-cpu --syslog --verbose --verify --oomable
stress-ng: debug: [36741] invoked with './stress-ng -v -t 5 --pci 4 --pci-ops 3000 --ignite-cpu --syslog --verbose --verify --oomable' by user 0 'root'
stress-ng: debug: [36741] stress-ng 0.15.02 g91ec6bccd7e9
stress-ng: debug: [36741] system: Linux helo-kernel 4.15.0-202-generic #213-Ubuntu SMP Thu Jan 5 19:22:02 UTC 2023 aarch64
stress-ng: debug: [36741] RAM total: 251.8G, RAM free: 249.2G, swap free: 8.0G
stress-ng: debug: [36741] temporary file path: '.', filesystem type: ext2
stress-ng: debug: [36741] 224 processors online, 224 processors configured
stress-ng: info: [36741] setting to a 5 second run per stressor
stress-ng: info: [36741] dispatching hogs: 4 pci
stress-ng: debug: [36741] cache allocate: shared cache buffer size: 32768K
stress-ng: debug: [36741] starting stressors
stress-ng: debug: [36741] 4 stressors started
stress-ng: debug: [36742] pci: started [36742] (instance 0)
stress-ng: debug: [36744] pci: started [36744] (instance 2)
stress-ng: debug: [36745] pci: started [36745] (instance 3)
stress-ng: debug: [36743] pci: started [36743] (instance 1)
stress-ng: debug: [36741] process [36742] (pci) terminated on signal: 11 (Segmentation fault)
stress-ng: debug: [36741] process [36742] terminated
stress-ng: debug: [36744] pci: exited [36744] (instance 2)
stress-ng: debug: [36743] pci: exited [36743] (instance 1)
stress-ng: debug: [36741] process [36743] terminated
stress-ng: debug: [36741] process [36744] terminated
stress-ng: debug: [36741] process [36745] (pci) terminated on signal: 11 (Segmentation fault)
stress-ng: debug: [36741] process [36745] terminated
stress-ng: fail: [36741] pci instance 0 corrupted bogo-ops counter, 102 vs 0
stress-ng: fail: [36741] pci instance 0 hash error in bogo-ops counter and run flag, 1605818725 vs 0
stress-ng: fail: [36741] pci instance 3 corrupted bogo-ops counter, 20 vs 0
stress-ng: fail: [36741] pci instance 3 hash error in bogo-ops counter and run flag, 2612107102 vs 0
stress-ng: fail: [36741] metrics-check: stressor metrics corrupted, data is compromised
info: 5 failures reached, aborting stress process
stress-ng: info: [36741] unsuccessful run completed in 1.46s
Bisect suggests this commit might be the cause:
e1ebe935ecd438ac0a04a61a8e761598544f341d is the first bad commit
commit e1ebe935ecd438ac0a04a61a8e761598544f341d
Author: Colin Ian King <colin.i.king@gmail.com>
Date: Fri Jan 6 12:49:19 2023 +0000
stress-pci: print PCI config and resource space read rates
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
:100644 100644 2e54c0f5cae583c0eefce8d64c3458e81a833359 630b7c296dabffca06cd41dff1e3f31f58a5bf17 M stress-pci.c
Bisect log:
$ git bisect log
git bisect start
# bad: [4164f6842c712c2d9a13619c3c70fd35d8d02cdb] Debian: update changelog
git bisect bad 4164f6842c712c2d9a13619c3c70fd35d8d02cdb
# good: [91ec6bccd7e94fd04674b29e081e579710b97d71] Debian: update changelog
git bisect good 91ec6bccd7e94fd04674b29e081e579710b97d71
# good: [16c45ed56baf02a85b97f785f5dc81c154ccfb04] stress-mmap: make variable mask non-clobberable
git bisect good 16c45ed56baf02a85b97f785f5dc81c154ccfb04
# good: [e49b6cf6cea5f55f817ca1e41c758263bb1a52a2] stress-jpeg: add compression ration metrics
git bisect good e49b6cf6cea5f55f817ca1e41c758263bb1a52a2
# bad: [697782d4fe07ed8a030a2f965e074e47279b07a1] stress-ng.h: divide by size of make allocator bitmap elements instead of 8
git bisect bad 697782d4fe07ed8a030a2f965e074e47279b07a1
# good: [fb940afa9e682d2a00d6c3bf8d6997e2febbee8a] core-hash: add more comments to explain the nuances of memcpy
git bisect good fb940afa9e682d2a00d6c3bf8d6997e2febbee8a
# bad: [bcf1ec7d7c53373077f495e173af7abeafc7d9c3] stress-list: use builtin shim_ror64
git bisect bad bcf1ec7d7c53373077f495e173af7abeafc7d9c3
# good: [ec8283a4b2d0b5429e7a120564a71a8dc9d9b46a] stress-forkheavy: remove sleep and send SIGALRM to pids on termination
git bisect good ec8283a4b2d0b5429e7a120564a71a8dc9d9b46a
# bad: [45337e787786f9af8dc79eb3488bc8789e082357] core-helper: fix stack size for non-first calls (incorrect cached value)
git bisect bad 45337e787786f9af8dc79eb3488bc8789e082357
# bad: [e1ebe935ecd438ac0a04a61a8e761598544f341d] stress-pci: print PCI config and resource space read rates
git bisect bad e1ebe935ecd438ac0a04a61a8e761598544f341d
# first bad commit: [e1ebe935ecd438ac0a04a61a8e761598544f341d] stress-pci: print PCI config and resource space read rates
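For anyone who wants to repeat the bisect without stepping through it by hand, here is a rough sketch of a helper for `git bisect run`. It is not part of stress-ng: the script name `bisect-pci.sh`, the `RUN_BISECT_STEP` variable and the `classify`/`run_step` functions are made up for illustration. It assumes the tree builds with a plain `make` and that stress-ng exits non-zero on an unsuccessful run like the one above. It only makes sense on a box where the failure is a segfault (such as the ThunderX2), not on nodes that hang.

```shell
#!/bin/sh
# bisect-pci.sh: hypothetical helper for `git bisect run` (not part of stress-ng).
# git bisect run convention: exit 0 = good, 125 = skip commit, 1..124 = bad.

classify() {
    # $1 = build exit status, $2 = stress-ng exit status
    if [ "$1" -ne 0 ]; then
        echo 125    # tree does not build: ask git bisect to skip it
    elif [ "$2" -ne 0 ]; then
        echo 1      # stressor run was unsuccessful: mark commit bad
    else
        echo 0      # clean run: mark commit good
    fi
}

run_step() {
    # Assumption: a plain `make` builds stress-ng in the current tree.
    make clean >/dev/null 2>&1
    make -j"$(nproc)" >/dev/null 2>&1
    build=$?
    run=0
    if [ "$build" -eq 0 ]; then
        # Same stressor options as in the failing test log above.
        sudo ./stress-ng -t 5 --pci 4 --pci-ops 3000 --verify
        run=$?
    fi
    exit "$(classify "$build" "$run")"
}

# Only build and run when explicitly asked, so the file can be sourced safely.
if [ "${RUN_BISECT_STEP:-0}" = 1 ]; then
    run_step
fi
```

Possible usage, with the bad and good commits taken from the bisect log above: `git bisect start 4164f6842c712c2d9a13619c3c70fd35d8d02cdb 91ec6bccd7e94fd04674b29e081e579710b97d71`, then `git bisect run env RUN_BISECT_STEP=1 sh ./bisect-pci.sh`. Keep the script untracked or outside the source tree so checkouts do not touch it.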
On two other Lenovo ThinkSystem HR330a ARM64 nodes and our Power8 node, the ssh session hangs. Here is the dmesg output from the console (sorry for the malformed output, copied and pasted as-is):
After this, the system keeps printing the following whenever you try to run any command over the console, and it requires a hard reset: