Closed by hosiet 1 month ago
Hi @hosiet - that test is a pretty large scale one. I wonder if it is running the builder out of memory, or just running slowly and exceeding some timeouts. It also tries to raise the soft max open file limit to 2048, which could be unsuccessful depending on the hard limit in the test environment.
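One way to rule the file limit in or out on a given builder is to compare the hard limit against the 2048 the test asks for. The following is only a sketch: `can_raise_nofile` is a hypothetical helper, not part of the powerman test suite, and it assumes a shell whose `ulimit` accepts `-H`, `-S` and `-n` (bash and dash do).

```shell
#!/bin/sh
# can_raise_nofile WANT HARD
# Succeeds if a soft open-file limit of WANT fits under the hard limit HARD.
# Hypothetical helper for illustration only.
can_raise_nofile() {
    want=$1
    hard=$2
    # An unlimited hard limit never blocks the request.
    if [ "$hard" = "unlimited" ] || [ "$hard" -ge "$want" ]; then
        return 0
    fi
    return 1
}

echo "soft limit: $(ulimit -Sn), hard limit: $(ulimit -Hn)"
if can_raise_nofile 2048 "$(ulimit -Hn)"; then
    echo "ulimit -n 2048 should be allowed here"
else
    echo "ulimit -n 2048 would fail: hard limit is too low"
fi
```

If the second message is printed, the `ulimit -n 2048` call in the test cannot succeed in that environment.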
Our test suite probably needs some work to capture more detail on failure. What I would do manually is run the test directly with a verbose option, e.g.
$ cd t
$ ./t0039-llnl-el-capitan-cluster.t -v
But anyway, we could disable that test by default and selectively enable it in CI. Could you try with this patch?
diff --git a/t/t0039-llnl-el-capitan-cluster.t b/t/t0039-llnl-el-capitan-cluster.t
index cf0d780..c572547 100755
--- a/t/t0039-llnl-el-capitan-cluster.t
+++ b/t/t0039-llnl-el-capitan-cluster.t
@@ -4,6 +4,12 @@ test_description='Check LLNL El Capitan config'
. `dirname $0`/sharness.sh
+test -n "$TEST_LONG" && test_set_prereq LONGTEST
+if ! test_have_prereq LONGTEST; then
+ skip_all='skipping large scale El Capitan test'
+ test_done
+fi
+
ulimit -n 2048
powermand=$SHARNESS_BUILD_DIRECTORY/src/powerman/powermand
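With that patch applied, the test is skipped unless `TEST_LONG` is set to a non-empty value. The gate can be sketched on its own, without sharness, as follows; `long_tests_enabled` is a hypothetical name used only for this illustration, while the real patch goes through `test_set_prereq` and `test_have_prereq`.

```shell
#!/bin/sh
# Stand-alone sketch of the gate used in the patch: the long test is
# enabled only when TEST_LONG is set to a non-empty string.
# long_tests_enabled is a hypothetical helper, not a sharness function.
long_tests_enabled() {
    test -n "${TEST_LONG:-}"
}

if long_tests_enabled; then
    echo "large scale El Capitan test would run"
else
    echo "skipping large scale El Capitan test"
fi
```

In practice this means running the script from the `t` directory as `TEST_LONG=t ./t0039-llnl-el-capitan-cluster.t -v` opts in, and running it without the variable skips it.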
@hosiet - please let us know if the just-merged fix doesn't resolve this.
Thanks for the patch that disables this particular test in the post-build tests. The build is now OK, as shown on https://buildd.debian.org/status/package.php?p=powerman .
We could probably run that problematic test in CI rather than in the post-build tests. Debian has such CI infrastructure, and I can check whether running the test there with the verbose option enabled yields more useful debugging info.
I just tried that test on a raspberry pi 4 with 2GB RAM running raspbian 12 and got an oom kill:
[35622.739829] Out of memory: Killed process 10905 (powermand) total-vm:557444kB, anon-rss:497408kB, file-rss:1792kB, shmem-rss:0kB, UID:5588 pgtables:1116kB oom_score_adj:0
It worked (slowly) when I re-ran it on a pi 4 with 4GB RAM.
My guess is that is the problem and you probably don't need to take it further. The test doesn't cover unique functionality; it is just a scaling test.
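To check whether another host hit the same failure, its kernel log can be searched for a line like the one above. This is a rough sketch: `oom_summary` is a hypothetical filter, and the pattern assumes the kernel message has the same layout as the line quoted here, which may differ between kernel versions.

```shell
#!/bin/sh
# oom_summary: read kernel log lines on stdin and print one short summary
# per OOM kill (process name, pid, anonymous RSS).
# Hypothetical helper; the pattern is modelled on the log line quoted above.
oom_summary() {
    sed -n 's/.*Out of memory: Killed process \([0-9]*\) (\([^)]*\)).*anon-rss:\([0-9]*\)kB.*/\2 pid=\1 anon-rss=\3kB/p'
}

# Example using the line from the Raspberry Pi run.
sample='[35622.739829] Out of memory: Killed process 10905 (powermand) total-vm:557444kB, anon-rss:497408kB, file-rss:1792kB, shmem-rss:0kB, UID:5588 pgtables:1116kB oom_score_adj:0'
printf '%s\n' "$sample" | oom_summary
# prints: powermand pid=10905 anon-rss=497408kB
```

On a live system the input would come from `dmesg | oom_summary`; reading the kernel log may require root on some systems.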
Looking at the following pages:
I am not sure why there are some constant test failures on riscv. Related logs:
Personally, I am not an expert in riscv or powerman, so any suggestions or hints are appreciated.