Open mgorny opened 1 year ago
I am able to reproduce segfaults even with a mere ctest, without -j12:
$ ctest
Test project /my/path/c-blosc2/build
[...]
Start 1736: b2nd_example_serialize
1736/1736 Test #1736: b2nd_example_serialize .................................... Passed 0.00 sec
99% tests passed, 1 tests failed out of 1736
Label Time Summary:
b2nd = 0.50 sec*proc (8 tests)
Total Test time (real) = 53.04 sec
The following tests FAILED:
1703 - test_lz4_bitshuffle_n (SEGFAULT)
Errors while running CTest
Output from these tests are in: /my/path/c-blosc2/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
$
$ ctest --rerun-failed --output-on-failure
Test project /my/path/c-blosc2/build
Start 1703: test_lz4_bitshuffle_n
1/1 Test #1703: test_lz4_bitshuffle_n ............ Passed 0.41 sec
100% tests passed, 0 tests failed out of 1
Total Test time (real) = 0.45 sec
$
As you can see, in my case, errors seem to differ between ctest runs. Do tests fail consistently for you, or “randomly” as in my case?
Today we fixed something that may have caused this: https://github.com/Blosc/c-blosc2/commit/ca9d7c6f42e9c95d78b896ebd875bcf54b2affce
Could you give it another go?
I can still reproduce.
Sorry, I was not explicit enough; I meant without parallelism (just ctest). For ctest -j12 this should require more work (although it is not a high priority).
I do not see segfaults without -j12 any more, but in that case segfaults were sporadic.
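Since the serial segfaults were sporadic, one way to check for them is to rerun a single test many times and stop at the first failure. The following is only a sketch: the helper name repeat_until_fail is made up for illustration and is not part of c-blosc2 or CTest.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to MAX times and stop at the
# first failing run, reporting which iteration failed.
repeat_until_fail() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        if ! "$@"; then
            echo "failed on run $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure in $max runs"
}
```

For example, repeat_until_fail 50 ctest -R test_lz4_bitshuffle_n --output-on-failure would rerun only that test. Recent CMake releases (3.17 and later) also have a built-in equivalent, ctest --repeat until-fail:50.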
Still an issue with 2.7.1 and -j$N with N>1.
I'm seeing this too, with c51d050dfa154411d776d84771fd74ca83bd232b and v2.9.1. Most of the time there are test failures, but occasionally segfaults. I haven't captured a core dump yet.
The following tests FAILED:
302 - test_copy (Failed)
311 - test_frame_offset (Failed)
726 - test_schunk_header (Failed)
1722 - test_example_frame_offset (Failed)
The following tests FAILED:
302 - test_copy (Failed)
308 - test_fill_special (Failed)
310 - test_frame_get_offsets (Failed)
311 - test_frame_offset (Failed)
1315 - test_example_frame_simple (Failed)
The following tests FAILED:
11 - test_b2nd_copy (Failed)
302 - test_copy (Failed)
…
The failure rate is 100% (i.e. at least one test fails in every run) on multiple machines.
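Because the set of failing tests changes from run to run, it may help to tally which tests fail most often. A rough sketch, assuming each run's ctest output has been saved to a file; the run-N.log names and both function names are hypothetical:

```shell
#!/bin/sh
# Sketch: tally failed tests across several saved ctest outputs.
# File names like run-1.log are hypothetical, not produced by CTest itself.

# Save the output of RUNS parallel ctest runs (defined here, not invoked).
collect_runs() {
    runs=$1
    njobs=$2
    i=1
    while [ "$i" -le "$runs" ]; do
        # ctest exits non-zero when a test fails; keep looping regardless.
        ctest -j"$njobs" > "run-$i.log" 2>&1 || true
        i=$((i + 1))
    done
}

# Print "count name (reason)" for each entry in the FAILED summaries,
# most frequent first. Assumes test names contain no spaces.
tally_failures() {
    cat "$@" |
        sed -n 's/^[[:space:]]*[0-9][0-9]* - \([^ ]*\) (\(.*\))[[:space:]]*$/\1 (\2)/p' |
        sort | uniq -c | sort -rn
}
```

Running collect_runs 10 12 in the build directory and then tally_failures run-*.log would list the tests that fail most often first, which could show whether the failures cluster around a few tests such as test_copy.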
Tests could be modified to be run in a debugger. To get GDB to automatically print a backtrace in case of a crash:
gdb --batch --ex run --ex bt --args ./myprogram "$@" > gdb-backtrace.txt 2>&1
The above runs GDB in batch mode (--batch) and tells it to run the program (--ex run) and print a backtrace (--ex bt) if it crashes. The output is redirected to a file called gdb-backtrace.txt.
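To apply the same command to more than one test binary, it could be wrapped in a small shell function that writes one backtrace file per program. This is a sketch under assumptions: run_under_gdb, GDB and BT_DIR are illustrative names, not something c-blosc2 or CTest provides.

```shell
#!/bin/sh
# Hypothetical wrapper: run one test binary under GDB and save the output
# (including the backtrace, if it crashes) to a per-program file.
# GDB and BT_DIR are illustrative variables, not part of c-blosc2 or CTest.
run_under_gdb() {
    prog=$1
    shift
    out="${BT_DIR:-.}/$(basename "$prog")-gdb-backtrace.txt"
    # Same GDB flags as the command quoted above.
    "${GDB:-gdb}" --batch --ex run --ex bt --args "$prog" "$@" > "$out" 2>&1
}
```

For instance, run_under_gdb ./myprogram arg1 would leave its output in ./myprogram-gdb-backtrace.txt. Setting BT_DIR to a different directory keeps the backtraces from parallel runs in one place.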
That said:
ctest to run tests in the debugger as suggested above?
Describe the bug
When I'm running the test suite with ctest -j12 (i.e. 12 parallel jobs), I'm getting 2-3 different test failures in a run. Over a few runs, the following tests failed:
Segfaults are especially concerning.
To Reproduce
Expected behavior
Tests should pass when run in parallel.
Logs
LastTest.log from the last run: LastTest.log
System information: