Open rolandschulz opened 7 years ago
@rolandschulz: thank you for using Archer on your build-and-test system.
Could you describe more specifically how those unit tests failed? A common cause of something like this in our environment is that Archer/TSan detects an error (directly in the unit test code or in some other tester component) and makes the process exit with return code 66. See https://github.com/google/sanitizers/wiki/ThreadSanitizerFlags
Could you try setting the environment variable TSAN_OPTIONS="exitcode=0" before you run these unit tests and see whether it makes any difference?
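As a rough sketch of that check, a small wrapper can run any test binary with the TSan exit code forced to 0 and then print the status the process actually returned. The helper name `run_without_tsan_exitcode` is made up for illustration, and the wrapper replaces any `TSAN_OPTIONS` value already set in the environment.

```sh
#!/bin/sh
# Hypothetical helper: run a command with TSAN_OPTIONS="exitcode=0" so that a
# TSan report alone no longer turns into the default failure exit code (66).
# Note: this overwrites any TSAN_OPTIONS already present in the environment.
run_without_tsan_exitcode() {
    env TSAN_OPTIONS="exitcode=0" "$@"
    status=$?
    # Report what the command itself returned.
    echo "exit status: $status"
    return "$status"
}
```

It would be called as `run_without_tsan_exitcode <test binary> [args]`. If the test still fails with a nonzero status, the failure is not just TSan's exit code.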
All 3 failures are incorrect results, not TSan errors. All 3 also still occur with OMP_NUM_THREADS=1. Thus it seems that the LLVM pass added by Archer causes incorrect results. When all unit tests are compiled and run with clang 4.0 with TSan (without Archer), they all give the correct result (some tests produce false-positive TSan warnings, but those are different tests from the ones that fail with Archer). For example, running the first failing test by itself gives:
./bin/ewald-test --gtest_filter=SaneInput1/PmeBSplineModuliCorrectnessTest.ReproducesValues/0
Note: Google Test filter = SaneInput1/PmeBSplineModuliCorrectnessTest.ReproducesValues/0
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from SaneInput1/PmeBSplineModuliCorrectnessTest
[ RUN ] SaneInput1/PmeBSplineModuliCorrectnessTest.ReproducesValues/0
../src/testutils/refdata.cpp:900: Failure
In item: /X/Length
Actual: '-1782689792'
Reference: '64'
Google Test trace:
../src/gromacs/ewald/tests/pmebsplinetest.cpp:95: Testing B-spline moduli creation (plain) for PME order 3, grid size 64 32 64
../src/testutils/refdata.cpp:900: Failure
In item: /Y/Length
Actual: '-1782689792'
Reference: '32'
Google Test trace:
../src/gromacs/ewald/tests/pmebsplinetest.cpp:95: Testing B-spline moduli creation (plain) for PME order 3, grid size 64 32 64
../src/testutils/refdata.cpp:900: Failure
In item: /Z/Length
Actual: '-1782689792'
Reference: '64'
Google Test trace:
../src/gromacs/ewald/tests/pmebsplinetest.cpp:95: Testing B-spline moduli creation (plain) for PME order 3, grid size 64 32 64
[ FAILED ] SaneInput1/PmeBSplineModuliCorrectnessTest.ReproducesValues/0, where GetParam() = (12-byte object <40-00 00-00 20-00 00-00 40-00 00-00>, 3, 4-byte object <00-00 00-00>) (52 ms)
[----------] 1 test from SaneInput1/PmeBSplineModuliCorrectnessTest (55 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (56 ms total)
[ PASSED ] 0 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] SaneInput1/PmeBSplineModuliCorrectnessTest.ReproducesValues/0, where GetParam() = (12-byte object <40-00 00-00 20-00 00-00 40-00 00-00>, 3, 4-byte object <00-00 00-00>)
1 FAILED TEST
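For anyone reading the `GetParam()` dump in the output above: assuming the 12-byte object is an array of three 32-bit integers on a little-endian host (an assumption, although it matches the "grid size 64 32 64" in the trace), the bytes can be decoded with a short sketch like the one below. The function name `decode_le32_words` is made up for illustration.

```sh
#!/bin/sh
# Decode a gtest "N-byte object <..>" byte dump as unsigned 32-bit words.
# Assumption: the object holds 32-bit integers stored little-endian.
decode_le32_words() {
    # Turn "40-00 00-00 ..." into one hex byte per positional argument.
    set -- $(printf '%s' "$1" | tr '-' ' ')
    out=""
    while [ "$#" -ge 4 ]; do
        # Reverse the four bytes so the most significant byte comes first.
        word=$(printf '%d' "0x$4$3$2$1")
        out="${out:+$out }$word"
        shift 4
    done
    printf '%s\n' "$out"
}

decode_le32_words "40-00 00-00 20-00 00-00 40-00 00-00"   # prints: 64 32 64
```

So the test parameters carry the expected grid size, and only the lengths read back in the test (-1782689792) are wrong.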
Thanks. This seems like something that @simoatze should look into. @rolandschulz: how should we reproduce these failures?
I provided the git, cmake, and ninja commands in my first message. Those should let you reproduce the error. If you have difficulty reproducing it, I'm happy to help.
@rolandschulz thanks for reporting these issues. I'll look into it as soon as I can and get back to you.
With Archer (dc4e363) built out of source with LLVM 4.0 and OMP-TR4, running the GROMACS unit tests:
Produces:
When compiling and running with clang 4.0 without Archer (with or without OMP-TR4), all tests pass. All unit tests also pass with VS2015, ICC 16 and 17, GCC 4.8 to 7.1, and a few other compilers we check less regularly. Thus it is highly unlikely that the unit test failures are caused by problems in the GROMACS source code.