Open dev-zero opened 3 years ago
Reproducible with DBCSR itself, configured with:
cmake -DTEST_MPI_RANKS=1 -DTEST_OMP_THREADS=72 ..
$ make test
Running tests...
Test project /data/tiziano/cp2k/exts/dbcsr/build
Start 1: dbcsr_perf:inputs/test_H2O.perf
1/17 Test #1: dbcsr_perf:inputs/test_H2O.perf ....................... Passed 72.55 sec
Start 2: dbcsr_perf:inputs/test_rect1_dense.perf
2/17 Test #2: dbcsr_perf:inputs/test_rect1_dense.perf ............... Passed 2.56 sec
Start 3: dbcsr_perf:inputs/test_rect1_sparse.perf
3/17 Test #3: dbcsr_perf:inputs/test_rect1_sparse.perf .............. Passed 10.91 sec
Start 4: dbcsr_perf:inputs/test_rect2_dense.perf
4/17 Test #4: dbcsr_perf:inputs/test_rect2_dense.perf ............... Passed 2.49 sec
Start 5: dbcsr_perf:inputs/test_rect2_sparse.perf
5/17 Test #5: dbcsr_perf:inputs/test_rect2_sparse.perf .............. Passed 10.36 sec
Start 6: dbcsr_perf:inputs/test_singleblock.perf
6/17 Test #6: dbcsr_perf:inputs/test_singleblock.perf ............... Passed 0.85 sec
Start 7: dbcsr_perf:inputs/test_square_dense.perf
7/17 Test #7: dbcsr_perf:inputs/test_square_dense.perf .............. Passed 1.09 sec
Start 8: dbcsr_perf:inputs/test_square_sparse.perf
8/17 Test #8: dbcsr_perf:inputs/test_square_sparse.perf ............. Passed 3.45 sec
Start 9: dbcsr_perf:inputs/test_square_sparse_bigblocks.perf
9/17 Test #9: dbcsr_perf:inputs/test_square_sparse_bigblocks.perf ... Passed 1.62 sec
Start 10: dbcsr_unittest1
10/17 Test #10: dbcsr_unittest1 ....................................... Passed 1372.54 sec
Start 11: dbcsr_unittest2
11/17 Test #11: dbcsr_unittest2 ....................................... Passed 236.76 sec
Start 12: dbcsr_unittest3
12/17 Test #12: dbcsr_unittest3 ....................................... Passed 308.31 sec
Start 13: dbcsr_unittest4
13/17 Test #13: dbcsr_unittest4 ....................................... Passed 0.89 sec
Start 14: dbcsr_tensor_unittest
14/17 Test #14: dbcsr_tensor_unittest .................................***Failed 4.51 sec
Start 15: dbcsr_tas_unittest
15/17 Test #15: dbcsr_tas_unittest .................................... Passed 3.59 sec
Start 16: dbcsr_test_csr_conversions
16/17 Test #16: dbcsr_test_csr_conversions ............................ Passed 10.47 sec
Start 17: dbcsr_tensor_test
17/17 Test #17: dbcsr_tensor_test ..................................... Passed 0.73 sec
94% tests passed, 1 tests failed out of 17
Total Test time (real) = 2043.68 sec
The following tests FAILED:
14 - dbcsr_tensor_unittest (Failed)
Errors while running CTest
make: *** [Makefile:124: test] Error 8
and Testing/Temporary/LastTest.log shows for the relevant test:
[...]
--------------------------------------------------------------------------------
TAS MATRIX MULTIPLICATION DONE
--------------------------------------------------------------------------------
GLOBAL INFO OF (14|25)
block dimensions: 4 5 11 3
full dimensions: 25 32 83 28
process grid dimensions: 1 1 1 1
DISTRIBUTION OF (14|25)
Number of non-zero blocks: 26
Percentage of non-zero blocks: 3.94
Average number of blocks per CPU: 26
Maximum number of blocks per CPU: 26
Average number of matrix elements per CPU: 64680
Maximum number of matrix elements per CPU: 64680
*******************************************************************************
* ___ *
* / \ *
* [ABORT] *
* \___/ Number of threads has changed! *
* | *
* O/| *
* /| | *
* / \ dbcsr_iterator_operations.F:179 *
*******************************************************************************
===== Routine Calling Stack =====
4 dbcsr_iterator_start
3 dbcsr_filter_anytype
2 dbcsr_t_contract
1 dbcsr_t_total
[...]
It seems that the number of OMP threads gets capped to 64 at some point.
Maybe this is caused by the NUM_THREADS=64 in install_openblas.sh?
Could be, should be easy to verify (ref-lapack, mkl, libsci). If it is indeed OpenBLAS, the question is what we should do. The DBCSR-only test above was with a system-provided OpenBLAS on an openSUSE system.
We can:
Could now reproduce this on a new (Apple silicon) MacBook Air. The only way around was to set OMP_NUM_THREADS=1.
Sorry, this was actually related to not building with FFTW3, see cp2k/cp2k#1315.
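One way to test the OpenBLAS hypothesis from the comments above is to ask the library how it was built. The following is a minimal sketch, not part of DBCSR or CP2K. It assumes the OpenBLAS shared library can be found under the name `openblas` and that the string returned by `openblas_get_config()` carries a `MAX_THREADS=<n>` field, which may not hold for every build. The helper names `parse_max_threads` and `openblas_config` are hypothetical.

```python
import ctypes
import ctypes.util
import re


def parse_max_threads(config):
    """Return the MAX_THREADS value from an OpenBLAS config string, or None.

    Assumption: the build-time thread limit appears as 'MAX_THREADS=<n>'.
    """
    match = re.search(r"MAX_THREADS=(\d+)", config)
    return int(match.group(1)) if match else None


def openblas_config(libname="openblas"):
    """Return the OpenBLAS config string, or None if it cannot be queried.

    Assumption: the library is discoverable as 'openblas' and exports
    openblas_get_config() without a symbol suffix.
    """
    path = ctypes.util.find_library(libname)
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
        get_config = lib.openblas_get_config
    except (OSError, AttributeError):
        return None
    get_config.restype = ctypes.c_char_p
    return get_config().decode(errors="replace")


if __name__ == "__main__":
    config = openblas_config()
    if config is None:
        print("OpenBLAS not found or config not available")
    else:
        print("config:", config)
        print("MAX_THREADS:", parse_max_threads(config))
```

If the reported value is 64 on the affected system, that would be consistent with the cap observed above, though it would not by itself prove that OpenBLAS changes the OpenMP thread count.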
Describe the bug
We get the following error message when running certain tests with a fresh cp2k.sdbg:
To Reproduce
Steps to reproduce the behavior:
make ARCH=local VERSION=sdbg with the arch file from the toolchain
cd tests/QS/regtest-ri-rpa-rse ; ../../../exe/local/cp2k.sdbg Cubic_RPA_RSE_H2.inp
Setting OMP_NUM_THREADS=64 solves the issue.
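To narrow down where the thread count starts to matter, the same command can be run under several values of OMP_NUM_THREADS while watching for the abort message from the log above. This is a minimal sketch under stated assumptions: the helper names `run_with_threads` and `scan` are hypothetical, the thread counts 1, 64, 65 and 72 are only illustrative picks around the reported limit, and the abort is detected purely by searching the combined output for the message text.

```python
import os
import subprocess
import sys

# Message printed by DBCSR in the log quoted in this report.
ABORT_MARKER = "Number of threads has changed!"


def run_with_threads(cmd, nthreads, cwd=None):
    """Run cmd with OMP_NUM_THREADS=nthreads.

    Returns (exit code, True if the abort message appeared in the output).
    """
    env = dict(os.environ, OMP_NUM_THREADS=str(nthreads))
    proc = subprocess.run(
        cmd,
        cwd=cwd,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        errors="replace",
    )
    return proc.returncode, ABORT_MARKER in proc.stdout


def scan(cmd, counts, cwd=None):
    """Map each thread count in counts to the result of run_with_threads."""
    return {n: run_with_threads(cmd, n, cwd) for n in counts}


if __name__ == "__main__" and len(sys.argv) > 1:
    # The command to test is taken from the command line.
    for n, (rc, aborted) in scan(sys.argv[1:], (1, 64, 65, 72)).items():
        print(f"OMP_NUM_THREADS={n}: exit code {rc}, thread-change abort: {aborted}")
```

Saved under a name of your choice and run from tests/QS/regtest-ri-rpa-rse, it could be invoked with `../../../exe/local/cp2k.sdbg Cubic_RPA_RSE_H2.inp` as arguments, which would show whether the abort first appears above 64 threads.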