Closed wpoely86 closed 5 years ago
@wpoely86
Thanks for the detailed info!
Given that CheMPS2::Davidson::DiagonalizeSmallMatrixAndCalcResidual()
this seems rather strange to me.
Given https://www.google.be/search?q=openblas+deadlock+OMP_NUM_THREADS and e.g.
it seems fairly likely that this is an OpenBLAS error...
S.
@hungpham2017
I think this might also concern your issue #69. Can you check:
ldd chemps2
ldd libchemps2.so
whether OpenBLAS is used? I saw in your anaconda list (https://github.com/SebWouters/CheMPS2/issues/69#issuecomment-443736090) that openblas 0.3.3 is mentioned.
Thanks! S.
@SebWouters that's true. I had a similar problem: I was using both OpenBLAS and MKL in anaconda. After I uninstalled OpenBLAS and reinstalled everything, it worked fine with MKL.
ldd libchemps2.so
linux-vdso.so.1 => (0x00007ffcf17e4000)
libmkl_rt.so => /panfs/roc/groups/6/gagliard/phamx494/anaconda/lib/././libmkl_rt.so (0x00007f2018b56000)
libhdf5.so.101 => /panfs/roc/groups/6/gagliard/phamx494/anaconda/lib/././libhdf5.so.101 (0x00007f20185c2000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f201837b000)
libstdc++.so.6 => /panfs/roc/groups/6/gagliard/phamx494/anaconda/lib/././libstdc++.so.6 (0x00007f201823a000)
libiomp5.so => /panfs/roc/groups/6/gagliard/phamx494/anaconda/lib/././libiomp5.so (0x00007f2017e51000)
libgcc_s.so.1 => /panfs/roc/groups/6/gagliard/phamx494/anaconda/lib/././libgcc_s.so.1 (0x00007f2017c3b000)
libc.so.6 => /lib64/libc.so.6 (0x00007f20178a7000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f20176a2000)
librt.so.1 => /lib64/librt.so.1 (0x00007f201749a000)
libz.so.1 => /panfs/roc/groups/6/gagliard/phamx494/anaconda/lib/./././libz.so.1 (0x00007f2017283000)
libm.so.6 => /lib64/libm.so.6 (0x00007f2016ffe000)
/lib64/ld-linux-x86-64.so.2 (0x00005614d9eda000)
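For anyone who wants to automate the check asked for above, here is a minimal sketch that scans `ldd` output for BLAS/LAPACK libraries. The function name `find_blas_libs` and the marker list `BLAS_MARKERS` are made up for illustration and the list is certainly not exhaustive; the sample text is copied from the output above.

```python
import re

# Substrings that identify common BLAS/LAPACK providers in a library name.
# Illustrative assumption only; extend it for your own setup.
BLAS_MARKERS = ("openblas", "mkl", "atlas", "blis", "libblas", "liblapack")


def find_blas_libs(ldd_output):
    """Return (soname, path) pairs from ldd output that look like BLAS/LAPACK libraries."""
    hits = []
    for line in ldd_output.splitlines():
        # Typical line: "libfoo.so.1 => /path/to/libfoo.so.1 (0x...)"
        match = re.match(r"\s*(\S+)\s+=>\s+(\S+)", line)
        if not match:
            continue  # e.g. the dynamic loader line, which has no "=>"
        soname, path = match.groups()
        if any(marker in soname.lower() for marker in BLAS_MARKERS):
            hits.append((soname, path))
    return hits


# A few lines taken from the ldd output posted in this thread.
sample = """\
linux-vdso.so.1 => (0x00007ffcf17e4000)
libmkl_rt.so => /panfs/roc/groups/6/gagliard/phamx494/anaconda/lib/././libmkl_rt.so (0x00007f2018b56000)
libiomp5.so => /panfs/roc/groups/6/gagliard/phamx494/anaconda/lib/././libiomp5.so (0x00007f2017e51000)
/lib64/ld-linux-x86-64.so.2 (0x00005614d9eda000)
"""

for soname, path in find_blas_libs(sample):
    print(soname, "->", path)
```

On a real system the text would come from running `ldd libchemps2.so` yourself. If both an OpenBLAS and an MKL entry show up, that would match the mixed setup described above.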
The tests don't run when compiling with GCC 7.3 and OpenBLAS 0.3.1. They never get past printing the header.
If you set
export OMP_NUM_THREADS=1
the tests work fine again. Must be a threading issue somewhere (or a bug in OpenBLAS).
Backtrace after letting it run for a couple of minutes:
If I let it run some more and look again, the backtrace is the same.
If I try it with GCC+MKL or Intel+MKL, it runs fine. So it might be a bug in OpenBLAS too...
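To compare thread counts without waiting on a stuck process by hand, something like the following sketch might help. It runs a command with `OMP_NUM_THREADS` set and gives up after a timeout. The helper name `run_with_omp_threads` is hypothetical, and the child command here is only a stand-in that echoes the variable; you would substitute your own test binary.

```python
import os
import subprocess
import sys


def run_with_omp_threads(cmd, num_threads, timeout_s):
    """Run cmd with OMP_NUM_THREADS set; return its exit code, or None on timeout."""
    env = dict(os.environ, OMP_NUM_THREADS=str(num_threads))
    try:
        result = subprocess.run(cmd, env=env, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child on timeout; treat this as a suspected hang.
        return None
    return result.returncode


# Stand-in child process (assumption): it only prints the variable it sees.
child = [sys.executable, "-c", "import os; print(os.environ['OMP_NUM_THREADS'])"]
print(run_with_omp_threads(child, num_threads=1, timeout_s=60))
```

A return value of `None` only means the command did not finish within the timeout, so it cannot tell a deadlock apart from a slow run. Pick the timeout well above the normal runtime of the test.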