SebWouters / CheMPS2

CheMPS2: a spin-adapted implementation of DMRG for ab initio quantum chemistry
GNU General Public License v2.0

Tests failing with openblas #70

Closed. wpoely86 closed this issue 5 years ago

wpoely86 commented 5 years ago

The tests hang when CheMPS2 is compiled with GCC 7.3 and OpenBLAS 0.3.1: the run never gets past printing the header.

If you set export OMP_NUM_THREADS=1, they work fine again, so it must be a threading issue somewhere (or a bug in OpenBLAS).

Backtrace after letting it run for a couple of minutes:

#4  0x00002aaaaab71af4 in CheMPS2::Davidson::DiagonalizeSmallMatrixAndCalcResidual() (this=0x7fffffff6760) at /work/wapoelma/CheMPS2/CheMPS2/Davidson.cpp:276
#5  0x00002aaaaab722e5 in CheMPS2::Davidson::FetchInstruction(double**) (this=0x7fffffff6760, pointers=0x45a160)
    at /work/wapoelma/CheMPS2/CheMPS2/Davidson.cpp:141
#6  0x00002aaaaabc9b97 in CheMPS2::Heff::SolveDAVIDSON_main(CheMPS2::Sobject*, CheMPS2::TensorL***, CheMPS2::TensorOperator****, CheMPS2::TensorOperator****, CheMPS2::TensorOperator****, CheMPS2::TensorOperator****, CheMPS2::TensorS0****, CheMPS2::TensorS1****, CheMPS2::TensorF0****, CheMPS2::TensorF1****, CheMPS2::TensorQ***, CheMPS2::TensorX**, int, double**) const (this=0x7fffffff6920, denS=0x561470, Ltensors=0x423590, Atensors=0x423a10, Btensors=0x423a60, 
    Ctensors=0x423ab0, Dtensors=<optimized out>, S0tensors=<optimized out>, S1tensors=<optimized out>, F0tensors=<optimized out>, F1tensors=<optimized out>, 
    Qtensors=<optimized out>, Xtensors=<optimized out>, nLower=<optimized out>, VeffTilde=<optimized out>) at /work/wapoelma/CheMPS2/CheMPS2/Heff.cpp:369
#7  0x00002aaaaab2853c in CheMPS2::DMRG::solve_site(int, double, double, int, bool, bool, bool) (this=this@entry=0x4236b0, index=index@entry=8, 
    dvdson_rtol=dvdson_rtol@entry=1.0000000000000001e-05, noise_level=noise_level@entry=0, virtual_dimension=virtual_dimension@entry=500, 
    am_i_master=am_i_master@entry=true, moving_right=moving_right@entry=false, change=change@entry=false) at /work/wapoelma/CheMPS2/CheMPS2/DMRG.cpp:435
#8  0x00002aaaaab28afd in CheMPS2::DMRG::sweepleft(bool, int, bool) (this=this@entry=0x4236b0, change=change@entry=false, instruction=instruction@entry=0, 
    am_i_master=am_i_master@entry=true) at /work/wapoelma/CheMPS2/CheMPS2/DMRG.cpp:368
#9  0x00002aaaaab28da8 in CheMPS2::DMRG::Solve() (this=this@entry=0x4236b0) at /work/wapoelma/CheMPS2/CheMPS2/DMRG.cpp:293
#10 0x000000000040262a in main () at /work/wapoelma/CheMPS2/build/tests/tests/test1.cpp:98

If I let it run some more and look again, the backtrace is the same.

If I try it with GCC+MKL or Intel+MKL, it runs fine. So it might be a bug in OpenBLAS too...
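The OMP_NUM_THREADS=1 workaround and the "hangs for minutes" symptom described above can be checked side by side with a small wrapper. This is only a sketch: run_with_threads is a hypothetical helper name, and it assumes GNU coreutils timeout is available (it exits with status 124 when the time limit is hit).

```shell
# Run a command with a fixed OpenMP thread count and a wall-clock limit.
# Assumes GNU coreutils `timeout`, which exits with status 124 when the
# limit is reached, so a hang can be told apart from an ordinary failure.
run_with_threads() {
    local threads="$1"   # value exported as OMP_NUM_THREADS for this run only
    local limit="$2"     # time limit in seconds
    shift 2
    env OMP_NUM_THREADS="$threads" timeout "$limit" "$@"
}
```

For example, run_with_threads 1 600 ./test1 followed by run_with_threads 4 600 ./test1 (where ./test1 stands in for whatever test binary the build produces) would be expected to return 124 only for the multi-threaded run if the hang is thread-related. Setting the variable through env keeps it out of the calling shell, unlike a plain export.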

SebWouters commented 5 years ago

@wpoely86

Thanks for the detailed info!

Given that CheMPS2::Davidson::DiagonalizeSmallMatrixAndCalcResidual()

this seems rather strange to me.

Given https://www.google.be/search?q=openblas+deadlock+OMP_NUM_THREADS and e.g.

it seems fairly likely that this is an OpenBLAS error...

S.

SebWouters commented 5 years ago

@hungpham2017

I think this might also concern your issue #69. Can you check whether OpenBLAS is used, by running:

ldd chemps2
ldd libchemps2.so

I saw in your anaconda list (https://github.com/SebWouters/CheMPS2/issues/69#issuecomment-443736090) that openblas 0.3.3 is mentioned.

Thanks! S.

hungpham2017 commented 5 years ago

@SebWouters that's true: I had a similar problem because I was using both OpenBLAS and MKL in anaconda. After I uninstalled OpenBLAS and reinstalled everything, it worked fine with MKL.

ldd libchemps2.so

        linux-vdso.so.1 =>  (0x00007ffcf17e4000)
        libmkl_rt.so => /panfs/roc/groups/6/gagliard/phamx494/anaconda/lib/././libmkl_rt.so (0x00007f2018b56000)
        libhdf5.so.101 => /panfs/roc/groups/6/gagliard/phamx494/anaconda/lib/././libhdf5.so.101 (0x00007f20185c2000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f201837b000)
        libstdc++.so.6 => /panfs/roc/groups/6/gagliard/phamx494/anaconda/lib/././libstdc++.so.6 (0x00007f201823a000)
        libiomp5.so => /panfs/roc/groups/6/gagliard/phamx494/anaconda/lib/././libiomp5.so (0x00007f2017e51000)
        libgcc_s.so.1 => /panfs/roc/groups/6/gagliard/phamx494/anaconda/lib/././libgcc_s.so.1 (0x00007f2017c3b000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f20178a7000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f20176a2000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f201749a000)
        libz.so.1 => /panfs/roc/groups/6/gagliard/phamx494/anaconda/lib/./././libz.so.1 (0x00007f2017283000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f2016ffe000)
        /lib64/ld-linux-x86-64.so.2 (0x00005614d9eda000)
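
For longer ldd listings than the one above, the BLAS/LAPACK providers can be picked out with a small filter. This is a sketch only: detect_blas is a hypothetical helper, and the list of library name prefixes it matches is an assumption that may need extending for other BLAS builds.

```shell
# Read `ldd` output on stdin and print the linked libraries whose names
# look like a BLAS/LAPACK provider (the prefix list is an assumption).
detect_blas() {
    # The first field of each ldd line is the library name (or a full
    # path for the dynamic loader, which the anchored pattern skips).
    awk '{print $1}' \
        | grep -iE '^lib(openblas|mkl|blas|lapack|atlas|blis)' \
        | sort -u
}
```

Used as ldd libchemps2.so | detect_blas, the listing above would reduce to the single line libmkl_rt.so. If both an OpenBLAS and an MKL library show up for the same binary, that may point to the mixed installation described earlier in this thread.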