SebWouters / CheMPS2

CheMPS2: a spin-adapted implementation of DMRG for ab initio quantum chemistry
GNU General Public License v2.0

A bug about openmp version #81

Closed: zhaivanczha closed this issue 2 years ago

zhaivanczha commented 2 years ago

I've tried using CheMPS2 with OpenMP parallelism to run the test examples, and it got stuck in an endless loop. After I set OMP_NUM_THREADS=1, it finished successfully. After checking, I think there is a bug in the CheMPS2::Heff::makeHeff() function in CheMPS2/Heff.cpp, but I don't know how to fix it.

Best regards, Civan

SebWouters commented 2 years ago

Hi Civan,

I'm going to need more details: which test, which compiler, which OS, which libraries are linked, does it fail for all thread numbers > 1, ...

I've used clang version 10.0.0-4ubuntu1 on Ubuntu 20.04.3 LTS. The following libraries are linked (ldd libchemps2.so.3):

    linux-vdso.so.1 (0x00007fff2a7a2000)
    libmkl_intel_lp64.so => /opt/intel/compilers_and_libraries_2019.0.117/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so (0x00007fd757d5f000)
    libmkl_intel_thread.so => /opt/intel/compilers_and_libraries_2019.0.117/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so (0x00007fd75587f000)
    libmkl_core.so => /opt/intel/compilers_and_libraries_2019.0.117/linux/mkl/lib/intel64_lin/libmkl_core.so (0x00007fd751746000)
    libomp.so.5 => /usr/lib/x86_64-linux-gnu/libomp.so.5 (0x00007fd75162d000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fd7514de000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fd7514d8000)
    libhdf5_serial.so.103 => /usr/lib/x86_64-linux-gnu/libhdf5_serial.so.103 (0x00007fd75115b000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fd751138000)
    libsz.so.2 => /usr/lib/x86_64-linux-gnu/libsz.so.2 (0x00007fd751133000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fd751115000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fd750f33000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fd750f18000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd750d26000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fd758a2d000)
    libaec.so.0 => /usr/lib/x86_64-linux-gnu/libaec.so.0 (0x00007fd750d1d000)

The tests all pass on my older laptop (with an Intel M-5Y71 CPU @ 1.20GHz × 4), using 4 threads. Test6 and test14 can take a while, though.

If you can let me know more details, I can try to reproduce the behavior.

Best regards, Sebastian
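To gather the compiler and OpenMP details requested above in one go, a small C++ helper like the one below could be compiled with the same compiler and flags as CheMPS2 (e.g. `-fopenmp`). This is a hypothetical sketch, not part of CheMPS2; `toolchain_report` is an illustrative name. clang and icc also define `__GNUC__`, which is why they are tested first.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: summarizes the compiler, the OpenMP version it was
// built against, and the default OpenMP thread count.
std::string toolchain_report() {
    std::ostringstream out;
#if defined(__clang__)
    out << "compiler: clang " << __clang_major__ << '.' << __clang_minor__ << '\n';
#elif defined(__INTEL_COMPILER)
    out << "compiler: icc " << __INTEL_COMPILER << '\n';
#elif defined(__GNUC__)
    out << "compiler: gcc " << __GNUC__ << '.' << __GNUC_MINOR__ << '\n';
#else
    out << "compiler: unknown\n";
#endif
#ifdef _OPENMP
    // _OPENMP expands to the yyyymm date of the supported OpenMP specification,
    // e.g. 201511 for OpenMP 4.5.
    out << "_OPENMP: " << _OPENMP << '\n';
    // Reflects OMP_NUM_THREADS if it is set in the environment.
    out << "omp_get_max_threads(): " << omp_get_max_threads() << '\n';
#else
    out << "_OPENMP: not defined (compiled without OpenMP)\n";
#endif
    return out.str();
}
```

Calling `std::cout << toolchain_report();` from a `main` and running it with, for example, `OMP_NUM_THREADS=4` should report 4 threads. Together with the `ldd` output, that covers most of the checklist.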

zhaivanczha commented 2 years ago

Hi Sebastian,

Thanks very much for your quick reply! My compilation environment is gcc-10.2.0 (x86_64-pc-linux-gnu) on CentOS Linux release 7.6.1810 (Core), running on an Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz. The linked libraries are:

linux-vdso.so.1 =>  (0x00007ffc4c3cd000)
libmkl_intel_lp64.so => /opt/intel/compilers_and_libraries_2019.4.243/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so (0x00002ae35a299000)
libmkl_intel_thread.so => /opt/intel/compilers_and_libraries_2019.4.243/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so (0x00002ae35ae11000)
libmkl_core.so => /opt/intel/compilers_and_libraries_2019.4.243/linux/mkl/lib/intel64_lin/libmkl_core.so (0x00002ae35d2ed000)
libiomp5.so => /opt/intel/compilers_and_libraries_2019.4.243/linux/compiler/lib/intel64_lin/libiomp5.so (0x00002ae3615c2000)
libdl.so.2 => /lib64/libdl.so.2 (0x00002ae3619ac000)
libm.so.6 => /lib64/libm.so.6 (0x00002ae361bb0000)
libhdf5.so.200 => /home/civan/Applications/hdf5-1.12.1-gcc10/lib/libhdf5.so.200 (0x00002ae361eb2000)
libstdc++.so.6 => /opt/gcc-10.2.0/build/lib64/libstdc++.so.6 (0x00002ae3624d4000)
libgomp.so.1 => /opt/gcc-10.2.0/build/lib64/libgomp.so.1 (0x00002ae3628a1000)
libgcc_s.so.1 => /opt/gcc-10.2.0/build/lib64/libgcc_s.so.1 (0x00002ae362adf000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00002ae362cf7000)
libc.so.6 => /lib64/libc.so.6 (0x00002ae362f13000)
/lib64/ld-linux-x86-64.so.2 (0x00002ae359d0d000)

I've tried several tests (including test1, test2, test6 and test14); they all fail whenever OMP_NUM_THREADS is greater than 1. Strangely, with OMP_NUM_THREADS=1 and MKL_NUM_THREADS=24, they also finish successfully.

I hope I've given enough information for your next tests.

Best regards, Civan

wpoely86 commented 2 years ago

I've run the tests on CentOS 7.9, also with 24 cores, using GCC 10.3 + MKL 2021.2.0 (the CPU is a Xeon Gold 6148) => all tests pass without issue.

@zhaivanczha can you try with MKL_NUM_THREADS=1? Layered multithreading usually doesn't work very well.
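"Layered" (nested) multithreading means that an OpenMP-parallel loop in the application calls routines that open their own thread teams, as a threaded BLAS does. The sketch below makes no assumptions about MKL internals and uses only standard OpenMP calls: it emulates the inner library with a nested `parallel` region and shows that `omp_set_max_active_levels(1)` keeps that inner region on a single thread. `inner_team_size` and `run_layered` are hypothetical names; for MKL itself, `MKL_NUM_THREADS=1` as suggested is the direct knob.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for a threaded library routine (e.g. a BLAS call):
// it opens its own parallel region and reports how many threads it got.
int inner_team_size() {
    int size = 1;
#ifdef _OPENMP
#pragma omp parallel
    {
#pragma omp single
        size = omp_get_num_threads();
    }
#endif
    return size;
}

// Outer application loop, parallelized with OpenMP; every iteration calls the
// "threaded library". max_levels is passed to omp_set_max_active_levels (a
// global setting that persists after this call): with 1, at most one level of
// parallelism is active, so the inner region runs on a single thread.
std::vector<int> run_layered(int tasks, int max_levels) {
    std::vector<int> inner_sizes(tasks, 0);
#ifdef _OPENMP
    omp_set_max_active_levels(max_levels);
#pragma omp parallel for
#endif
    for (int t = 0; t < tasks; ++t) {
        inner_sizes[t] = inner_team_size();
    }
#ifndef _OPENMP
    (void)max_levels;  // unused when compiled without OpenMP
#endif
    return inner_sizes;
}
```

With `run_layered(8, 1)` every entry is 1, i.e. no oversubscription. Raising `max_levels` allows inner teams, which multiplies the outer and inner thread counts.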

zhaivanczha commented 2 years ago

Yes, I've tried MKL_NUM_THREADS=1 with OMP_NUM_THREADS not equal to 1, and it failed. I put some debug prints in the source code, and it turns out to be stuck in this while loop at around line 357 in the Heff.cpp file:

while ( instruction == 'B' ){
   #ifdef CHEMPS2_MPI_COMPILATION
   {
      int mpi_instruction = 2;
      MPIchemps2::broadcast_array_int( &mpi_instruction, 1, MPI_CHEMPS2_MASTER );
      MPIchemps2::broadcast_array_double( whichpointers[0], veclength, MPI_CHEMPS2_MASTER );
      makeHeff(whichpointers[0], workspace, denS, Ltensors, Atensors, Btensors, Ctensors, Dtensors, S0tensors, S1tensors, F0tensors, F1tensors, Qtensors, Xtensors, nLower, VeffTilde);
      MPIchemps2::reduce_array_double( workspace, whichpointers[1], veclength, MPI_CHEMPS2_MASTER );
   }
   #else
   // apply Heff: input in whichpointers[0], result in whichpointers[1]
   makeHeff(whichpointers[0], whichpointers[1], denS, Ltensors, Atensors, Btensors, Ctensors, Dtensors, S0tensors, S1tensors, F0tensors, F1tensors, Qtensors, Xtensors, nLower, VeffTilde);
   #endif
   std::cout << whichpointers[0] << ' ' << whichpointers[1] << '\n'; // debug print (added)
   instruction = deBoskabouter.FetchInstruction( whichpointers );
   std::cout << "2" << instruction << '\n'; // debug print (added)
}

I disabled MPI, and the source code is from the master branch (aaacf284). I wonder if it is a bug caused by OpenMP?

Best regards, Civan
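The loop quoted above follows a reverse-communication pattern: the Davidson object (`deBoskabouter`) returns an instruction, and as long as it is 'B' the caller must apply Heff to `whichpointers[0]` and store the result in `whichpointers[1]`. The toy sketch below reproduces that control flow with a power iteration instead of Davidson. The class, the letters 'C' and 'D', and the iteration cap are assumptions for illustration, not CheMPS2's actual code. `faulty_matvec` flips the sign of every other product to mimic a corrupted matvec: the convergence test then never passes and only the cap stops the loop. This is one plausible, unconfirmed way a wrong multithreaded Heff could turn into an endless loop.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Toy reverse-communication eigensolver (hypothetical, NOT CheMPS2's Davidson):
// power iteration for the dominant eigenvalue of a symmetric matrix.
// FetchInstruction() answers 'B' when the caller must compute
// ptrs[1] = H * ptrs[0], 'C' when converged, 'D' when the iteration cap is hit.
class ToyPowerSolver {
public:
    ToyPowerSolver(int n, double tol, int max_iter)
        : x_(n, 1.0 / std::sqrt(static_cast<double>(n))), y_(n, 0.0),
          tol_(tol), max_iter_(max_iter) {}

    char FetchInstruction(double** ptrs) {
        if (iter_ > 0) {
            // y_ holds H * x_ from the caller: Rayleigh quotient, then normalize.
            double lambda = 0.0, norm = 0.0;
            for (std::size_t i = 0; i < x_.size(); ++i) {
                lambda += x_[i] * y_[i];
                norm += y_[i] * y_[i];
            }
            norm = std::sqrt(norm);
            const bool converged = have_previous_ && std::fabs(lambda - eigenvalue_) < tol_;
            eigenvalue_ = lambda;
            have_previous_ = true;
            if (converged || norm == 0.0) return 'C';  // norm 0: H x = 0, eigenvalue 0
            for (std::size_t i = 0; i < x_.size(); ++i) x_[i] = y_[i] / norm;
            if (iter_ >= max_iter_) return 'D';  // safety cap against endless loops
        }
        ++iter_;
        ptrs[0] = x_.data();  // input vector for the matvec
        ptrs[1] = y_.data();  // output buffer for the matvec
        return 'B';
    }

    double eigenvalue() const { return eigenvalue_; }
    int iterations() const { return iter_; }

private:
    std::vector<double> x_, y_;
    double tol_;
    int max_iter_;
    int iter_ = 0;
    bool have_previous_ = false;
    double eigenvalue_ = 0.0;
};

// Caller side, mirroring the quoted loop: do matvecs while the answer is 'B'.
char drive(ToyPowerSolver& solver, void (*matvec)(const double*, double*)) {
    double* ptrs[2] = {nullptr, nullptr};
    char instruction = solver.FetchInstruction(ptrs);
    while (instruction == 'B') {
        matvec(ptrs[0], ptrs[1]);
        instruction = solver.FetchInstruction(ptrs);
    }
    return instruction;
}

// H = [[2, 1], [1, 3]], eigenvalues (5 +- sqrt(5)) / 2.
void matvec_2x2(const double* in, double* out) {
    out[0] = 2.0 * in[0] + 1.0 * in[1];
    out[1] = 1.0 * in[0] + 3.0 * in[1];
}

// Mimics a corrupted matvec by flipping the sign of every other result,
// so the Rayleigh quotient keeps alternating in sign and never settles.
void faulty_matvec(const double* in, double* out) {
    static int calls = 0;
    const double sign = (calls++ % 2 == 0) ? 1.0 : -1.0;
    matvec_2x2(in, out);
    out[0] *= sign;
    out[1] *= sign;
}
```

With `matvec_2x2` the driver ends on 'C' near (5 + sqrt(5)) / 2. With `faulty_matvec` it only stops because of the cap; without one, the `while` would spin forever.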

SebWouters commented 2 years ago

> i wonder if it is a bug that caused by the openmp?

I hope so :-). I also think so. Can you try with gcc one major version earlier, explicitly setting MKL=OFF and explicitly providing another LAPACK/BLAS? E.g.

CXX=g++-9 CC=gcc-9 cmake .. -DMKL=OFF -DWITH_MPI=OFF -DLAPACK_LIBRARIES="/usr/lib/x86_64-linux-gnu/libblas.so;/usr/lib/x86_64-linux-gnu/liblapack.so"

ldd ../CheMPS2/libchemps2.so
linux-vdso.so.1 (0x00007ffede37b000)
libblas.so.3 => /usr/lib/x86_64-linux-gnu/libblas.so.3 (0x00007ff5a5ad7000)
liblapack.so.3 => /usr/lib/x86_64-linux-gnu/liblapack.so.3 (0x00007ff5a5433000)
libhdf5_serial.so.103 => /usr/lib/x86_64-linux-gnu/libhdf5_serial.so.103 (0x00007ff5a50b6000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ff5a4ed2000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff5a4d83000)
libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007ff5a4d41000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ff5a4d26000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff5a4b34000)
libgfortran.so.5 => /usr/lib/x86_64-linux-gnu/libgfortran.so.5 (0x00007ff5a486c000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff5a4847000)
libsz.so.2 => /usr/lib/x86_64-linux-gnu/libsz.so.2 (0x00007ff5a4842000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007ff5a4826000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff5a4820000)
/lib64/ld-linux-x86-64.so.2 (0x00007ff5a5cdc000)
libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007ff5a47d6000)
libaec.so.0 => /usr/lib/x86_64-linux-gnu/libaec.so.0 (0x00007ff5a47cb000)

GCC 10 supports OpenMP 5 and GCC 9 supports OpenMP 4.5, so something might have gone wrong there. I've also tried gcc-10/g++-10 (gcc-10 (Ubuntu 10.3.0-1ubuntu1~20.04) 10.3.0), with which everything worked, but that is of course 10.3 and not 10.2.

If the most basic version (gcc 9, MKL=OFF, gnu lapack/blas) works, you can work your way up to your current toolchain one step at a time.

zhaivanczha commented 2 years ago

Yes, it works! After checking again, I think the problem is that libiomp5.so conflicts with libgomp.so.1 when MKL and the gcc compiler are used together. It also works when I compile with icpc and MKL=ON. Thank you very much!
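The diagnosis here is two OpenMP runtimes in one process: both `libiomp5.so` (pulled in by `libmkl_intel_thread.so`) and `libgomp.so.1` (from gcc) appear in the second `ldd` listing. A quick check is to scan `ldd` output for more than one OpenMP runtime. Below is a minimal sketch that assumes the usual runtime sonames (GNU `libgomp`, Intel `libiomp5`, LLVM `libomp`). `openmp_runtimes` is a hypothetical helper, not a CheMPS2 or `ldd` feature.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: scans `ldd` output and returns every OpenMP runtime it
// lists. The soname prefixes below are the common ones, not an exhaustive list.
std::vector<std::string> openmp_runtimes(const std::string& ldd_output) {
    static const std::vector<std::string> known = {"libgomp.so", "libiomp5.so", "libomp.so"};
    std::vector<std::string> found;
    std::istringstream lines(ldd_output);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string soname;  // first whitespace-separated field, e.g. "libgomp.so.1"
        if (!(fields >> soname)) continue;  // skip blank lines
        for (const std::string& name : known) {
            if (soname.compare(0, name.size(), name) == 0) found.push_back(soname);
        }
    }
    return found;
}
```

Feeding it the output of `ldd libchemps2.so` (captured, for instance, with POSIX `popen`) returns one entry for the clang + libomp build from the first listing and two for the gcc + MKL build. A result with more than one entry matches the conflict described in the last comment. Building with icpc, or pointing MKL at GNU threading, avoids mixing the two runtimes.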