deepmodeling / abacus-develop

An electronic structure package based on either plane wave basis or numerical atomic orbitals.
http://abacus.ustc.edu.cn
GNU Lesser General Public License v3.0

When using cusolver, multithreading is slower than multiprocessing #4247

Open xuan112358 opened 6 months ago

xuan112358 commented 6 months ago

Details

I am running a system of 32 water molecules with "ks_solver" set to "cusolver" on a single GPU. With multiprocessing, for example "OMP_NUM_THREADS=1 mpirun -n 12 abacus", the total time for 10 MD steps is 1862 s. With multithreading, for example "OMP_NUM_THREADS=12 mpirun -n 1 abacus", the total time for the same 10 MD steps is 5920 s, which is about 3.2 times slower. Examples and the corresponding results are provided here: cusolver_mpi_openmp.zip
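For anyone reproducing the comparison, the two launch configurations and the size of the gap can be written down as a small Python sketch. It only builds the command lines and computes the ratio from the timings reported above; it does not run ABACUS. The helper names `launch_command` and `slowdown` are hypothetical, and the binary name `abacus` and the standard OpenMP variable `OMP_NUM_THREADS` are assumed to match the setup described in this report.

```python
# Wall times reported in this issue for 10 MD steps, in seconds.
reported_seconds = {
    "mpi": 1862.0,     # 12 MPI ranks x 1 OpenMP thread
    "openmp": 5920.0,  # 1 MPI rank x 12 OpenMP threads
}


def launch_command(n_ranks, n_threads, binary="abacus"):
    """Return (env_overrides, argv) for one launch configuration.

    Hypothetical helper: it only describes the launch and runs nothing.
    """
    env = {"OMP_NUM_THREADS": str(n_threads)}
    argv = ["mpirun", "-n", str(n_ranks), binary]
    return env, argv


def slowdown(slow_seconds, fast_seconds):
    """Ratio of the slower wall time to the faster one."""
    return slow_seconds / fast_seconds


mpi_env, mpi_argv = launch_command(n_ranks=12, n_threads=1)
omp_env, omp_argv = launch_command(n_ranks=1, n_threads=12)

ratio = slowdown(reported_seconds["openmp"], reported_seconds["mpi"])
print(f"multithreaded run is {ratio:.2f}x slower than the multiprocess run")
```

The returned `env` and `argv` could be passed to `subprocess.run(argv, env={**os.environ, **env})` to time each configuration on the same input, but that step is left out here because it depends on the local installation.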


WHUweiqingzhou commented 5 months ago

@dyzheng and @denghuilu could you have a look?