deepmodeling / abacus-develop

An electronic structure package based on either plane wave basis or numerical atomic orbitals.
http://abacus.ustc.edu.cn
GNU Lesser General Public License v3.0

Davidson GPU has bug #5497

Open Qianruipku opened 1 day ago

Qianruipku commented 1 day ago

Describe the bug

  1. Compile ABACUS with CUDA=ON, OPENMP=OFF, LIBXC=ON.
  2. Run a Davidson test: 102_PW_DA_davidson_GPU
  3. Segmentation fault. [two screenshots]

Expected behavior

No response

To Reproduce

No response

Environment

icpc (ICC) 2021.5.0 20211109 Ubuntu 20.04

Additional Context

No response

Task list for Issue attackers (only for developers)

haozhihan commented 21 hours ago

cmake -B build -DUSE_CUDA=1 -DUSE_OPENMP=0 -DENABLE_LIBXC=1

This is my build command; the issue did not occur with it.

haozhihan commented 21 hours ago

                              ABACUS v3.8.2

               Atomic-orbital Based Ab-initio Computation at UStc                    

                     Website: http://abacus.ustc.edu.cn/                             
               Documentation: https://abacus.deepmodeling.com/                       
                  Repository: https://github.com/abacusmodeling/abacus-develop       
                              https://github.com/deepmodeling/abacus-develop         
                      Commit: 6bff8e321 (Fri Nov 15 10:01:27 2024 +0800)

 Fri Nov 15 13:47:27 2024
 MAKE THE DIR         : OUT.autotest/
 RUNNING WITH DEVICE  : GPU / NVIDIA GeForce RTX 3090

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 Warning: the number of valence electrons in pseudopotential > 5 for As: [Ar] 3d10 4s2 4p3
 Warning: the number of valence electrons in pseudopotential > 3 for Ga: [Ar] 3d10 4s2 4p1
 Pseudopotentials with additional electrons can yield (more) accurate outcomes, but may be less efficient.
 If you're confident that your chosen pseudopotential is appropriate, you can safely ignore this warning.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

 UNIFORM GRID DIM        : 32 * 32 * 32
 UNIFORM GRID DIM(BIG)   : 32 * 32 * 32
 DONE(1.46775    SEC) : SETUP UNITCELL
 DONE(1.49589    SEC) : SYMMETRY
 DONE(1.57838    SEC) : INIT K-POINTS
 ---------------------------------------------------------
 Self-consistent calculations for electrons
 ---------------------------------------------------------
 SPIN    KPOINTS         PROCESSORS  THREADS     
 1       8               1           1           
 ---------------------------------------------------------
 Use plane wave basis
 ---------------------------------------------------------
 ELEMENT NATOM       XC          
 As      1           
 Ga      1           
 ---------------------------------------------------------
 Initial plane wave basis and FFT box
 ---------------------------------------------------------
 DONE(1.58023    SEC) : INIT PLANEWAVE
 DONE(1.59158    SEC) : LOCAL POTENTIAL
 DONE(1.62581    SEC) : NON-LOCAL POTENTIAL
 MEMORY FOR PSI (MB)  : 3.83203
 DONE(1.76999    SEC) : INIT BASIS
 -------------------------------------------
 SELF-CONSISTENT : 
 -------------------------------------------
 START CHARGE      : atomic
 DONE(1.80606    SEC) : INIT SCF
 ITER       ETOT/eV          EDIFF/eV         DRHO     TIME/s
 DA1     -4.87155567e+03   0.00000000e+00   1.5580e+00   0.53
 DA2     -4.86901535e+03   2.54031665e+00   4.0050e-01   0.13
 DA3     -4.86972586e+03  -7.10510235e-01   1.2556e-02   0.17
 DA4     -4.86974360e+03  -1.77357234e-02   1.0589e-03   0.16
 DA5     -4.86974615e+03  -2.54987088e-03   1.1886e-04   0.17
 DA6     -4.86974691e+03  -7.63735975e-04   3.3564e-05   0.18
 DA7     -4.86974703e+03  -1.21183950e-04   2.8005e-06   0.23
 DA8     -4.86974705e+03  -1.82497557e-05   6.7053e-07   0.22
 DA9     -4.86974705e+03  -2.64658886e-06   4.9567e-08   0.25
----------------------------------------------------------------
 TOTAL-STRESS (KBAR)                                            
----------------------------------------------------------------
    -12196.0138690840       -99.1010198639      -142.3528746494 
       -99.1010198639    -12240.9153585924       -52.1054880732 
      -142.3528746494       -52.1054880732    -12217.4193518486 
----------------------------------------------------------------
 TOTAL-PRESSURE: -12218.116193 KBAR

TIME STATISTICS
-----------------------------------------------------------------------------
    CLASS_NAME                NAME             TIME/s  CALLS   AVG/s  PER/%  
-----------------------------------------------------------------------------
                   total                       3.99   17       0.23   100.00 
 Driver            reading                     0.02   1        0.02   0.42   
 Input_Conv        Convert                     0.00   1        0.00   0.01   
 Driver            driver_line                 3.97   1        3.97   99.58  
 UnitCell          check_tau                   0.00   1        0.00   0.00   
 PW_Basis_Sup      setuptransform              0.04   1        0.04   0.99   
 PW_Basis_Sup      distributeg                 0.01   1        0.01   0.23   
 mymath            heapsort                    0.00   5        0.00   0.03   
 Symmetry          analy_sys                   0.03   1        0.03   0.71   
 PW_Basis_K        setuptransform              0.00   1        0.00   0.03   
 PW_Basis_K        distributeg                 0.00   1        0.00   0.00   
 PW_Basis          setup_struc_factor          0.00   1        0.00   0.02   
 ppcell_vnl        init                        0.00   1        0.00   0.06   
 ppcell_vl         init_vloc                   0.01   1        0.01   0.17   
 ppcell_vnl        init_vnl                    0.03   1        0.03   0.86   
 WF_atomic         init_at_1                   0.14   1        0.14   3.50   
 wavefunc          wfcinit                     0.00   1        0.00   0.00   
 Ions              opt_ions                    2.21   1        2.21   55.49  
 ESolver_KS_PW     runner                      2.09   1        2.09   52.50  
 ESolver_KS_PW     before_scf                  0.04   1        0.04   0.89   
 H_Ewald_pw        compute_ewald               0.00   1        0.00   0.01   
 Charge            set_rho_core                0.01   1        0.01   0.20   
 PW_Basis_Sup      recip2real                  0.04   83       0.00   1.08   
 PW_Basis_Sup      gathers_scatterp            0.00   83       0.00   0.09   
 Charge            atomic_rho                  0.02   2        0.01   0.38   
 Potential         init_pot                    0.02   1        0.02   0.38   
 Potential         update_from_charge          0.15   10       0.01   3.69   
 Potential         cal_fixed_v                 0.00   1        0.00   0.02   
 PotLocal          cal_fixed_v                 0.00   1        0.00   0.01   
 Potential         cal_v_eff                   0.15   10       0.01   3.65   
 H_Hartree_pw      v_hartree                   0.01   10       0.00   0.27   
 PW_Basis_Sup      real2recip                  0.05   107      0.00   1.16   
 PW_Basis_Sup      gatherp_scatters            0.00   107      0.00   0.05   
 PotXC             cal_v_eff                   0.13   10       0.01   3.37   
 XC_Functional     v_xc                        0.16   12       0.01   4.06   
 Potential         interpolate_vrs             0.00   10       0.00   0.01   
 Symmetry          rhog_symmetry               0.02   10       0.00   0.54   
 Symmetry          group fft grids             0.01   10       0.00   0.21   
 Charge_Mixing     init_mixing                 0.00   1        0.00   0.00   
 ESolver_KS_PW     hamilt2density_single       1.89   9        0.21   47.41  
 HSolverPW         solve                       1.86   9        0.21   46.66  
 Nonlocal          getvnl                      0.04   72       0.00   1.07   
 pp_cell_vnl       getvnl                      0.04   72       0.00   1.07   
 Structure_Factor  get_sk                      0.00   104      0.00   0.11   
 WF_atomic         atomic_wfc                  0.01   8        0.00   0.13   
 DiagoDavid        diag_once                   1.71   72       0.02   42.84  
 DiagoDavid        first                       0.32   72       0.00   8.01   
 David             spsi_func                   0.03   9962     0.00   0.81   
 DiagoDavid        SchmidtOrth                 0.38   4981     0.00   9.50   
 David             hpsi_func                   0.15   303      0.00   3.71   
 Operator          hPsi                        0.15   303      0.00   3.70   
 Operator          EkineticPW                  0.00   303      0.00   0.02   
 Operator          VeffPW                      0.14   303      0.00   3.49   
 PW_Basis_K        recip_to_real gpu           0.09   6709     0.00   2.26   
 PW_Basis_K        real_to_recip gpu           0.06   4981     0.00   1.51   
 Operator          NonlocalPW                  0.01   303      0.00   0.18   
 Nonlocal          add_nonlocal_pp             0.00   303      0.00   0.10   
 DiagoDavid        cal_elem                    0.00   303      0.00   0.08   
 DiagoDavid        diag_zhegvx                 0.91   303      0.00   22.93  
 DiagoDavid        cal_grad                    0.60   231      0.00   15.16  
 DiagoDavid        check_update                0.00   231      0.00   0.00   
 DiagoDavid        last                        0.01   80       0.00   0.15   
 DiagoDavid        refresh                     0.00   8        0.00   0.12   
 ElecStatePW       psiToRho                    0.05   9        0.01   1.35   
 Charge_Mixing     get_drho                    0.01   9        0.00   0.20   
 Charge_Mixing     inner_product_recip_rho     0.00   9        0.00   0.01   
 Charge            mix_rho                     0.01   8        0.00   0.22   
 Charge            Broyden_mixing              0.00   8        0.00   0.05   
 Charge_Mixing     inner_product_recip_hartree 0.00   56       0.00   0.03   
 ESolver_KS_PW     after_scf                   0.01   1        0.01   0.37   
 ModuleIO          write_rhog                  0.00   1        0.00   0.04   
 Forces            cal_force                   0.04   1        0.04   0.89   
 Forces            cal_force_loc               0.00   1        0.00   0.02   
 Forces            cal_force_ew                0.00   1        0.00   0.01   
 Forces            cal_force_nl                0.01   1        0.01   0.17   
 FS_Nonlocal_tools cal_becp                    0.01   8        0.00   0.34   
 Forces            cal_force_cc                0.02   1        0.02   0.56   
 Forces            cal_force_scc               0.00   1        0.00   0.13   
 Stress_PW         cal_stress                  0.08   1        0.08   2.10   
 Stress_Func       stress_kin                  0.02   1        0.02   0.46   
 Stress_Func       stress_har                  0.00   1        0.00   0.02   
 Stress_Func       stress_ewa                  0.00   1        0.00   0.02   
 Stress_Func       stress_gga                  0.01   1        0.01   0.22   
 Stress_Func       stress_loc                  0.01   1        0.01   0.21   
 Stress_Func       stress_cc                   0.03   1        0.03   0.67   
 Stress_Func       stress_nl                   0.02   1        0.02   0.50   
 ModuleIO          write_istate_info           0.00   1        0.00   0.02   
-----------------------------------------------------------------------------

 START  Time  : Fri Nov 15 13:47:27 2024
 FINISH Time  : Fri Nov 15 13:47:31 2024
 TOTAL  Time  : 4
 SEE INFORMATION IN : OUT.autotest/
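
As a quick sanity check on the log above, the printed TOTAL-PRESSURE appears to equal one third of the trace of the TOTAL-STRESS tensor. The sketch below reproduces that arithmetic from the numbers in the log; the function names are illustrative and are not part of ABACUS.

```python
# Stress tensor rows copied from the TOTAL-STRESS (KBAR) block in the log above.
stress_kbar = [
    [-12196.0138690840, -99.1010198639, -142.3528746494],
    [-99.1010198639, -12240.9153585924, -52.1054880732],
    [-142.3528746494, -52.1054880732, -12217.4193518486],
]


def pressure_from_stress(stress):
    """Return trace/3 of a 3x3 stress tensor (hypothetical helper, not an ABACUS API)."""
    return sum(stress[i][i] for i in range(3)) / 3.0


def is_symmetric(stress, tol=1e-8):
    """Check that the off-diagonal elements are mirrored across the diagonal."""
    return all(
        abs(stress[i][j] - stress[j][i]) <= tol
        for i in range(3)
        for j in range(3)
    )


pressure = pressure_from_stress(stress_kbar)
print(f"TOTAL-PRESSURE: {pressure:.6f} KBAR")
print("symmetric:", is_symmetric(stress_kbar))
```

The computed value agrees with the -12218.116193 KBAR reported in the log, which suggests the GPU run that completed produced an internally consistent stress.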

Cstandardlib commented 20 hours ago

I did not reproduce the segfault on my machine.

                              ABACUS v3.8.2

               Atomic-orbital Based Ab-initio Computation at UStc                    

                     Website: http://abacus.ustc.edu.cn/                             
               Documentation: https://abacus.deepmodeling.com/                       
                  Repository: https://github.com/abacusmodeling/abacus-develop       
                              https://github.com/deepmodeling/abacus-develop         
                      Commit: 6bff8e321 (Fri Nov 15 10:01:27 2024 +0800)

 Fri Nov 15 14:55:18 2024
 MAKE THE DIR         : OUT.autotest/
 RUNNING WITH DEVICE  : GPU / NVIDIA GeForce RTX 3090
 -------------------------------------------
 SELF-CONSISTENT : 
 -------------------------------------------
 START CHARGE      : atomic
 DONE(0.712072   SEC) : INIT SCF
 ITER       ETOT/eV          EDIFF/eV         DRHO     TIME/s
 DA1     -4.87155567e+03   0.00000000e+00   1.5580e+00   0.71
 DA2     -4.86901535e+03   2.54031665e+00   4.0050e-01   0.17
 DA3     -4.86972586e+03  -7.10510235e-01   1.2556e-02   0.20
 DA4     -4.86974360e+03  -1.77357234e-02   1.0589e-03   0.20
 DA5     -4.86974615e+03  -2.54987088e-03   1.1886e-04   0.22
 DA6     -4.86974691e+03  -7.63735974e-04   3.3564e-05   0.21
 DA7     -4.86974703e+03  -1.21183949e-04   2.8005e-06   0.28
 DA8     -4.86974705e+03  -1.82497580e-05   6.7053e-07   0.26
 DA9     -4.86974705e+03  -2.64658809e-06   4.9567e-08   0.29
----------------------------------------------------------------
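
Both logs posted in this thread come from the same commit (6bff8e321) and the same GPU model, so their SCF tables can be compared directly. The following is a minimal sketch that parses the two tables as pasted above and reports the largest per-column difference; the regular expression and helper names are assumptions made for illustration, not an ABACUS tool.

```python
import re

# SCF iteration rows copied from the two logs in this thread.
RUN_A = """\
DA1     -4.87155567e+03   0.00000000e+00   1.5580e+00   0.53
DA2     -4.86901535e+03   2.54031665e+00   4.0050e-01   0.13
DA3     -4.86972586e+03  -7.10510235e-01   1.2556e-02   0.17
DA4     -4.86974360e+03  -1.77357234e-02   1.0589e-03   0.16
DA5     -4.86974615e+03  -2.54987088e-03   1.1886e-04   0.17
DA6     -4.86974691e+03  -7.63735975e-04   3.3564e-05   0.18
DA7     -4.86974703e+03  -1.21183950e-04   2.8005e-06   0.23
DA8     -4.86974705e+03  -1.82497557e-05   6.7053e-07   0.22
DA9     -4.86974705e+03  -2.64658886e-06   4.9567e-08   0.25
"""
RUN_B = """\
DA1     -4.87155567e+03   0.00000000e+00   1.5580e+00   0.71
DA2     -4.86901535e+03   2.54031665e+00   4.0050e-01   0.17
DA3     -4.86972586e+03  -7.10510235e-01   1.2556e-02   0.20
DA4     -4.86974360e+03  -1.77357234e-02   1.0589e-03   0.20
DA5     -4.86974615e+03  -2.54987088e-03   1.1886e-04   0.22
DA6     -4.86974691e+03  -7.63735974e-04   3.3564e-05   0.21
DA7     -4.86974703e+03  -1.21183949e-04   2.8005e-06   0.28
DA8     -4.86974705e+03  -1.82497580e-05   6.7053e-07   0.26
DA9     -4.86974705e+03  -2.64658809e-06   4.9567e-08   0.29
"""

# Matches rows of the form: label, ETOT/eV, EDIFF/eV, DRHO, TIME/s.
ROW = re.compile(r"^\s*(DA\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$")


def parse_scf(text):
    """Map each iteration label to (ETOT, EDIFF, DRHO); the time column is ignored."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            label, etot, ediff, drho, _time = match.groups()
            rows[label] = (float(etot), float(ediff), float(drho))
    return rows


def max_abs_diff(a, b):
    """Largest absolute difference per column across iterations present in both runs."""
    if a.keys() != b.keys():
        raise ValueError("runs have different iteration labels")
    return tuple(max(abs(a[k][i] - b[k][i]) for k in a) for i in range(3))


d_etot, d_ediff, d_drho = max_abs_diff(parse_scf(RUN_A), parse_scf(RUN_B))
print(f"max |dETOT|  = {d_etot:.3e} eV")
print(f"max |dEDIFF| = {d_ediff:.3e} eV")
print(f"max |dDRHO|  = {d_drho:.3e}")
```

At the printed precision the total energies and DRHO values are identical, and EDIFF differs only in the last printed digits, so the two successful runs agree with each other.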

with

Cstandardlib commented 18 hours ago

This bug is triggered only when TESTING is OFF. Both the gcc and icpc builds hit the segfault.
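
To narrow this down, one option is to configure builds that differ only in the testing switch and the compiler. The sketch below only generates the configure commands and does not run them. It reuses the flags from the build command posted earlier in the thread. The name `BUILD_TESTING` for the testing switch, the compiler executable names, and the build directory names are assumptions and should be checked against the project's CMake options.

```python
import shlex

# Flags taken from the cmake command posted earlier in this thread.
BASE_FLAGS = {"USE_CUDA": 1, "USE_OPENMP": 0, "ENABLE_LIBXC": 1}

# ASSUMPTION: the thread only says "TESTING"; the actual cache variable name
# is not confirmed here. Adjust if the project uses a different name.
TESTING_FLAG = "BUILD_TESTING"


def configure_command(build_dir, testing, compiler=None):
    """Return the cmake configure command as an argument list."""
    flags = dict(BASE_FLAGS)
    flags[TESTING_FLAG] = 1 if testing else 0
    cmd = ["cmake", "-B", build_dir]
    if compiler:
        # CMAKE_CXX_COMPILER is the standard CMake variable for the C++ compiler.
        cmd.append(f"-DCMAKE_CXX_COMPILER={compiler}")
    cmd += [f"-D{name}={value}" for name, value in flags.items()]
    return cmd


def build_matrix(compilers=("g++", "icpc")):
    """One configure command per (compiler, testing) combination."""
    commands = []
    for compiler in compilers:
        for testing in (True, False):
            suffix = "on" if testing else "off"
            build_dir = f"build-{compiler.replace('+', 'x')}-testing-{suffix}"
            commands.append(configure_command(build_dir, testing, compiler))
    return commands


for cmd in build_matrix():
    print(shlex.join(cmd))
```

Running 102_PW_DA_davidson_GPU against each resulting build would show whether the segfault follows the testing switch alone, as reported above, or also depends on the compiler.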