deepmodeling / abacus-develop

An electronic structure package based on either plane wave basis or numerical atomic orbitals.
http://abacus.ustc.edu.cn
GNU Lesser General Public License v3.0

DCU error: psi::memory::cast_memory<double, double>(std::complex<double>*, std::complex<double> const*, int) #4092

Closed pxlxingliang closed 3 months ago

pxlxingliang commented 4 months ago

Describe the bug

In the DCU daily test on 2024-05-06, case 003_12Pt111 failed with the following error:

Invalid address access: 0x4b39aa606000, Error code: 1.

>>>>>>>> KERNEL VMFault !!!! <<<<<<

>>>>>>>> PID: 4584 !!!! <<<<<<
=========> STREAM <0x2632300>: VMFault HSA QUEUE ANALYSIS <=========
STREAM <0x2632300>: get hsa queue W/R ptr: write index: 0, read index: 0
STREAM <0x2632300>: FAILED: hsa queue is null!
=========> STREAM <0x2596a90>: VMFault HSA QUEUE ANALYSIS <=========
STREAM <0x2596a90>: get hsa queue W/R ptr: write index: 0, read index: 0
STREAM <0x2596a90>: FAILED: hsa queue is null!
=========> STREAM <0x24d3ae0>: VMFault HSA QUEUE ANALYSIS <=========
STREAM <0x24d3ae0>: get hsa queue W/R ptr: write index: 0, read index: 0
STREAM <0x24d3ae0>: FAILED: hsa queue is null!
=========> STREAM <0x26cdb70>: VMFault HSA QUEUE ANALYSIS <=========
STREAM <0x26cdb70>: get hsa queue W/R ptr: write index: 2, read index: 0
STREAM <0x26cdb70>: >>>>>>>> DUMP KERNEL AQL PACKET <<<<<<<<<
STREAM <0x26cdb70>: header: 770
STREAM <0x26cdb70>: setup: 3
STREAM <0x26cdb70>: workgroup: x:256, y:1, z:1
STREAM <0x26cdb70>: grid: x:8323840, y:1, z:1
STREAM <0x26cdb70>: group_segment_size: 0
STREAM <0x26cdb70>: private_segment_size: 0
STREAM <0x26cdb70>: kernel_object: 47503591250688

SUCCESS: FIND SAME KERNEL OBJECT COMMAND IN USE LIST. useIdx: 0
STREAM <0x26cdb70>: >>>>>>>> FIND MATCH KERNEL COMMAND <<<<<<<<<
STREAM <0x26cdb70>: kernel name: _ZN3psi6memory11cast_memoryIddEEvPSt7complexIT_EPKS2_IT0_Ei
STREAM <0x26cdb70>: >>>>>>>> DUMP KERNEL ARGS: size: 20 <<<<<<<<<

00 00 40 a2 39 2b 00 00 00 00 60 aa 39 2b 00 00 
0c 02 7f 00 

STREAM <0x26cdb70>: >>>>>>>> DUMP KERNEL ARGS PTR INFO <<<<<<<<<
STREAM <0x26cdb70>: ptr arg index: 0, ptr: 0x2b39a2400000
STREAM <0x26cdb70>: host ptr: 0x2b39a2400000, device ptr: 0x2b39a2400000, unaligned ptr: 0x2b39a2400000
STREAM <0x26cdb70>: size byte: 133177536
STREAM <0x26cdb70>: ptr arg index: 1, ptr: 0x2b39aa600000
STREAM <0x26cdb70>: host ptr: 0x2b39aa600000, device ptr: 0x2b39aa600000, unaligned ptr: 0x2b39aa600000
STREAM <0x26cdb70>: size byte: 133177536

>>>>>>>> KERNEL VMFault Analysis END !!!! <<<<<<

[b03r3n11:04584] *** Process received signal ***
[b03r3n11:04584] Signal: Aborted (6)
[b03r3n11:04584] Signal code:  (-6)
[b03r3n11:04584] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x2b33dddc05d0]
[b03r3n11:04584] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b33e6e58207]
[b03r3n11:04584] [ 2] /lib64/libc.so.6(abort+0x148)[0x2b33e6e598f8]
[b03r3n11:04584] [ 3] /public/software/compiler/rocm/dtk-22.10/lib/libgalaxyhip.so.5(+0x98e7d4)[0x2b33deb5f7d4]
[b03r3n11:04584] [ 4] /public/software/compiler/rocm/dtk-22.10/lib/libgalaxyhip.so.5(+0x98d0fe)[0x2b33deb5e0fe]
[b03r3n11:04584] [ 5] /public/software/compiler/rocm/dtk-22.10/lib/libgalaxyhip.so.5(+0x952086)[0x2b33deb23086]
[b03r3n11:04584] [ 6] /lib64/libpthread.so.0(+0x7dd5)[0x2b33dddb8dd5]
[b03r3n11:04584] [ 7] /lib64/libc.so.6(clone+0x6d)[0x2b33e6f1fead]
[b03r3n11:04584] *** End of error message ***
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 3 with PID 4584 on node b03r3n11 exited on signal 6 (Aborted).

Running c++filt _ZN3psi6memory11cast_memoryIddEEvPSt7complexIT_EPKS2_IT0_Ei gives:

void psi::memory::cast_memory<double, double>(std::complex<double>*, std::complex<double> const*, int)
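
For readers who want to cross-check the dump, here is a minimal Python sketch that decodes the 20 raw argument bytes against the demangled signature. It assumes the layout is two little-endian 64-bit pointers followed by a 32-bit int, and that the two pointers are destination and source in that order, as the signature suggests. The function name decode_cast_memory_args is made up for this example and is not part of ABACUS.

```python
import struct

# Raw bytes copied from the "DUMP KERNEL ARGS: size: 20" section of the log.
ARGS_HEX = (
    "00 00 40 a2 39 2b 00 00 00 00 60 aa 39 2b 00 00 "
    "0c 02 7f 00"
)

def decode_cast_memory_args(args_hex):
    """Unpack (dst pointer, src pointer, element count) from the dump.

    Assumed layout: two 64-bit pointers, then a 32-bit int, little-endian,
    with no padding, matching (complex<double>*, complex<double> const*, int).
    """
    raw = bytes.fromhex(args_hex)
    return struct.unpack("<QQi", raw)

dst, src, size = decode_cast_memory_args(ARGS_HEX)

BYTES_PER_ELEMENT = 16     # std::complex<double> holds two 8-byte doubles
WORKGROUP_X = 256          # from "workgroup: x:256"
GRID_X = 8323840           # from "grid: x:8323840"
BUFFER_BYTES = 133177536   # from "size byte: 133177536"

# Element count rounded up to a whole number of workgroups.
padded = -(-size // WORKGROUP_X) * WORKGROUP_X

print(f"dst  = {dst:#x}")
print(f"src  = {src:#x}")
print(f"size = {size} elements")
print(f"grid matches padded size:   {padded == GRID_X}")
print(f"buffer matches size * 16 B: {size * BYTES_PER_ELEMENT == BUFFER_BYTES}")
```

Under these assumptions the decoded pointers agree with the "ptr arg index" lines in the log, the grid equals the element count rounded up to a multiple of 256, and each buffer is exactly size * 16 bytes. In other words, the launch parameters recorded in the dump look self-consistent.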


pxlxingliang commented 4 months ago

The outputs: 003.zip

mohanchen commented 4 months ago

For every issue, try to add your comments and suggestions. @pxlxingliang

pxlxingliang commented 4 months ago

> For every issue, try to add your comments and suggestions. @pxlxingliang

This case runs normally on CPU with both the Intel and GNU builds, so I suspect a DCU-related issue.
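
One quick check that can be done from the log alone is to compare the faulting address with the two buffers reported for the kernel. The Python sketch below does only that. The helper name locate is hypothetical, and the numbers are copied from the "Invalid address access" and "DUMP KERNEL ARGS PTR INFO" lines; treating argument 0 as the destination and argument 1 as the source is an assumption based on the demangled signature.

```python
FAULT_ADDR = 0x4B39AA606000   # from "Invalid address access: 0x4b39aa606000"
BUFFER_BYTES = 133177536      # "size byte" reported for both pointer args

# Base addresses from the "ptr arg index" lines (dst/src naming is assumed).
BUFFERS = {
    "dst (arg 0)": 0x2B39A2400000,
    "src (arg 1)": 0x2B39AA600000,
}

def locate(addr, buffers, nbytes):
    """Return the name of the buffer containing addr, or None if outside all."""
    for name, base in buffers.items():
        if base <= addr < base + nbytes:
            return name
    return None

owner = locate(FAULT_ADDR, BUFFERS, BUFFER_BYTES)
delta = FAULT_ADDR - BUFFERS["src (arg 1)"]

print(f"fault address inside a kernel buffer: {owner}")
print(f"distance from src base: {delta:#x} bytes")
print(f"distance minus 2**45:   {delta - 2**45:#x} bytes")
```

With these values the faulting address falls outside both buffers, and its distance from the source base is 2**45 + 0x6000 bytes, far larger than the 133177536-byte allocation. This is only an observation about the logged addresses, not a diagnosis of the root cause.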

denghuilu commented 4 months ago

I cannot reproduce this. Here is my environment:

[aisi@b01r4n18:003_12Pt111-new]$ module list 
Currently Loaded Modulefiles:
  1) compiler/devtoolset/7.3.1   3) mpi/hpcx/2.11.0/gcc-7.3.1
  2) compiler/cmake/3.23.3       4) compiler/rocm/dtk-22.10

And here is the log of a rerun at the same commit as this issue:

[aisi@b01r4n18:003_12Pt111-new]$ mpirun -n 4 ../../abacus-develop/build-dtk-22.10/abacus_pw 
WARNING: Total thread number on this node mismatches with hardware availability. This may cause poor performance.
Info: Local MPI proc number: 4,OpenMP thread number: 1,Total thread number: 4,Local thread limit: 32

                              ABACUS v3.6.2

               Atomic-orbital Based Ab-initio Computation at UStc                    

                     Website: http://abacus.ustc.edu.cn/                             
               Documentation: https://abacus.deepmodeling.com/                       
                  Repository: https://github.com/abacusmodeling/abacus-develop       
                              https://github.com/deepmodeling/abacus-develop         
                      Commit: 48f2b5d (Sun May 5 16:28:08 2024 +0800)

 Mon May  6 19:06:25 2024
 MAKE THE DIR         : OUT.ABACUS/
 RUNNING WITH DEVICE  : GPU / Device 66a1

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 Warning: the number of valence electrons in pseudopotential > 10 for Pt: [Xe] 4f14 5d9 6s1
 Pseudopotentials with additional electrons can yield (more) accurate outcomes, but may be less efficient.
 If you're confident that your chosen pseudopotential is appropriate, you can safely ignore this warning.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

 UNIFORM GRID DIM        : 180 * 48 * 48
 UNIFORM GRID DIM(BIG)   : 180 * 48 * 48
 DONE(0.442487   SEC) : SETUP UNITCELL
 WARNING: PRICELL: NCELL != NTRANS !
 NCELL=2, NTRANS=3
 Suggest solution: Use a larger `symmetry_prec`. 
 Now regard the structure as a primitive cell.
 DONE(0.561984   SEC) : SYMMETRY
 DONE(0.74339    SEC) : INIT K-POINTS
 ---------------------------------------------------------
 Self-consistent calculations for electrons
 ---------------------------------------------------------
 SPIN    KPOINTS         PROCESSORS  
 1       13              4           
 ---------------------------------------------------------
 Use plane wave basis
 ---------------------------------------------------------
 ELEMENT NATOM       XC          
 Pt      12          
 ---------------------------------------------------------
 Initial plane wave basis and FFT box
 ---------------------------------------------------------
 DONE(0.760976   SEC) : INIT PLANEWAVE
 MEMORY FOR PSI (MB)  : 169.462
 DONE(1.11987    SEC) : LOCAL POTENTIAL
 DONE(1.3147     SEC) : NON-LOCAL POTENTIAL
 DONE(1.34051    SEC) : INIT BASIS
 -------------------------------------------
 SELF-CONSISTENT : 
 -------------------------------------------
 START CHARGE      : atomic
 DONE(2.17696    SEC) : INIT SCF
 ITER   ETOT(eV)       EDIFF(eV)      DRHO       TIME(s)    
 CG1    -3.960642e+04  0.000000e+00   1.808e-01  3.622e+01  
 CG2    -3.960660e+04  -1.796480e-01  9.177e-02  3.076e+00  
 CG3    -3.960650e+04  9.300235e-02   4.238e-02  3.084e+00  
 CG4    -3.960655e+04  -4.136505e-02  8.941e-03  3.680e+00  
 CG5    -3.960656e+04  -1.343822e-02  5.719e-03  3.507e+00  
 CG6    -3.960656e+04  3.670269e-03   8.089e-04  3.173e+00  
 CG7    -3.960655e+04  3.020979e-03   3.916e-04  3.680e+00  
 CG8    -3.960655e+04  -8.765871e-04  4.125e-05  3.654e+00  
 CG9    -3.960655e+04  9.874788e-04   8.185e-05  3.673e+00  
 CG10   -3.960655e+04  2.818119e-04   7.384e-06  3.688e+00  
 CG11   -3.960655e+04  1.151183e-04   1.501e-06  3.151e+00  
 CG12   -3.960655e+04  1.684024e-04   3.574e-07  3.289e+00  
 CG13   -3.960655e+04  1.589673e-04   4.861e-08  3.375e+00  
----------------------------------------------------------------
TOTAL-STRESS (KBAR)                                           
----------------------------------------------------------------
       16.7740699215         0.0000000000        -1.4340236664
        0.0000000000       -36.9757732000         0.0000000000
       -1.4340236664         0.0000000000       -20.1983025429
----------------------------------------------------------------
 TOTAL-PRESSURE: -13.466669 KBAR

TIME STATISTICS
-------------------------------------------------------------------------------------
     CLASS_NAME                 NAME             TIME(Sec)  CALLS   AVG(Sec) PER(%)
-------------------------------------------------------------------------------------
                     total                        83.58          17   4.92   100.00
Driver               reading                       0.30           1   0.30     0.36
Input                Init                          0.11           1   0.11     0.13
Input_Conv           Convert                       0.18           1   0.18     0.22
Driver               driver_line                  83.28           1  83.28    99.64
UnitCell             check_tau                     0.00           1   0.00     0.00
PW_Basis_Sup         setuptransform                0.02           1   0.02     0.03
PW_Basis_Sup         distributeg                   0.00           1   0.00     0.01
mymath               heapsort                      0.03        1958   0.00     0.03
Symmetry             analy_sys                     0.00           1   0.00     0.00
PW_Basis_K           setuptransform                0.01           1   0.01     0.01
PW_Basis_K           distributeg                   0.00           1   0.00     0.00
PW_Basis             setup_struc_factor            0.11           1   0.11     0.13
ppcell_vnl           init                          0.05           1   0.05     0.05
ppcell_vl            init_vloc                     0.19           1   0.19     0.23
ppcell_vnl           init_vnl                      0.19           1   0.19     0.23
WF_atomic            init_at_1                     0.00           1   0.00     0.00
wavefunc             wfcinit                       0.00           1   0.00     0.00
Ions                 opt_ions                     82.21           1  82.21    98.37
ESolver_KS_PW        run                          78.20           1  78.20    93.56
H_Ewald_pw           compute_ewald                 0.01           1   0.01     0.02
Charge               set_rho_core                  0.00           1   0.00     0.00
Charge               atomic_rho                    0.23           1   0.23     0.28
PW_Basis_Sup         recip2real                    2.23         102   0.02     2.66
PW_Basis_Sup         gathers_scatterp              0.13         102   0.00     0.16
Potential            init_pot                      0.50           1   0.50     0.59
Potential            update_from_charge            7.30          14   0.52     8.74
Potential            cal_fixed_v                   0.02           1   0.02     0.03
PotLocal             cal_fixed_v                   0.02           1   0.02     0.03
Potential            cal_v_eff                     7.27          14   0.52     8.69
H_Hartree_pw         v_hartree                     0.68          14   0.05     0.81
PW_Basis_Sup         real2recip                    2.72         133   0.02     3.25
PW_Basis_Sup         gatherp_scatters              0.08         133   0.00     0.10
PotXC                cal_v_eff                     6.57          14   0.47     7.86
XC_Functional        v_xc                          6.57          14   0.47     7.86
Potential            interpolate_vrs               0.01          14   0.00     0.01
Symmetry             rhog_symmetry                 0.60          15   0.04     0.72
Symmetry             group fft grids               0.21          15   0.01     0.25
Charge_Mixing        init_mixing                   0.00           1   0.00     0.00
ESolver_KS_PW        hamilt2density               69.32          14   4.95    82.94
HSolverPW            solve                        68.09          14   4.86    81.46
Nonlocal             getvnl                        0.18          56   0.00     0.22
pp_cell_vnl          getvnl                        0.20          64   0.00     0.24
Structure_Factor     get_sk                        0.13         304   0.00     0.16
WF_atomic            atomic_wfc                    0.03           4   0.01     0.03
DiagoIterAssist      diagH_subspace_init           4.93           4   1.23     5.90
Operator             hPsi                         36.97       29896   0.00    44.24
Operator             EkineticPW                    2.20       29896   0.00     2.63
Operator             VeffPW                       20.90       29896   0.00    25.01
PW_Basis_K           recip_to_real gpu            11.45       43776   0.00    13.70
PW_Basis_K           real_to_recip gpu             9.51       36552   0.00    11.38
Operator             NonlocalPW                   13.72       29896   0.00    16.42
Nonlocal             add_nonlocal_pp               9.31       29896   0.00    11.13
DiagoIterAssist      diagH_LAPACK                  0.67          52   0.01     0.80
DiagoCG              diag_once                    52.27          56   0.93    62.54
DiagoCG_New          spsi_func                     5.97       59688   0.00     7.14
DiagoCG_New          hpsi_func                    29.43       29844   0.00    35.21
ElecStatePW          psiToRho                      2.17          14   0.16     2.60
Charge               rho_mpi                       0.02          14   0.00     0.03
Charge               reduce_diff_pools             0.02          14   0.00     0.03
Charge_Mixing        get_drho                      0.60          14   0.04     0.72
Charge_Mixing        inner_product_recip_rho       0.01          14   0.00     0.02
Charge               mix_rho                       0.45          12   0.04     0.54
Charge               Broyden_mixing                0.11          12   0.01     0.13
DiagoIterAssist      diagH_subspace                4.94          48   0.10     5.92
Charge_Mixing        inner_product_recip_hartree   0.10         120   0.00     0.12
Forces               cal_force_loc                 0.11           1   0.11     0.13
Forces               cal_force_ew                  0.09           1   0.09     0.11
Forces               cal_force_nl                  0.10           1   0.10     0.12
Forces               cal_force_cc                  0.00           1   0.00     0.00
Forces               cal_force_scc                 0.33           1   0.33     0.40
Stress_PW            cal_stress                    3.39           1   3.39     4.05
Stress_Func          stress_kin                    0.25           1   0.25     0.30
Stress_Func          stress_har                    0.03           1   0.03     0.03
Stress_Func          stress_ewa                    0.09           1   0.09     0.11
Stress_Func          stress_gga                    0.29           1   0.29     0.35
Stress_Func          stress_loc                    0.35           1   0.35     0.41
Stress_Func          stress_cc                     0.00           1   0.00     0.00
Stress_Func          stress_nl                     2.37           1   2.37     2.84
ModuleIO             write_istate_info             0.02           1   0.02     0.02
-------------------------------------------------------------------------------------

 START  Time  : Mon May  6 19:06:25 2024
 FINISH Time  : Mon May  6 19:07:48 2024
 TOTAL  Time  : 83
 SEE INFORMATION IN : OUT.ABACUS/

WHUweiqingzhou commented 3 months ago

This issue was caused by a problem with the machine and is not related to ABACUS.