deepmodeling / abacus-develop

An electronic structure package based on either plane wave basis or numerical atomic orbitals.
http://abacus.ustc.edu.cn
GNU Lesser General Public License v3.0
174 stars 136 forks

Test: CI tests for HSE look weird #5452

Closed: WHUweiqingzhou closed this issue 2 weeks ago

WHUweiqingzhou commented 2 weeks ago

Describe the Testing Issue

Taking the latest CI test as an example, https://github.com/deepmodeling/abacus-develop/actions/runs/11756055443/job/32751795588

The HSE tests throw MPI error messages, and mpirun reports a non-zero exit code, yet every check is still marked as passing:

1: [ RUN      ] 281_NO_KP_HSE
1: *** The MPI_Type_free() function was called after MPI_FINALIZE was invoked.
1: *** This is disallowed by the MPI standard.
1: *** Your MPI job will now abort.
1: [1eb9fe6388f0:59227] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
1: *** The MPI_Type_free() function was called after MPI_FINALIZE was invoked.
1: *** This is disallowed by the MPI standard.
1: *** Your MPI job will now abort.
1: [1eb9fe6388f0:59224] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
1: *** The MPI_Type_free() function was called after MPI_FINALIZE was invoked.
1: *** This is disallowed by the MPI standard.
1: *** Your MPI job will now abort.
1: [1eb9fe6388f0:59226] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
1: *** The MPI_Type_free() function was called after MPI_FINALIZE was invoked.
1: *** This is disallowed by the MPI standard.
1: *** Your MPI job will now abort.
1: [1eb9fe6388f0:59225] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
1: --------------------------------------------------------------------------
1: Primary job  terminated normally, but 1 process returned
1: a non-zero exit code. Per user-direction, the job has been aborted.
1: --------------------------------------------------------------------------
1: --------------------------------------------------------------------------
1: mpirun detected that one or more processes exited with non-zero status, thus causing
1: the job to be terminated. The first process to do so was:
1: 
1:   Process name: [[27256,1],3]
1:   Exit code:    1
1: --------------------------------------------------------------------------
1: [----------] HSE calculation on Si, multiple k-points, cal force and stress, exx real number
1: [      OK  ]  etotref
1: [      OK  ]  etotperatomref
1: [      OK  ]  totalforceref
1: [      OK  ]  totalstressref
1: [----------] Time elapsed: 24.316 seconds
1: 
1: [ RUN      ] 281_NO_KP_HSE_symmetry
1: *** The MPI_Type_free() function was called after MPI_FINALIZE was invoked.
1: *** This is disallowed by the MPI standard.
1: *** Your MPI job will now abort.
1: [1eb9fe6388f0:61330] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
1: *** The MPI_Type_free() function was called after MPI_FINALIZE was invoked.
1: *** This is disallowed by the MPI standard.
1: *** Your MPI job will now abort.
1: [1eb9fe6388f0:61331] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
1: *** The MPI_Type_free() function was called after MPI_FINALIZE was invoked.
1: *** This is disallowed by the MPI standard.
1: *** Your MPI job will now abort.
1: [1eb9fe6388f0:61329] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
1: *** The MPI_Type_free() function was called after MPI_FINALIZE was invoked.
1: *** This is disallowed by the MPI standard.
1: *** Your MPI job will now abort.
1: [1eb9fe6388f0:61332] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
1: --------------------------------------------------------------------------
1: Primary job  terminated normally, but 1 process returned
1: a non-zero exit code. Per user-direction, the job has been aborted.
1: --------------------------------------------------------------------------
1: --------------------------------------------------------------------------
1: mpirun detected that one or more processes exited with non-zero status, thus causing
1: the job to be terminated. The first process to do so was:
1: 
1:   Process name: [[25249,1],1]
1:   Exit code:    1
1: --------------------------------------------------------------------------
1: [----------] HSE calculation on Si, multiple k-points, cal force and stress, exx real number, symmetry=1
1: [      OK  ]  etotref
1: [      OK  ]  etotperatomref
1: [      OK  ]  totalforceref
1: [      OK  ]  totalstressref
1: [      OK  ]  pointgroupref
1: [      OK  ]  spacegroupref
1: [      OK  ]  nksibzref
1: [----------] Time elapsed: 36.156 seconds
1: 
1: [ RUN      ] 282_NO_KP_HSE_complex
1: *** The MPI_Type_free() function was called after MPI_FINALIZE was invoked.
1: *** This is disallowed by the MPI standard.
1: *** Your MPI job will now abort.
1: [1eb9fe6388f0:63833] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
1: *** The MPI_Type_free() function was called after MPI_FINALIZE was invoked.
1: *** This is disallowed by the MPI standard.
1: *** Your MPI job will now abort.
1: [1eb9fe6388f0:63835] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
1: *** The MPI_Type_free() function was called after MPI_FINALIZE was invoked.
1: *** This is disallowed by the MPI standard.
1: *** Your MPI job will now abort.
1: [1eb9fe6388f0:63834] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
1: *** The MPI_Type_free() function was called after MPI_FINALIZE was invoked.
1: *** This is disallowed by the MPI standard.
1: *** Your MPI job will now abort.
1: [1eb9fe6388f0:63836] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
1: --------------------------------------------------------------------------
1: Primary job  terminated normally, but 1 process returned
1: a non-zero exit code. Per user-direction, the job has been aborted.
1: --------------------------------------------------------------------------
1: --------------------------------------------------------------------------
1: mpirun detected that one or more processes exited with non-zero status, thus causing
1: the job to be terminated. The first process to do so was:
1: 
1:   Process name: [[29817,1],1]
1:   Exit code:    1
1: --------------------------------------------------------------------------
1: [----------] HSE calculation on Si, multiple k-points, exx complex number
1: [      OK  ]  etotref
1: [      OK  ]  etotperatomref
1: [----------] Time elapsed: 16.651 seconds
1: 
1: [ RUN      ] 283_NO_KP_HF
1: *** The MPI_Type_free() function was called after MPI_FINALIZE was invoked.
1: *** This is disallowed by the MPI standard.
1: *** Your MPI job will now abort.
1: [1eb9fe6388f0:64341] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
1: *** The MPI_Type_free() function was called after MPI_FINALIZE was invoked.
1: *** This is disallowed by the MPI standard.
1: *** Your MPI job will now abort.
1: [1eb9fe6388f0:64340] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
1: --------------------------------------------------------------------------
1: Primary job  terminated normally, but 1 process returned
1: a non-zero exit code. Per user-direction, the job has been aborted.
1: --------------------------------------------------------------------------
1: --------------------------------------------------------------------------
1: mpirun detected that one or more processes exited with non-zero status, thus causing
1: the job to be terminated. The first process to do so was:
1: 
1:   Process name: [[30307,1],2]
1:   Exit code:    1
1: --------------------------------------------------------------------------
1: [----------] Hartree-Fock calculation on Si, multiple k-points
1: [      OK  ]  etotref
1: [      OK  ]  etotperatomref
1: [----------] Time elapsed: 19.299 seconds
1: 
1: [ RUN      ] 284_NO_KP_PBE0
1: *** The MPI_Type_free() function was called after MPI_FINALIZE was invoked.
1: *** This is disallowed by the MPI standard.
1: *** Your MPI job will now abort.
1: [1eb9fe6388f0:64847] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
1: *** The MPI_Type_free() function was called after MPI_FINALIZE was invoked.
1: *** This is disallowed by the MPI standard.
1: *** Your MPI job will now abort.
1: [1eb9fe6388f0:64846] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
1: *** The MPI_Type_free() function was called after MPI_FINALIZE was invoked.
1: *** This is disallowed by the MPI standard.
1: *** Your MPI job will now abort.
1: [1eb9fe6388f0:64845] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
1: *** The MPI_Type_free() function was called after MPI_FINALIZE was invoked.
1: *** This is disallowed by the MPI standard.
1: *** Your MPI job will now abort.
1: [1eb9fe6388f0:64848] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
1: --------------------------------------------------------------------------
1: Primary job  terminated normally, but 1 process returned
1: a non-zero exit code. Per user-direction, the job has been aborted.
1: --------------------------------------------------------------------------
1: --------------------------------------------------------------------------
1: mpirun detected that one or more processes exited with non-zero status, thus causing
1: the job to be terminated. The first process to do so was:
1: 
1:   Process name: [[28773,1],2]
1:   Exit code:    1
1: --------------------------------------------------------------------------
1: [----------] PBE0 calculation on Si, multiple k-points
1: [      OK  ]  etotref
1: [      OK  ]  etotperatomref
1: [----------] Time elapsed: 18.294 seconds
1: 
1: [ RUN      ] 285_NO_KP_RE_HSE
1: *** The MPI_Type_free() function was called after MPI_FINALIZE was invoked.
1: *** This is disallowed by the MPI standard.
1: *** Your MPI job will now abort.
1: [1eb9fe6388f0:65352] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
1: *** The MPI_Type_free() function was called after MPI_FINALIZE was invoked.
1: *** This is disallowed by the MPI standard.
1: *** Your MPI job will now abort.
1: [1eb9fe6388f0:65351] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
1: *** The MPI_Type_free() function was called after MPI_FINALIZE was invoked.
1: *** This is disallowed by the MPI standard.
1: *** Your MPI job will now abort.
1: [1eb9fe6388f0:65353] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
1: *** The MPI_Type_free() function was called after MPI_FINALIZE was invoked.
1: *** This is disallowed by the MPI standard.
1: *** Your MPI job will now abort.
1: [1eb9fe6388f0:65354] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
1: --------------------------------------------------------------------------
1: Primary job  terminated normally, but 1 process returned
1: a non-zero exit code. Per user-direction, the job has been aborted.
1: --------------------------------------------------------------------------
1: --------------------------------------------------------------------------
1: mpirun detected that one or more processes exited with non-zero status, thus causing
1: the job to be terminated. The first process to do so was:
1: 
1:   Process name: [[29295,1],1]
1:   Exit code:    1
1: --------------------------------------------------------------------------
1: [----------] HSE calculation on Si, multiple k-points, relax, exx real number
1: [      OK  ]  etotref
1: [      OK  ]  etotperatomref
1: [----------] Time elapsed: 26.320 seconds
1: 
1: [ RUN      ] 286_NO_KP_CR_HSE
1: *** The MPI_Type_free() function was called after MPI_FINALIZE was invoked.
1: *** This is disallowed by the MPI standard.
1: *** Your MPI job will now abort.
1: [1eb9fe6388f0:67299] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
1: *** The MPI_Type_free() function was called after MPI_FINALIZE was invoked.
1: *** This is disallowed by the MPI standard.
1: *** Your MPI job will now abort.
1: [1eb9fe6388f0:67298] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
1: *** The MPI_Type_free() function was called after MPI_FINALIZE was invoked.
1: *** This is disallowed by the MPI standard.
1: *** Your MPI job will now abort.
1: [1eb9fe6388f0:67300] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
1: *** The MPI_Type_free() function was called after MPI_FINALIZE was invoked.
1: *** This is disallowed by the MPI standard.
1: *** Your MPI job will now abort.
1: [1eb9fe6388f0:67297] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
1: --------------------------------------------------------------------------
1: Primary job  terminated normally, but 1 process returned
1: a non-zero exit code. Per user-direction, the job has been aborted.
1: --------------------------------------------------------------------------
1: --------------------------------------------------------------------------
1: mpirun detected that one or more processes exited with non-zero status, thus causing
1: the job to be terminated. The first process to do so was:
1: 
1:   Process name: [[35824,1],1]
1:   Exit code:    1
1: --------------------------------------------------------------------------
1: [----------] HSE calculation on Si, multiple k-points, cell-relax, exx real number
1: [      OK  ]  etotref
1: [      OK  ]  etotperatomref
1: [----------] Time elapsed: 47.126 seconds
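
To make the mismatch in the log above easier to spot, here is a minimal sketch of a log scanner. It assumes only the line formats visible in this CI output (`[ RUN ]`, `[ OK ]`, `Exit code:`, and the MPI_FINALIZE message). The function name `find_suspicious_cases` is hypothetical and is not part of the ABACUS test scripts.

```python
import re

# Patterns taken from the CI output quoted in this issue.
RUN_RE = re.compile(r"\[ RUN\s+\]\s+(\S+)")
EXIT_RE = re.compile(r"Exit code:\s+(\d+)")
OK_RE = re.compile(r"\[\s+OK\s+\]")
FINALIZE_MSG = "was called after MPI_FINALIZE was invoked"


def find_suspicious_cases(log_text):
    """Return cases that report [ OK ] checks although mpirun saw a non-zero exit code.

    Hypothetical helper: maps each suspicious case name to its collected counters.
    """
    cases = {}
    current = None
    for line in log_text.splitlines():
        run = RUN_RE.search(line)
        if run:
            # A new test case starts; reset the counters for it.
            current = run.group(1)
            cases[current] = {"exit_code": 0, "ok": 0, "finalize_errors": 0}
            continue
        if current is None:
            continue  # Ignore anything before the first [ RUN ] line.
        info = cases[current]
        exit_match = EXIT_RE.search(line)
        if exit_match:
            info["exit_code"] = int(exit_match.group(1))
        if FINALIZE_MSG in line:
            info["finalize_errors"] += 1
        if OK_RE.search(line):
            info["ok"] += 1
    # Suspicious: the process failed, but reference checks still passed.
    return {
        name: info
        for name, info in cases.items()
        if info["exit_code"] != 0 and info["ok"] > 0
    }
```

Run on the log above, a scanner like this would flag each of the seven cases, since every one shows `Exit code:    1` followed by `[      OK  ]` lines. This suggests that the pass/fail decision is based only on the reference values and not on the exit status of the MPI job.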

Additional Context

No response