Error Log
input data, global 3D index [3,3,2], local index 14, rank 3 is (0.35251,-0.0147281)
input data, global 3D index [3,3,3], local index 15, rank 3 is (0.181726,0.826603)
[zhanghh-gpus-558b87ff9d-mncnf:02115] Process received signal
[zhanghh-gpus-558b87ff9d-mncnf:02115] Signal: Bus error (7)
[zhanghh-gpus-558b87ff9d-mncnf:02115] Signal code: Non-existant physical address (2)
[zhanghh-gpus-558b87ff9d-mncnf:02115] Failing at address: 0x7f7e08a7f000
[zhanghh-gpus-558b87ff9d-mncnf:02115] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f7ffd78d420]
[zhanghh-gpus-558b87ff9d-mncnf:02115] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x18bb41)[0x7f7ffd3beb41]
[zhanghh-gpus-558b87ff9d-mncnf:02115] [ 2] /opt/nvidia/hpc_sdk/Linux_x86_64/2022/comm_libs/nccl/lib/libnccl.so.2(+0x618f3)[0x7f7e0b3368f3]
[zhanghh-gpus-558b87ff9d-mncnf:02115] [ 3] /opt/nvidia/hpc_sdk/Linux_x86_64/2022/comm_libs/nccl/lib/libnccl.so.2(+0x57d13)[0x7f7e0b32cd13]
[zhanghh-gpus-558b87ff9d-mncnf:02115] [ 4] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7f7ffd781609]
[zhanghh-gpus-558b87ff9d-mncnf:02115] [ 5] /lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f7ffd352133]
[zhanghh-gpus-558b87ff9d-mncnf:02115] End of error message
[zhanghh-gpus-558b87ff9d-mncnf:02111] 3 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[zhanghh-gpus-558b87ff9d-mncnf:02111] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.