MatMechLab / AsFem

Advanced Simulation kit based on Finite Element Method (AsFem)
https://matmechlab.github.io/AsFem
GNU General Public License v3.0

PETSc error in branch devel #75

Closed bbsy789 closed 2 years ago

bbsy789 commented 2 years ago
[1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[1]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind
[1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple MacOS to find memory corruption errors
[1]PETSC ERROR: likely location of problem given in stack below
[1]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
[1]PETSC ERROR: No error traceback is available, the problem could be in the main program.
[1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[1]PETSC ERROR: Signal received
[1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[1]PETSC ERROR: Petsc Release Version 3.17.2, Jun 02, 2022
[1]PETSC ERROR: /thfs1/home/liujinmei/AsFem/bin/asfem on a  named cn537 by liujinmei Wed Oct 26 11:44:02 2022
[1]PETSC ERROR: [2]PETSC ERROR: ------------------------------------------------------------------------
[2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[2]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind
[2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple MacOS to find memory corruption errors
[2]PETSC ERROR: likely location of problem given in stack below
[2]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
[2]PETSC ERROR: No error traceback is available, the problem could be in the main program.
[2]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
bbsy789 commented 2 years ago

@yangbai90 I think this is a memory problem.

bbsy789 commented 2 years ago

I modified CMakeLists.txt: in CMAKE_CXX_FLAGS I added -fsanitize=undefined,address, changed -O2 to -Og, and removed -Werror. I then used find / -name "libasan.so" to locate libasan.so and ran asfem with it preloaded, for example:

LD_PRELOAD=/usr/lib/gcc/x86_64-linux-gnu/5/libasan.so ./asfem --version

The result is as follows:

liujinmei@ln0:~/AsFem/bin$ LD_PRELOAD=/thfs1/home/liujinmei/software/gcc/gcc12.2/lib64/libasan.so ./asfem --version
Abort(1090831) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(159):
MPID_Init(509).......:
MPIR_pmi_init(91)....: PMIX_Init returned -25
[ln0:800040:0:800040] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid: 800040) ====
 0  /usr/local/ucx/lib/libucs.so.0(ucs_debug_print_backtrace+0x1c) [0x40003fd9712c]
 1  /usr/local/ucx/lib/libucs.so.0(ucs_handle_error+0x250) [0x40003fd993d0]
 2  /usr/local/ucx/lib/libucs.so.0(+0x26530) [0x40003fd99530]
 3  /usr/local/ucx/lib/libucs.so.0(+0x268c0) [0x40003fd998c0]
 4  linux-vdso.so.1(__kernel_rt_sigreturn+0) [0x40003bf305b8]
 5  /thfs1/software/mpich/mpi-x-gcc9.3.0/lib/libmpi.so.12(MPIR_Err_return_comm+0x78) [0x40003c9adaa8]
 6  /thfs1/home/liujinmei/software/petsc/3.17.2/lib/libpetsc.so.3.17(PetscInitialize+0x19c) [0x40003d511ebc]
 7  ./asfem() [0x40acec]
 8  /lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0xe8) [0x40003fa00090]
 9  ./asfem() [0x409e84]
=================================
AddressSanitizer:DEADLYSIGNAL
=================================================================
==800040==ERROR: AddressSanitizer: SEGV on unknown address 0x2a47000c3528 (pc 0x40003c9adaa8 bp 0xfffff6bbdf40 sp 0xfffff6bbdf40 T0)
==800040==The signal is caused by a READ memory access.
    #0 0x40003c9adaa8 in MPIR_Err_return_comm (/thfs1/software/mpich/mpi-x-gcc9.3.0/lib/libmpi.so.12+0x2e7aa8)
    #1 0x40003d511eb8 in PetscInitialize (/thfs1/home/liujinmei/software/petsc/3.17.2/lib/libpetsc.so.3.17+0x1d2eb8)
    #2 0x40ace8 in main ../src/main.cpp:24
    #3 0x40003fa0008c in __libc_start_main (/lib/aarch64-linux-gnu/libc.so.6+0x2408c)
    #4 0x409e80  (/thfs1/home/liujinmei/AsFem/bin/asfem+0x409e80)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/thfs1/software/mpich/mpi-x-gcc9.3.0/lib/libmpi.so.12+0x2e7aa8) in MPIR_Err_return_comm
==800040==ABORTING
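
As a side note on the preload step above: rather than searching the whole filesystem with find, the compiler can usually be asked where its own libasan.so lives. The following is a minimal sketch, assuming a GCC-compatible driver; the helper name find_libasan is made up for illustration, and the CXX variable is only a convention for choosing the compiler.

```shell
#!/usr/bin/env bash
# Sketch: locate the AddressSanitizer runtime that belongs to a given compiler.
# Assumption: the compiler is GCC-compatible and supports -print-file-name.

# find_libasan is a hypothetical helper, not part of AsFem or PETSc.
find_libasan() {
    local cxx="${1:-g++}"
    local path
    path="$("$cxx" -print-file-name=libasan.so 2>/dev/null)" || return 1
    case "$path" in
        # A path starting with / means the driver resolved the library.
        /*) printf '%s\n' "$path" ;;
        # The bare name is echoed back when the library was not found.
        *)  return 1 ;;
    esac
}

if asan="$(find_libasan "${CXX:-g++}")"; then
    # Print the command instead of running it, so the sketch has no side effects.
    echo "LD_PRELOAD=$asan ./asfem --version"
else
    echo "libasan.so not found for ${CXX:-g++}" >&2
fi
```

Using the runtime that ships with the same compiler that built the binary avoids preloading a libasan.so from a different GCC version.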
bbsy789 commented 2 years ago

CPU: aarch64, system: Ubuntu, compiler: GCC 12.2, MPI: mpich/mpi-x-gcc9.3.0, PETSc: 3.17.2. This is the environment in which I found the issue.

bbsy789 commented 2 years ago

The cause appears to have been found: the MPI library and PETSc must be compiled with the same compiler version. The issue is now solved. Thanks for your help!
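
A quick way to check for this kind of mismatch is to compare the version reported by the compiler behind the MPI wrapper with the compiler used to build PETSc and AsFem. The sketch below assumes both commands are GCC-compatible and that the MPI wrapper forwards -dumpfullversion and -dumpversion to the underlying compiler. The helper names and the MPI_CC and BUILD_CC variables are hypothetical, chosen only for this example.

```shell
#!/usr/bin/env bash
# Sketch: compare the compiler behind the MPI wrapper with the build compiler.
# Assumption: both commands accept GCC's -dumpfullversion / -dumpversion flags.

# Print the version string (e.g. 12.2.0) of a compiler command, or fail.
compiler_version() {
    "$1" -dumpfullversion -dumpversion 2>/dev/null
}

# Succeed only if both commands report the same, non-empty version.
# Returns 2 when either command cannot be queried.
same_compiler_version() {
    local a b
    a="$(compiler_version "$1")" || return 2
    b="$(compiler_version "$2")" || return 2
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# MPI_CC and BUILD_CC are hypothetical overrides for this sketch.
mpi_cc="${MPI_CC:-mpicc}"
build_cc="${BUILD_CC:-gcc}"

if same_compiler_version "$mpi_cc" "$build_cc"; then
    echo "OK: $mpi_cc and $build_cc report the same compiler version"
else
    echo "WARNING: $mpi_cc and $build_cc differ or could not be queried" >&2
fi
```

In the environment reported above, the MPI path name suggests GCC 9.3.0 while the application compiler is GCC 12.2, which is the kind of difference this check is meant to surface.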