SCOREC / fep

Finite Element Programming course materials

MPI Error When Run File With GDB #41

Closed BryanMcKeever closed 1 year ago

BryanMcKeever commented 1 year ago

I'm trying to debug my a4_element_stiffness.cpp file. It runs fine (no errors) outside of GDB; I just want to inspect some variable values. Thus, I enter these exact commands:

Assertion failed: (FE->GetOrder() == fec->GetOrder()) is false:
 --> internal error: 1 != 0
 ... in function: virtual const mfem::FiniteElement* mfem::FiniteElementSpace::GetFE(int) const
 ... in file: fem/fespace.cpp:2801


MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them.

[Thread 0x7fffeff66700 (LWP 94151) exited]
[Thread 0x7ffff0767700 (LWP 94150) exited]
[Inferior 1 (process 94142) exited with code 01]
Missing separate debuginfos, use: debuginfo-install glibc-2.17-325.el7_9.x86_64 touch-catch-0.0.4-1.el7.x86_64

cwsmith commented 1 year ago

@BryanMcKeever The assertion in fem/fespace.cpp:2801 is disabled in the release build of mfem (the module loaded with source erp_env_setup.sh):

https://github.com/mfem/mfem/blob/888b8eca6dfe98e4838c95b4a4fbddd0a51141cb/fem/fespace.cpp#L2798-L2807

I'll have to look into this more. We should be able to run successfully with either mfem build.

cwsmith commented 1 year ago

@BryanMcKeever I just pushed a fix. Thanks for finding and reporting this.

BryanMcKeever commented 1 year ago

@cwsmith I am still having the same error. This did not solve my problem.

cwsmith commented 1 year ago

Hmmmm. Hi @BryanMcKeever. Did you copy this change https://github.com/SCOREC/fep/commit/1d1487dc84d895cc1816f6a958da664bdc750983 into your local copy of a4/LagrangeElements.hpp, run 'make clean' and 'make', and still see the error when running a4_element_stiffness (it should show up with or without GDB)?

BryanMcKeever commented 1 year ago

I did that, and a new error shows up now, but again only under GDB (it runs fine, with no error, when run normally). It is:

Reading symbols from /gpfs/u/home/FEP6/FEP6mckv/a4/a4_element_stiffness...done.
(gdb) r
Starting program: /gpfs/u/home/FEP6/FEP6mckv/a4/./a4_element_stiffness --mesh ./data/1x1_square_quad.mesh --order 1
warning: File "/gpfs/u/software/erp-rhel7/gcc/9.1.0/1/lib64/libstdc++.so.6.0.26-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
To enable execution of this file add
        add-auto-load-safe-path /gpfs/u/software/erp-rhel7/gcc/9.1.0/1/lib64/libstdc++.so.6.0.26-gdb.py
line to your configuration file "/gpfs/u/home/FEP6/FEP6mckv/.gdbinit".
To completely disable this security protection add
        set auto-load safe-path /
line to your configuration file "/gpfs/u/home/FEP6/FEP6mckv/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
        info "(gdb)Auto-loading safe path"
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Missing separate debuginfo for /lib64/libgpfs.so
[Detaching after fork from child process 92496]
[New Thread 0x7ffff0767700 (LWP 92500)]
[New Thread 0x7fffeff66700 (LWP 92501)]
Options used:
   --mesh ./data/1x1_square_quad.mesh
   --order 1

Assertion failed: (data && i >= 0 && i < height && j >= 0 && j < width) is false:
 -->
 ... in function: double& mfem::DenseMatrix::operator()(int, int)
 ... in file: /gpfs/u/software/erp-spack-install/v0190_0/linux-centos7-zen/gcc-9.1.0/mfem-4.5.0-dyzns2o6igwievfdk6iunmacmmlkwh5m/include/mfem/linalg/densemat.hpp:1142

--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[Thread 0x7fffeff66700 (LWP 92501) exited]
[Thread 0x7ffff0767700 (LWP 92500) exited]
[Inferior 1 (process 92491) exited with code 01]
Missing separate debuginfos, use: debuginfo-install glibc-2.17-325.el7_9.x86_64 touch-catch-0.0.4-1.el7.x86_64

cwsmith commented 1 year ago

I'm guessing this error is unrelated.

If you revert to the version of LagrangeElements.hpp that is in the repo, rebuild (with the mfem/4.5.0-pumi-debug module loaded), and run a4_interpolation.cpp or a4_projection.cpp, do you see any errors?
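Concretely, the suggested check might look like the shell session below. The module name comes from this thread; the git checkout path and the command-line flags for the two drivers are assumptions (only the a4_element_stiffness flags appear in the log above).

```shell
# Revert LagrangeElements.hpp to the repo version (path assumed)
git checkout -- a4/LagrangeElements.hpp

# Load the debug build of mfem, then rebuild from scratch
module load mfem/4.5.0-pumi-debug
make clean && make

# Run the other two drivers; flags mirror the a4_element_stiffness
# invocation above and are an assumption for these executables
./a4_interpolation --mesh ./data/1x1_square_quad.mesh --order 1
./a4_projection --mesh ./data/1x1_square_quad.mesh --order 1
```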

cwsmith commented 1 year ago

Summary of offline discussion: