Closed wperkins closed 1 year ago
Do you have a specific input file that shows this error? And this is the dsf.x code in applications/dynamic_simulation_full_y?
I'm not seeing any problems with the 145 bus case on constance using the progress ranks runtime. Are you using the two-sided runtime? Is there an input file for the 200 bus case (the closest I see is input_240bus.xml) or did you create your own input?
Debug or Release? These cases mostly run if GridPACK is built Debug. I get seemingly random memory errors at exit on my Ubuntu system. RHEL may not report such errors. The problem changes when GridPACK is built Release. See this unit test summary. In our previous conversation, you were seeing exactly the same problem I was with the 240-bus and a Release build.
For the 145 bus case, the memory error is when some matrix is not being freed. Here's the back trace.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0xffffffff)
* frame #0: 0x00000001a9c3a9e8 libsystem_malloc.dylib`tiny_free_no_lock + 1860
frame #1: 0x00000001a9c3a120 libsystem_malloc.dylib`free_tiny + 496
frame #2: 0x00000001011e1260 libmpi.40.dylib`mca_coll_base_comm_unselect + 10100
frame #3: 0x00000001011472e0 libmpi.40.dylib`ompi_comm_destruct + 36
frame #4: 0x0000000101149208 libmpi.40.dylib`ompi_comm_free + 508
frame #5: 0x0000000101176c64 libmpi.40.dylib`MPI_Comm_free + 168
frame #6: 0x00000001020bc51c libpetsc.3.020.dylib`Petsc_Counter_Attr_Delete_Fn(comm=<unavailable>, keyval=<unavailable>, count_val=0x000060000026ad80, extra_state=<unavailable>) at pinit.c:361:5 [opt]
frame #7: 0x00000001011464a8 libmpi.40.dylib`ompi_attr_delete_impl + 612
frame #8: 0x0000000101146844 libmpi.40.dylib`ompi_attr_delete_all + 232
frame #9: 0x0000000101149048 libmpi.40.dylib`ompi_comm_free + 60
frame #10: 0x0000000101176c64 libmpi.40.dylib`MPI_Comm_free + 168
frame #11: 0x00000001020c5c04 libpetsc.3.020.dylib`PetscCommDestroy(comm=0x0000000104853018) at tagm.c:331:5 [opt]
frame #12: 0x000000010209c718 libpetsc.3.020.dylib`PetscHeaderDestroy_Private(obj=0x0000000104853000, clear_for_reuse=PETSC_FALSE) at inherit.c:158:5 [opt]
frame #13: 0x000000010209c39c libpetsc.3.020.dylib`PetscHeaderDestroy_Function(h=0x000060000026ad28) at inherit.c:93:3 [opt]
frame #14: 0x000000010224d528 libpetsc.3.020.dylib`MatDestroy(A=0x000060000026ad28) at matrix.c:1418:3 [opt]
frame #15: 0x00000001002f3dbc dsf.x`gridpack::math::PetscMatrixWrapper::~PetscMatrixWrapper(this=0x000060000026ad20) at petsc_matrix_wrapper.cpp:122:16
frame #16: 0x00000001002f3e70 dsf.x`gridpack::math::PetscMatrixWrapper::~PetscMatrixWrapper(this=0x000060000026ad20) at petsc_matrix_wrapper.cpp:115:1
frame #17: 0x00000001002e8f4c dsf.x`void boost::checked_delete<gridpack::math::PetscMatrixWrapper>(x=0x000060000026ad20) at checked_delete.hpp:36:5
frame #18: 0x00000001002e8f0c dsf.x`boost::scoped_ptr<gridpack::math::PetscMatrixWrapper>::~scoped_ptr(this=0x00006000018772b0) at scoped_ptr.hpp:88:9
frame #19: 0x00000001002e8edc dsf.x`boost::scoped_ptr<gridpack::math::PetscMatrixWrapper>::~scoped_ptr(this=0x00006000018772b0) at scoped_ptr.hpp:84:5
frame #20: 0x00000001002e8e78 dsf.x`gridpack::math::PETScMatrixImplementation<std::__1::complex<double>, int>::~PETScMatrixImplementation(this=0x0000600001877280) at petsc_matrix_implementation.hpp:123:3
frame #21: 0x00000001002e7048 dsf.x`gridpack::math::PETScMatrixImplementation<std::__1::complex<double>, int>::~PETScMatrixImplementation(this=0x0000600001877280) at petsc_matrix_implementation.hpp:122:3
frame #22: 0x00000001002e7074 dsf.x`gridpack::math::PETScMatrixImplementation<std::__1::complex<double>, int>::~PETScMatrixImplementation(this=0x0000600001877280) at petsc_matrix_implementation.hpp:122:3
frame #23: 0x00000001001648a4 dsf.x`void boost::checked_delete<gridpack::math::MatrixImplementation<std::__1::complex<double>, int>>(x=0x0000600001877280) at checked_delete.hpp:36:5
frame #24: 0x000000010016485c dsf.x`boost::scoped_ptr<gridpack::math::MatrixImplementation<std::__1::complex<double>, int>>::~scoped_ptr(this=0x000060000026ad18) at scoped_ptr.hpp:88:9
frame #25: 0x0000000100163cd0 dsf.x`boost::scoped_ptr<gridpack::math::MatrixImplementation<std::__1::complex<double>, int>>::~scoped_ptr(this=0x000060000026ad18) at scoped_ptr.hpp:84:5
frame #26: 0x0000000100164900 dsf.x`gridpack::math::MatrixT<std::__1::complex<double>, int>::~MatrixT(this=0x000060000026ad00) at matrix.hpp:137:3
frame #27: 0x0000000100163d28 dsf.x`gridpack::math::MatrixT<std::__1::complex<double>, int>::~MatrixT(this=0x000060000026ad00) at matrix.hpp:136:3
frame #28: 0x0000000100163d54 dsf.x`gridpack::math::MatrixT<std::__1::complex<double>, int>::~MatrixT(this=0x000060000026ad00) at matrix.hpp:136:3
frame #29: 0x00000001001650ec dsf.x`void boost::checked_delete<gridpack::math::MatrixT<std::__1::complex<double>, int>>(x=0x000060000026ad00) at checked_delete.hpp:36:5
frame #30: 0x00000001001651ec dsf.x`boost::detail::sp_counted_impl_p<gridpack::math::MatrixT<std::__1::complex<double>, int>>::dispose(this=0x00006000002358c0) at sp_counted_impl.hpp:89:9
frame #31: 0x000000010000acac dsf.x`boost::detail::sp_counted_base::release(this=0x00006000002358c0) at sp_counted_base_gcc_atomic.hpp:120:13
frame #32: 0x000000010000ac58 dsf.x`boost::detail::shared_count::~shared_count(this=0x000000016fdfee58) at shared_count.hpp:432:29
frame #33: 0x000000010000ac08 dsf.x`boost::detail::shared_count::~shared_count(this=0x000000016fdfee58) at shared_count.hpp:431:5
frame #34: 0x0000000100049ba0 dsf.x`boost::shared_ptr<gridpack::math::MatrixT<std::__1::complex<double>, int>>::~shared_ptr(this=0x000000016fdfee50) at shared_ptr.hpp:335:25
frame #35: 0x000000010001e098 dsf.x`boost::shared_ptr<gridpack::math::MatrixT<std::__1::complex<double>, int>>::~shared_ptr(this=0x000000016fdfee50) at shared_ptr.hpp:335:25
frame #36: 0x000000010001f350 dsf.x`gridpack::dynamic_simulation::DSFullApp::~DSFullApp(this=0x000000016fdfe8e0) at dsf_app_module.cpp:101:1
frame #37: 0x0000000100020134 dsf.x`gridpack::dynamic_simulation::DSFullApp::~DSFullApp(this=0x000000016fdfe8e0) at dsf_app_module.cpp:100:1
frame #38: 0x000000010000a050 dsf.x`main(argc=2, argv=0x000000016fdff5c0) at dsf_main.cpp:124:3
frame #39: 0x00000001a9abbf28 dyld`start + 2236
I get the same error for the 240-bus system as well.
Pushed a fix to the fix/testing branch that resolves the memory corruption issue in release mode. The issue was with the getAngle() method. First, it is NOT defined for the newly added models, such as the REGCA1 generator model. It is declared as a virtual method in the base generator class, so even if REGCA1 does not define getAngle(), the base class method should be picked up; that does not happen in release mode. Second, it is also not correctly implemented, since it only returns the angle of the first generator at a bus. The getAngle() method is used in a number of locations, so this should be fixed.
To work around the issue, I have turned off the securityCheck method called in the dynamic simulation application, which calls getAngle(). In my opinion, securityCheck should be OFF by default and only called when the user requests it through a set option.
@wperkins : Can you please retest and see if you get the same error.
I'll check it out. Thanks.
@abhyshr, on your Mac, did you build Debug or Release, and what PETSc version did you use? I'd like to try building on my Mac. Thanks.
Release version. Used PETSc 3.20
These changes fixed the 240-bus smoke tests for me. The only test failing for me now is the parallel 145-bus case. I'll try to follow @abhyshr's clue above on my Mac.
Currently, on Ubuntu with fix/testing, the 145-bus dynamic simulation smoke test fails for me seemingly at random with both Debug and Release builds, and with PETSc built complex or real. I can make the 145-bus case pass if I use the MUMPS solver instead of SuperLU_dist. This kind of thing may have been observed by other PETSc users. Using MUMPS as the GridPACK unit test parallel direct solver is supposed to happen automatically if PETSc is built without SuperLU_dist (and with MUMPS, of course).
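For reference, the standard PETSc runtime options for forcing MUMPS as the parallel direct solver look like this (option names are from recent PETSc releases; whether dsf.x forwards them depends on how the GridPACK application passes options through to PETSc, so treat this as a sketch):

```
# Direct solve via LU factorization, with MUMPS as the factorization backend:
-ksp_type preonly -pc_type lu -pc_factor_mat_solver_type mumps
```

Building PETSc without `--download-superlu_dist` and with `--download-mumps` avoids the SuperLU_dist path entirely, which matches the automatic fallback described above.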
I think this has been addressed as best we can with #164. I'm sure we will run into it again, though. @jacksavage, I suggest that any CI (#173) use PETSc with MUMPS and without SuperLU_dist.
GridPACK dynamic simulation apps (dsf.x, wind.x, etc.) seem to have a memory corruption problem. This manifests in two ways. First, the simulation completes, but the OS (Ubuntu 20, in my case) reports a memory corruption error as described here. Second, when dsf.x is built Release, a SEGV is reported at an odd place, as described here. I think fixing this is key to #164 and #173.