Argonne-National-Laboratory / PIPS

Parallel solvers for optimization problems

PIPS-NLP example parmodel crash with branch fix/memleaks #63

Closed abhyshr closed 4 years ago

abhyshr commented 4 years ago
PIPS:fix/memleaks$ cd build/PIPS-NLP/Test
Test:fix/memleaks$ ./parmodel

  --------------------------------------------------------------------
  NLP Solver 
  Argonne National Laboratory 
  Lawrence Livermore National Laboratory
  2010-2018
  -----------------------------------------------

  Linear system solver ------    Ma57.
  Schur complement treatment (1)
[PIPS] - 2 scenarios on 1 MPI ranks
solving ...
1st stage 2 variables, 1 equality constraints, 0 inequality constraints.
terminate called after throwing an instance of 'std::bad_cast'
  what():  std::bad_cast
[constance01:02020] *** Process received signal ***
[constance01:02020] Signal: Aborted (6)
[constance01:02020] Signal code:  (-6)
[constance01:02020] [ 0] /lib64/libc.so.6[0x38a4c326a0]
[constance01:02020] [ 1] /lib64/libc.so.6(gsignal+0x35)[0x38a4c32625]
[constance01:02020] [ 2] /lib64/libc.so.6(abort+0x175)[0x38a4c33e05]
[constance01:02020] [ 3] /share/apps/gcc/4.9.2/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x15d)[0x7f67ef726d9d]
[constance01:02020] [ 4] /share/apps/gcc/4.9.2/lib64/libstdc++.so.6(+0x5de26)[0x7f67ef724e26]
[constance01:02020] [ 5] /share/apps/gcc/4.9.2/lib64/libstdc++.so.6(+0x5de71)[0x7f67ef724e71]
[constance01:02020] [ 6] /share/apps/gcc/4.9.2/lib64/libstdc++.so.6(+0x5e088)[0x7f67ef725088]
[constance01:02020] [ 7] /share/apps/gcc/4.9.2/lib64/libstdc++.so.6(+0x5ce12)[0x7f67ef723e12]
[constance01:02020] [ 8] ./parmodel[0x454b69]
[constance01:02020] [ 9] /people/abhy245/software/PIPS/build_pips/PIPS-NLP/libparpipsnlp.so(_ZN19NlpPIPSIpmInterfaceI11sFactoryAug20FilterIPMStochSolver15StructJuMPsInfoE2goEi+0x1bd)[0x7f67f1bb6983]
[constance01:02020] [10] /people/abhy245/software/PIPS/build_pips/PIPS-NLP/libparpipsnlp.so(PipsNlpSolveStruct+0x2b1)[0x7f67f1bb20c7]
[constance01:02020] [11] ./parmodel(main+0x92)[0x4343a2]
[constance01:02020] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x38a4c1ed5d]
[constance01:02020] [13] ./parmodel[0x4351f9]
[constance01:02020] *** End of error message ***
Aborted
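For context on the `std::bad_cast` above: in standard C++ this exception is thrown when a `dynamic_cast` to a reference type fails, whereas a failed cast to a pointer type returns `nullptr`. The backtrace shows the exception escaping while `NlpPIPSIpmInterface<...>::go(int)` was on the stack, but the thread does not establish which cast failed. The sketch below only demonstrates the language behaviour; `Base`, `Expected`, and `Other` are hypothetical names, not PIPS types.

```cpp
#include <cassert>
#include <typeinfo>   // std::bad_cast

// Hypothetical class hierarchy for illustration only; these are not PIPS types.
struct Base { virtual ~Base() = default; };
struct Expected : Base {};
struct Other : Base {};

// Returns true if casting `obj` to Expected& throws std::bad_cast.
bool reference_cast_throws(Base& obj) {
    try {
        Expected& e = dynamic_cast<Expected&>(obj);
        (void)e;  // cast succeeded; silence the unused-variable warning
        return false;
    } catch (const std::bad_cast&) {
        return true;
    }
}

// Returns true if casting `obj` to Expected* yields nullptr (no exception is thrown).
bool pointer_cast_is_null(Base* obj) {
    return dynamic_cast<Expected*>(obj) == nullptr;
}

// Small self-check that exercises both the failing and the succeeding case.
void demo() {
    Other other;
    Expected expected;
    assert(reference_cast_throws(other));      // wrong dynamic type: throws
    assert(!reference_cast_throws(expected));  // right dynamic type: succeeds
    assert(pointer_cast_is_null(&other));      // pointer form reports failure as nullptr
}
```

If the exception is not caught anywhere, the runtime calls `std::terminate`, which is consistent with the "terminate called after throwing an instance of 'std::bad_cast'" line and the abort signal in the log.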

Here's the Valgrind log:



  Linear system solver ------    Ma57.
  Schur complement treatment (1)
[PIPS] - 2 scenarios on 1 MPI ranks
==7142== Invalid write of size 8
==7142==    at 0x477DA4: sInfo::sInfo(sData*) (in /qfs/people/abhy245/PIPS/build/PIPS-NLP/Test/parmodel)
==7142==    by 0x4BB255: StructJuMPsInfo::StructJuMPsInfo(sData*, stochasticInput&) (in /qfs/people/abhy245/PIPS/build/PIPS-NLP/Test/parmodel)
==7142==    by 0x4C97350: NlpPIPSIpmInterface<sFactoryAug, FilterIPMStochSolver, StructJuMPsInfo>::NlpPIPSIpmInterface(stochasticInput&, ompi_communicator_t*) (NlpPIPSIpmInterface.h:99)
==7142==    by 0x4C93096: PipsNlpSolveStruct (parallelPipsNlp_C_Callback.cpp:162)
==7142==    by 0x4343A1: main (in /qfs/people/abhy245/PIPS/build/PIPS-NLP/Test/parmodel)
==7142==  Address 0xc70b6e0 is 0 bytes after a block of size 272 alloc'd
==7142==    at 0x4A07078: operator new(unsigned long) (vg_replace_malloc.c:333)
==7142==    by 0x4C97336: NlpPIPSIpmInterface<sFactoryAug, FilterIPMStochSolver, StructJuMPsInfo>::NlpPIPSIpmInterface(stochasticInput&, ompi_communicator_t*) (NlpPIPSIpmInterface.h:99)
==7142==    by 0x4C93096: PipsNlpSolveStruct (parallelPipsNlp_C_Callback.cpp:162)
==7142==    by 0x4343A1: main (in /qfs/people/abhy245/PIPS/build/PIPS-NLP/Test/parmodel)
==7142== 
==7142== Invalid write of size 8
==7142==    at 0x477DAF: sInfo::sInfo(sData*) (in /qfs/people/abhy245/PIPS/build/PIPS-NLP/Test/parmodel)
==7142==    by 0x4BB255: StructJuMPsInfo::StructJuMPsInfo(sData*, stochasticInput&) (in /qfs/people/abhy245/PIPS/build/PIPS-NLP/Test/parmodel)
==7142==    by 0x4C97350: NlpPIPSIpmInterface<sFactoryAug, FilterIPMStochSolver, StructJuMPsInfo>::NlpPIPSIpmInterface(stochasticInput&, ompi_communicator_t*) (NlpPIPSIpmInterface.h:99)
==7142==    by 0x4C93096: PipsNlpSolveStruct (parallelPipsNlp_C_Callback.cpp:162)
==7142==    by 0x4343A1: main (in /qfs/people/abhy245/PIPS/build/PIPS-NLP/Test/parmodel)
==7142==  Address 0xc70b6e8 is 8 bytes after a block of size 272 alloc'd
==7142==    at 0x4A07078: operator new(unsigned long) (vg_replace_malloc.c:333)
==7142==    by 0x4C97336: NlpPIPSIpmInterface<sFactoryAug, FilterIPMStochSolver, StructJuMPsInfo>::NlpPIPSIpmInterface(stochasticInput&, ompi_communicator_t*) (NlpPIPSIpmInterface.h:99)
==7142==    by 0x4C93096: PipsNlpSolveStruct (parallelPipsNlp_C_Callback.cpp:162)
==7142==    by 0x4343A1: main (in /qfs/people/abhy245/PIPS/build/PIPS-NLP/Test/parmodel)
==7142== 
==7142== Invalid write of size 8
==7142==    at 0x477DBA: sInfo::sInfo(sData*) (in /qfs/people/abhy245/PIPS/build/PIPS-NLP/Test/parmodel)
==7142==    by 0x4BB255: StructJuMPsInfo::StructJuMPsInfo(sData*, stochasticInput&) (in /qfs/people/abhy245/PIPS/build/PIPS-NLP/Test/parmodel)
==7142==    by 0x4C97350: NlpPIPSIpmInterface<sFactoryAug, FilterIPMStochSolver, StructJuMPsInfo>::NlpPIPSIpmInterface(stochasticInput&, ompi_communicator_t*) (NlpPIPSIpmInterface.h:99)
==7142==    by 0x4C93096: PipsNlpSolveStruct (parallelPipsNlp_C_Callback.cpp:162)
==7142==    by 0x4343A1: main (in /qfs/people/abhy245/PIPS/build/PIPS-NLP/Test/parmodel)
==7142==  Address 0xc70b6f8 is 24 bytes after a block of size 272 in arena "client"
==7142== 
==7142== Invalid write of size 8
==7142==    at 0x4BB26D: StructJuMPsInfo::StructJuMPsInfo(sData*, stochasticInput&) (in /qfs/people/abhy245/PIPS/build/PIPS-NLP/Test/parmodel)
==7142==    by 0x4C97350: NlpPIPSIpmInterface<sFactoryAug, FilterIPMStochSolver, StructJuMPsInfo>::NlpPIPSIpmInterface(stochasticInput&, ompi_communicator_t*) (NlpPIPSIpmInterface.h:99)
==7142==    by 0x4C93096: PipsNlpSolveStruct (parallelPipsNlp_C_Callback.cpp:162)
==7142==    by 0x4343A1: main (in /qfs/people/abhy245/PIPS/build/PIPS-NLP/Test/parmodel)
==7142==  Address 0xc70b6f0 is 16 bytes after a block of size 272 alloc'd
==7142==    at 0x4A07078: operator new(unsigned long) (vg_replace_malloc.c:333)
==7142==    by 0x4C97336: NlpPIPSIpmInterface<sFactoryAug, FilterIPMStochSolver, StructJuMPsInfo>::NlpPIPSIpmInterface(stochasticInput&, ompi_communicator_t*) (NlpPIPSIpmInterface.h:99)
==7142==    by 0x4C93096: PipsNlpSolveStruct (parallelPipsNlp_C_Callback.cpp:162)
==7142==    by 0x4343A1: main (in /qfs/people/abhy245/PIPS/build/PIPS-NLP/Test/parmodel)
==7142== 

valgrind: m_mallocfree.c:303 (get_bszB_as_is): Assertion 'bszB_lo == bszB_hi' failed.
valgrind: Heap block lo/hi size mismatch: lo = 336, hi = 118891904.
This is probably caused by your program erroneously writing past the
end of a heap block and corrupting heap metadata.  If you fix any
invalid writes reported by Memcheck, this assertion failure will
probably go away.  Please try that before reporting this as a bug.

host stacktrace:
==7142==    at 0x38083F48: show_sched_status_wrk (m_libcassert.c:343)
==7142==    by 0x38084064: report_and_quit (m_libcassert.c:415)
==7142==    by 0x380841F1: vgPlain_assert_fail (m_libcassert.c:481)
==7142==    by 0x38091A9C: get_bszB_as_is (m_mallocfree.c:301)
==7142==    by 0x38091A9C: get_bszB (m_mallocfree.c:311)
==7142==    by 0x38091A9C: get_pszB (m_mallocfree.c:385)
==7142==    by 0x38091A9C: vgPlain_describe_arena_addr (m_mallocfree.c:1527)
==7142==    by 0x3807D603: vgPlain_describe_addr (m_addrinfo.c:186)
==7142==    by 0x3807BE93: vgMemCheck_update_Error_extra (mc_errors.c:1141)
==7142==    by 0x3808006A: vgPlain_maybe_record_error (m_errormgr.c:813)
==7142==    by 0x3807B42A: vgMemCheck_record_address_error (mc_errors.c:760)
==7142==    by 0x38059391: mc_LOADVn_slow (mc_main.c:1456)
==7142==    by 0x809D72CBB: ???
==7142==    by 0x809089F2F: ???
==7142==    by 0x80200854F: ???
==7142==    by 0x4BB286: StructJuMPsInfo::StructJuMPsInfo(sData*, stochasticInput&) (in /qfs/people/abhy245/PIPS/build/PIPS-NLP/Test/parmodel)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 7142)
==7142==    at 0x4BB28C: StructJuMPsInfo::StructJuMPsInfo(sData*, stochasticInput&) (in /qfs/people/abhy245/PIPS/build/PIPS-NLP/Test/parmodel)
==7142==    by 0x4C97350: NlpPIPSIpmInterface<sFactoryAug, FilterIPMStochSolver, StructJuMPsInfo>::NlpPIPSIpmInterface(stochasticInput&, ompi_communicator_t*) (NlpPIPSIpmInterface.h:99)
==7142==    by 0x4C93096: PipsNlpSolveStruct (parallelPipsNlp_C_Callback.cpp:162)
==7142==    by 0x4343A1: main (in /qfs/people/abhy245/PIPS/build/PIPS-NLP/Test/parmodel)

Thread 2: status = VgTs_WaitSys (lwpid 7226)
==7142==    at 0x38A4CDF113: poll (in /lib64/libc-2.12.so)
==7142==    by 0x7982E67: poll_dispatch (in /qfs/projects/ops/rh6/openmpi/1.8.3/gcc/4.9.2/lib/libopen-pal.so.6.2.1)
==7142==    by 0x797A486: opal_libevent2021_event_base_loop (in /qfs/projects/ops/rh6/openmpi/1.8.3/gcc/4.9.2/lib/libopen-pal.so.6.2.1)
==7142==    by 0x76CEC2D: orte_progress_thread_engine (in /qfs/projects/ops/rh6/openmpi/1.8.3/gcc/4.9.2/lib/libopen-rte.so.7.0.5)
==7142==    by 0x38A5407AA0: start_thread (in /lib64/libpthread-2.12.so)
==7142==    by 0x38A4CE893C: clone (in /lib64/libc-2.12.so)
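The four "Invalid write" reports can be read together: every faulting address lies just past the same 272-byte block allocated at `NlpPIPSIpmInterface.h:99`. The sketch below redoes that arithmetic. The helper names are made up for illustration, and the block start is inferred from the first report (address `0xc70b6e0` is "0 bytes after" the block) rather than printed by Valgrind.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the Memcheck report.
constexpr std::uint64_t kBlockSize = 272;           // "block of size 272"
constexpr std::uint64_t kFirstBadAddr = 0xc70b6e0;  // "0 bytes after" the block

// Inferred, not printed by Valgrind: the block ends where the first bad write begins.
constexpr std::uint64_t kBlockStart = kFirstBadAddr - kBlockSize;

// Hypothetical helper: distance of an address past the end of the block.
constexpr std::uint64_t bytes_past_end(std::uint64_t addr) {
    return addr - (kBlockStart + kBlockSize);
}

// Hypothetical helper: offset of an address from the start of the block.
constexpr std::uint64_t offset_in_object(std::uint64_t addr) {
    return addr - kBlockStart;
}

// The reported addresses match the "N bytes after" figures in the log.
static_assert(bytes_past_end(0xc70b6e0) == 0, "first report");
static_assert(bytes_past_end(0xc70b6e8) == 8, "second report");
static_assert(bytes_past_end(0xc70b6f8) == 24, "third report");
static_assert(bytes_past_end(0xc70b6f0) == 16, "fourth report");

// Runtime check: the highest 8-byte write ends 304 bytes into the object.
inline void check_extent() {
    assert(offset_in_object(0xc70b6f8) + 8 == 304);
}
```

Under that reading, the `sInfo` and `StructJuMPsInfo` constructors wrote four 8-byte values at offsets 272 through 303, so they were treating the object as at least 304 bytes while the allocation site reserved only 272.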
cnpetra commented 4 years ago

I am unable to reproduce this: valgrind reports no errors and the executable works.

@michel2323 can you please take a look? I am swamped with other PIPS issues that Shri has.

abhyshr commented 4 years ago

I did a fresh clone and tried again, and I can no longer reproduce the error. It was probably caused by something stale in my previous build, so I am going to close this issue.

Thanks, Shri
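The thread does not pin down what was stale. One mechanism that would fit the symptoms, offered here as an assumption rather than a diagnosis, is a class layout mismatch: the code that allocates an object was compiled against an older header, while the constructor was compiled against a newer header with extra members. The sketch below models the two layouts side by side with hypothetical `InfoOld` and `InfoNew` structs (not PIPS classes) and computes how far a constructor for the new layout would reach past storage sized for the old one, without performing any out-of-bounds write.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical "old header" layout, as seen by the code that calls operator new.
struct InfoOld {
    double data[34];   // 34 * 8 = 272 bytes where double is 8 bytes
};

// Hypothetical "new header" layout, as seen by the constructor's translation unit.
struct InfoNew {
    double data[34];
    void* extra[4];    // four pointer members added in a later revision
};

// Bytes a constructor for InfoNew would touch beyond storage sized for InfoOld.
constexpr std::size_t overrun_bytes() {
    return sizeof(InfoNew) > sizeof(InfoOld) ? sizeof(InfoNew) - sizeof(InfoOld) : 0;
}

// Offset of the first member that falls outside an InfoOld-sized allocation.
constexpr std::size_t first_out_of_bounds_offset() {
    return offsetof(InfoNew, extra);
}

// Self-check: the added members start exactly where the old allocation ends.
inline void check_layouts() {
    assert(first_out_of_bounds_offset() == sizeof(InfoOld));
    assert(overrun_bytes() == 4 * sizeof(void*));
}
```

A clean rebuild, or a fresh clone as used here, compiles every translation unit against the same headers, which rules out this class of mismatch.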

Sent by email in reply to Cosmin G Petra's notification of Thursday, January 9, 2020 at 3:56 PM.

abhyshr commented 4 years ago

Building again with a fresh clone does not cause this issue. Closing it.

cnpetra commented 4 years ago

phew! looked like a super nasty bug

michel2323 commented 4 years ago

Let me know, guys, if there's anything I can help with.