DUNE-DAQ / appfwk

DUNE DAQ Application Framework Repository (implementations that use interfaces in app-framework-base)

Segfault seen in producer_consumer_dynamic_test at "start" with latest code #59

Closed bieryAtFnal closed 4 years ago

bieryAtFnal commented 4 years ago

I'm not sure if I'm doing something wrong or not, but here is what I see. I'm 99% sure that I have Marco's seg fault fix from earlier today.

[biery@lxplus703 appfwk]$ git status

On branch develop

nothing to commit, working directory clean

[biery@lxplus703 appfwk]$ git log

commit 4d98f0d5c51616997cdc134dd443ff3934359652
Merge: 32d8a99 18d115d
Author: John Freeman jcfree@mu2edaq13.fnal.gov
Date: Thu Jun 18 12:33:23 2020 -0500

JCF: Merge remote-tracking branch 'origin/jcfreeman2/issue34_improved_single_cmakelists' into develop

Conflicts:
    CMakeLists.txt

[biery@lxplus703 basicTest3]$ gdb build/appfwk/apps/daq_application
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-119.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see: http://www.gnu.org/software/gdb/bugs/...
Reading symbols from /afs/cern.ch/work/b/biery/public/dunedaq/basicTest3/build/appfwk/apps/daq_application...done.
(gdb) run -c QueryResponseCommandFacility -j appfwk/test/producer_consumer_dynamic_test.json
Starting program: /afs/cern.ch/work/b/biery/public/dunedaq/basicTest3/build/appfwk/apps/daq_application -c QueryResponseCommandFacility -j appfwk/test/producer_consumer_dynamic_test.json
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
warning: File "/cvmfs/dune.opensciencegrid.org/dunedaq/DUNE/products/gcc/v8_2_0/Linux64bit+3.10-2.17/lib64/libstdc++.so.6.0.25-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py:/usr/lib/golang/src/runtime/runtime-gdb.py".
To enable execution of this file add
    add-auto-load-safe-path /cvmfs/dune.opensciencegrid.org/dunedaq/DUNE/products/gcc/v8_2_0/Linux64bit+3.10-2.17/lib64/libstdc++.so.6.0.25-gdb.py
line to your configuration file "/afs/cern.ch/user/b/biery/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/afs/cern.ch/user/b/biery/.gdbinit".
For more information about this security protection see the "Auto-loading safe path" section in the GDB manual. E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
Enter a command
start
[New Thread 0x7ffff1a35700 (LWP 3335)]
[New Thread 0x7ffff1234700 (LWP 3336)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff1a35700 (LWP 3335)]
0x00007ffff2259994 in std::__shared_ptr<dunedaq::appfwk::Queue<std::vector<int, std::allocator<int> > >, (__gnu_cxx::_Lock_policy)2>::get (this=0x0)
    at /cvmfs/dune.opensciencegrid.org/dunedaq/DUNE/products/gcc/v8_2_0/Linux64bit+3.10-2.17/include/c++/8.2.0/bits/shared_ptr_base.h:1286
1286 { return _M_ptr; }
Missing separate debuginfos, use: debuginfo-install glibc-2.17-307.el7.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-46.el7.x86_64 libcom_err-1.42.9-17.el7.x86_64 libselinux-2.5-15.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 sssd-client-1.16.4-37.el7_8.3.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) where

#0 0x00007ffff2259994 in std::__shared_ptr<dunedaq::appfwk::Queue<std::vector<int, std::allocator<int> > >, (__gnu_cxx::_Lock_policy)2>::get (this=0x0)
    at /cvmfs/dune.opensciencegrid.org/dunedaq/DUNE/products/gcc/v8_2_0/Linux64bit+3.10-2.17/include/c++/8.2.0/bits/shared_ptr_base.h:1286

#1 0x00007ffff225c46e in std::__shared_ptr_access<dunedaq::appfwk::Queue<std::vector<int, std::allocator<int> > >, (__gnu_cxx::_Lock_policy)2, false, false>::_M_get (this=0x0)
    at /cvmfs/dune.opensciencegrid.org/dunedaq/DUNE/products/gcc/v8_2_0/Linux64bit+3.10-2.17/include/c++/8.2.0/bits/shared_ptr_base.h:997

#2 0x00007ffff2259eee in std::__shared_ptr_access<dunedaq::appfwk::Queue<std::vector<int, std::allocator<int> > >, (__gnu_cxx::_Lock_policy)2, false, false>::operator-> (this=0x0)
    at /cvmfs/dune.opensciencegrid.org/dunedaq/DUNE/products/gcc/v8_2_0/Linux64bit+3.10-2.17/include/c++/8.2.0/bits/shared_ptr_base.h:991

#3 0x00007ffff2256f6a in dunedaq::appfwk::DAQSource<std::vector<int, std::allocator<int> > >::can_pop (this=0x0)
    at /afs/cern.ch/work/b/biery/public/dunedaq/basicTest3/appfwk/include/appfwk/DAQSource.hpp:70

#4 0x00007ffff225019d in dunedaq::appfwk::FakeDataConsumerDAQModule::do_work (this=0x6ce390)
    at /afs/cern.ch/work/b/biery/public/dunedaq/basicTest3/appfwk/test/FakeDataConsumerDAQModule.cpp:99

#5 0x00007ffff2262486 in std::__invoke_impl<void, void (dunedaq::appfwk::FakeDataConsumerDAQModule::*&)(), dunedaq::appfwk::FakeDataConsumerDAQModule*&> (
    __f=@0x6d1d30: (void (dunedaq::appfwk::FakeDataConsumerDAQModule::*)(dunedaq::appfwk::FakeDataConsumerDAQModule * const)) 0x7ffff225012a <dunedaq::appfwk::FakeDataConsumerDAQModule::do_work()>, __t=@0x6d1d40: 0x6ce390)
    at /cvmfs/dune.opensciencegrid.org/dunedaq/DUNE/products/gcc/v8_2_0/Linux64bit+3.10-2.17/include/c++/8.2.0/bits/invoke.h:73

#6 0x00007ffff2260747 in std::__invoke<void (dunedaq::appfwk::FakeDataConsumerDAQModule::*&)(), dunedaq::appfwk::FakeDataConsumerDAQModule*&> (
    __fn=@0x6d1d30: (void (dunedaq::appfwk::FakeDataConsumerDAQModule::*)(dunedaq::appfwk::FakeDataConsumerDAQModule * const)) 0x7ffff225012a <dunedaq::appfwk::FakeDataConsumerDAQModule::do_work()>, __args#0=@0x6d1d40: 0x6ce390)
    at /cvmfs/dune.opensciencegrid.org/dunedaq/DUNE/products/gcc/v8_2_0/Linux64bit+3.10-2.17/include/c++/8.2.0/bits/invoke.h:95

#7 0x00007ffff225deea in std::_Bind<void (dunedaq::appfwk::FakeDataConsumerDAQModule::*(dunedaq::appfwk::FakeDataConsumerDAQModule*))()>::__call<void, , 0ul>(std::tuple<>&&, std::_Index_tuple<0ul>) (this=0x6d1d30,
    __args=<unknown type in /afs/cern.ch/work/b/biery/public/dunedaq/basicTest3/build/appfwk/test/libappfwk_FakeDataConsumerDAQModule_duneDAQModule.so, CU 0x0, DIE 0x5bfce>)
    at /cvmfs/dune.opensciencegrid.org/dunedaq/DUNE/products/gcc/v8_2_0/Linux64bit+3.10-2.17/include/c++/8.2.0/functional:400

#8 0x00007ffff225b260 in std::_Bind<void (dunedaq::appfwk::FakeDataConsumerDAQModule::*(dunedaq::appfwk::FakeDataConsumerDAQModule*))()>::operator()<, void>() (this=0x6d1d30)
    at /cvmfs/dune.opensciencegrid.org/dunedaq/DUNE/products/gcc/v8_2_0/Linux64bit+3.10-2.17/include/c++/8.2.0/functional:484

#9 0x00007ffff2258d12 in std::_Function_handler<void (), std::_Bind<void (dunedaq::appfwk::FakeDataConsumerDAQModule::*(dunedaq::appfwk::FakeDataConsumerDAQModule*))()> >::_M_invoke(std::_Any_data const&) (__functor=...)
    at /cvmfs/dune.opensciencegrid.org/dunedaq/DUNE/products/gcc/v8_2_0/Linux64bit+3.10-2.17/include/c++/8.2.0/bits/std_function.h:297

#10 0x00007ffff70c7388 in std::function<void ()>::operator()() const (this=0x6ce3d8)
    at /scratch/workspace/critic-all/BUILDTYPE/debug/QUAL/e19/label1/swarm/label2/SLF7/build/gcc/v8_2_0/Linux64bit+3.10-2.17/include/c++/8.2.0/bits/std_function.h:687

#11 0x00007ffff225388d in dunedaq::appfwk::ThreadHelper::start_workingthread()::{lambda()#1}::operator()() const (
    __closure=0x6d82e8)
    at /afs/cern.ch/work/b/biery/public/dunedaq/basicTest3/appfwk/include/appfwk/ThreadHelper.hpp:68

#12 0x00007ffff2258891 in std::__invoke_impl<void, dunedaq::appfwk::ThreadHelper::start_workingthread()::{lambda()#1}>(std::__invoke_other, dunedaq::appfwk::ThreadHelper::start_workingthread()::{lambda()#1}&&) (
    __f=<unknown type in /afs/cern.ch/work/b/biery/public/dunedaq/basicTest3/build/appfwk/test/libappfwk_FakeDataConsumerDAQModule_duneDAQModule.so, CU 0x0, DIE 0x61f46>)
    at /cvmfs/dune.opensciencegrid.org/dunedaq/DUNE/products/gcc/v8_2_0/Linux64bit+3.10-2.17/include/c++/8.2.0/bits/invoke.h:60

#13 0x00007ffff2255e8d in std::__invoke<dunedaq::appfwk::ThreadHelper::start_workingthread()::{lambda()#1}>(std::__invoke_result&&, (dunedaq::appfwk::ThreadHelper::start_workingthread()::{lambda()#1}&&)...) (
    __fn=<unknown type in /afs/cern.ch/work/b/biery/public/dunedaq/basicTest3/build/appfwk/test/libappfwk_FakeDataConsumerDAQModule_duneDAQModule.so, CU 0x0, DIE 0x64513>)
    at /cvmfs/dune.opensciencegrid.org/dunedaq/DUNE/products/gcc/v8_2_0/Linux64bit+3.10-2.17/include/c++/8.2.0/bits/invoke.h:95

#14 0x00007ffff2267ab2 in std::thread::_Invoker<std::tuple<dunedaq::appfwk::ThreadHelper::start_workingthread()::{lambda()#1}> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (this=0x6d82e8)
    at /cvmfs/dune.opensciencegrid.org/dunedaq/DUNE/products/gcc/v8_2_0/Linux64bit+3.10-2.17/include/c++/8.2.0/thread:234

#15 0x00007ffff2267276 in std::thread::_Invoker<std::tuple<dunedaq::appfwk::ThreadHelper::start_workingthread()::{lambda()#1}> >::operator()() (this=0x6d82e8)
    at /cvmfs/dune.opensciencegrid.org/dunedaq/DUNE/products/gcc/v8_2_0/Linux64bit+3.10-2.17/include/c++/8.2.0/thread:243

#16 0x00007ffff2266bf6 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<dunedaq::appfwk::ThreadHelper::start_workingthread()::{lambda()#1}> > >::_M_run() (this=0x6d82e0)
    at /cvmfs/dune.opensciencegrid.org/dunedaq/DUNE/products/gcc/v8_2_0/Linux64bit+3.10-2.17/include/c++/8.2.0/thread:186

#17 0x00007ffff697002f in execute_native_thread_routine () at ../../../.././libstdc++-v3/src/c++11/thread.cc:80

#18 0x00007ffff7661ea5 in start_thread () from /lib64/libpthread.so.0

#19 0x00007ffff60cb8dd in clone () from /lib64/libc.so.6

(gdb) quit
A debugging session is active.

Inferior 1 [process 3309] will be killed.

Quit anyway? (y or n) y

jcfreeman2 commented 4 years ago

I think the problem is that you issued a "start" without first issuing a "configure".

jcfreeman2 commented 4 years ago

Or rather, that's one of the problems. Another is that the program responds to invalid input by crashing rather than by politely informing the user.

bieryAtFnal commented 4 years ago

Ah, yes, I should have remembered to request the config transition first. Thanks. Your point about better handling of invalid transitions is a good one. Maybe, at a minimum, we should recommend that DAQModule developers take care to check that queues, etc. are valid before trying to use them.
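
A minimal sketch of that kind of check, assuming a module whose input queue only exists after the configure transition. The names `SketchQueue`, `SketchConsumer`, `do_configure`, and `do_start` are hypothetical stand-ins for illustration and are not the actual appfwk interfaces; the point is only that the start handler reports the invalid transition instead of letting a worker thread dereference a null pointer.

```cpp
#include <deque>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for an appfwk queue; not the real interface.
struct SketchQueue {
  std::deque<std::vector<int>> items;
  bool can_pop() const { return !items.empty(); }
};

// Hypothetical module: the queue is only created during configure.
class SketchConsumer {
public:
  void do_configure() { queue_ = std::make_shared<SketchQueue>(); }

  // Refuse to start rather than dereference a null queue later on.
  void do_start() {
    if (!queue_) {
      throw std::logic_error(
        "start requested before configure: input queue is not set");
    }
    started_ = true;
  }

  bool started() const { return started_; }

private:
  std::shared_ptr<SketchQueue> queue_; // stays null until do_configure() runs
  bool started_ = false;
};
```

With a guard like this, issuing "start" before "configure" would surface as a catchable error with a readable message rather than a SIGSEGV inside the worker thread.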

brettviren commented 4 years ago

My understanding is that the "queue registry" is responsible for instantiating queues based on configuration. It is thus the "gatekeeper" of queues, and so it should take responsibility for ensuring that only valid queues escape its keep to any client.

So, if some client (i.e., a module) requests an invalid queue from the registry, then that registry method should throw an (ERS) exception.
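
As a rough illustration of the gatekeeper idea, here is a sketch of a registry lookup that never hands out a null queue. `DemoQueue`, `DemoQueueRegistry`, `configure`, and `get_queue` are hypothetical names and do not reflect the real QueueRegistry API, and `std::runtime_error` is used in place of an ERS exception to keep the example self-contained.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical queue type; not the real appfwk Queue.
struct DemoQueue {
  std::vector<int> data;
};

// Hypothetical registry: it owns the queues and only returns valid ones.
class DemoQueueRegistry {
public:
  // Create a queue under the given name (stands in for configuration).
  void configure(const std::string& name) {
    queues_[name] = std::make_shared<DemoQueue>();
  }

  // Return the named queue, or throw if it was never configured.
  std::shared_ptr<DemoQueue> get_queue(const std::string& name) const {
    auto it = queues_.find(name);
    if (it == queues_.end()) {
      // std::runtime_error stands in for an ERS exception here.
      throw std::runtime_error("queue '" + name + "' is not configured");
    }
    return it->second;
  }

private:
  std::map<std::string, std::shared_ptr<DemoQueue>> queues_;
};
```

Under this design a module cannot end up holding a null queue pointer: either the lookup succeeds or the failure is reported at the point of the request.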

bieryAtFnal commented 4 years ago

Brett, it looks to me like the QueueRegistry is currently throwing exceptions when it can't successfully create a requested queue, as you suggest. And, with the changes associated with Issue #48 and the subsequent changes that I see in the examples, the queues are constructed and looked up in the modules independent of the Configure step, so there isn't the chance for the sort of bad consequences that I mentioned when originally filing this issue.