cms-opendata-analyses / PhysObjectExtractorTool

This repository has working code examples (snippets) on how to access different physics objects in the context of CMSSW software.

myjets process seg faults when running with more events #43

Closed caredg closed 3 years ago

caredg commented 3 years ago

@jmhogan, I ran a larger-scale test, executing poet_cfg.py over all events (-1). I tested this with isData=True and doPat=False, with the JSON quality activated, using both local (the new 2012 directory in the container) and remote access to the DB. In both tests, the job fails with the segmentation violation below [1]. There is a core dump, but unfortunately I couldn't learn much from it using gdb.

Similar tests with the other three combinations all finish fine: isData=False with doPat=False, isData=True with doPat=True, and isData=False with doPat=True.

If I repeat the first test, the one that fails, but excluding the myjets process, the job runs fine.

Just to confirm that the problem is not my machine running out of memory, I repeated the first test on one of the lxplus machines with the singularity container, and the result is the same. This time, however, the core dump is expanded (not sure if it is useful) [2].

[1]

Begin processing the 3468th record. Run 195399, Event 69710267, LumiSection 84 at 10-Jul-2021 00:21:50.960 CEST
%MSG-e FatalSystemSignal:  JetAnalyzer:myjets 10-Jul-2021 00:21:50 CEST  Run: 195399 Event: 69710267
A fatal system signal has occurred: segmentation violation
%MSG

A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.
NOTE:The first few functions on the stack are artifacts of processing the signal and can be ignored

A fatal system signal has occurred: segmentation violation

[2]

%MSG-e FatalSystemSignal:  JetAnalyzer:myjets 09-Jul-2021 23:11:53 UTC  Run: 195399 Event: 69710267
A fatal system signal has occurred: segmentation violation
%MSG

A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.
NOTE:The first few functions on the stack are artifacts of processing the signal and can be ignored

Thread 4 (Thread 0x7f9aba4bf700 (LWP 21438)):
#0  0x00007f9ad6194cbd in nanosleep () from /lib64/libc.so.6
#1  0x00007f9ad6194b30 in sleep () from /lib64/libc.so.6
#2  0x00007f9aba5cb304 in GarbageCollectorThread(void*, XrdClientThread*) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/external/slc6_amd64_gcc472/lib/libXrdClient.so.1
#3  0x00007f9aba63b75f in XrdSysThread_Xeq () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/external/slc6_amd64_gcc472/lib/libXrdUtils.so.1
#4  0x00007f9ad6484aa1 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f9ad61d0c4d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f9ab943f700 (LWP 21439)):
#0  0x00007f9ad61c7403 in poll () from /lib64/libc.so.6
#1  0x00007f9aba5af3a7 in XrdClientSock::RecvRaw(void*, int, int, int*) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/external/slc6_amd64_gcc472/lib/libXrdClient.so.1
#2  0x00007f9aba5d676c in XrdClientPhyConnection::ReadRaw(void*, int, int, int*) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/external/slc6_amd64_gcc472/lib/libXrdClient.so.1
#3  0x00007f9aba5de28f in XrdClientMessage::ReadRaw(XrdClientPhyConnection*) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/external/slc6_amd64_gcc472/lib/libXrdClient.so.1
#4  0x00007f9aba5daed5 in XrdClientPhyConnection::BuildMessage(bool, bool) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/external/slc6_amd64_gcc472/lib/libXrdClient.so.1
#5  0x00007f9aba5dc662 in SocketReaderThread(void*, XrdClientThread*) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/external/slc6_amd64_gcc472/lib/libXrdClient.so.1
#6  0x00007f9aba63b75f in XrdSysThread_Xeq () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/external/slc6_amd64_gcc472/lib/libXrdUtils.so.1
#7  0x00007f9ad6484aa1 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f9ad61d0c4d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f9ab83bf700 (LWP 21441)):
#0  0x00007f9ad61c7403 in poll () from /lib64/libc.so.6
#1  0x00007f9aba5af3a7 in XrdClientSock::RecvRaw(void*, int, int, int*) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/external/slc6_amd64_gcc472/lib/libXrdClient.so.1
#2  0x00007f9aba5d676c in XrdClientPhyConnection::ReadRaw(void*, int, int, int*) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/external/slc6_amd64_gcc472/lib/libXrdClient.so.1
#3  0x00007f9aba5de28f in XrdClientMessage::ReadRaw(XrdClientPhyConnection*) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/external/slc6_amd64_gcc472/lib/libXrdClient.so.1
#4  0x00007f9aba5daed5 in XrdClientPhyConnection::BuildMessage(bool, bool) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/external/slc6_amd64_gcc472/lib/libXrdClient.so.1
#5  0x00007f9aba5dc662 in SocketReaderThread(void*, XrdClientThread*) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/external/slc6_amd64_gcc472/lib/libXrdClient.so.1
#6  0x00007f9aba63b75f in XrdSysThread_Xeq () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/external/slc6_amd64_gcc472/lib/libXrdUtils.so.1
#7  0x00007f9ad6484aa1 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f9ad61d0c4d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f9ad4953800 (LWP 21344)):
#0  0x00007f9ad61948dd in waitpid () from /lib64/libc.so.6
#1  0x00007f9ad61264e9 in do_system () from /lib64/libc.so.6
#2  0x00007f9ad6126820 in system () from /lib64/libc.so.6
#3  0x00007f9ad81ee81c in TUnixSystem::StackTrace() () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/external/slc6_amd64_gcc472/lib/libCore.so
#4  0x00007f9acce07ea4 in sig_dostack_then_abort () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/lib/slc6_amd64_gcc472/libFWCoreServices.so
#5  <signal handler called>
#6  0x00007f9aae204e72 in JetAnalyzer::analyze(edm::Event const&, edm::EventSetup const&) () from /afs/cern.ch/work/e/ecarrera/playground/CMSSW_5_3_32/lib/slc6_amd64_gcc472/pluginPhysObjectExtractorToolPhysObjectExtractor.so
#7  0x00007f9ad992e4ce in edm::EDAnalyzer::doEvent(edm::EventPrincipal const&, edm::EventSetup const&, edm::CurrentProcessingContext const*) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/lib/slc6_amd64_gcc472/libFWCoreFramework.so
#8  0x00007f9ac8a2b426 in edm::WorkerT<edm::EDAnalyzer>::implDoBegin(edm::EventPrincipal&, edm::EventSetup const&, edm::CurrentProcessingContext const*) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/lib/slc6_amd64_gcc472/pluginCalibCalorimetryCastorPlugins.so
#9  0x00007f9ad995eee5 in bool edm::Worker::doWork<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)0>::MyPrincipal&, edm::EventSetup const&, edm::CurrentProcessingContext const*, edm::CPUTimer*) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/lib/slc6_amd64_gcc472/libFWCoreFramework.so
#10 0x00007f9ad9966b0b in void edm::Path::processOneOccurrence<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)0>::MyPrincipal&, edm::EventSetup const&) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/lib/slc6_amd64_gcc472/libFWCoreFramework.so
#11 0x00007f9ad996707d in bool edm::Schedule::runTriggerPaths<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)0>::MyPrincipal&, edm::EventSetup const&) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/lib/slc6_amd64_gcc472/libFWCoreFramework.so
#12 0x00007f9ad99671b2 in void edm::Schedule::processOneOccurrence<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)0>::MyPrincipal&, edm::EventSetup const&, bool) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/lib/slc6_amd64_gcc472/libFWCoreFramework.so
#13 0x00007f9ad99562c5 in edm::EventProcessor::readAndProcessEvent() () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/lib/slc6_amd64_gcc472/libFWCoreFramework.so
#14 0x00007f9ad9930cca in statemachine::HandleEvent::readAndProcessEvent() () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/lib/slc6_amd64_gcc472/libFWCoreFramework.so
#15 0x00007f9ad9932c46 in statemachine::HandleEvent::HandleEvent(boost::statechart::state<statemachine::HandleEvent, statemachine::HandleLumis, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::my_context) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/lib/slc6_amd64_gcc472/libFWCoreFramework.so
#16 0x00007f9ad994122c in boost::statechart::state<statemachine::HandleEvent, statemachine::HandleLumis, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::shallow_construct(boost::intrusive_ptr<statemachine::HandleLumis> const&, boost::statechart::state_machine<statemachine::Machine, statemachine::Starting, std::allocator<void>, boost::statechart::null_exception_translator>&) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/lib/slc6_amd64_gcc472/libFWCoreFramework.so
#17 0x00007f9ad99418a1 in boost::statechart::detail::safe_reaction_result boost::statechart::simple_state<statemachine::HandleEvent, statemachine::HandleLumis, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::transit_impl<statemachine::HandleEvent, statemachine::Machine, boost::statechart::detail::no_transition_function>(boost::statechart::detail::no_transition_function const&) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/lib/slc6_amd64_gcc472/libFWCoreFramework.so
#18 0x00007f9ad9941a12 in boost::statechart::detail::reaction_result boost::statechart::simple_state<statemachine::HandleEvent, statemachine::HandleLumis, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list<boost::statechart::transition<statemachine::Event, statemachine::HandleEvent, boost::statechart::detail::no_context<statemachine::Event>, &boost::statechart::detail::no_context<statemachine::Event>::no_function>, boost::statechart::transition<statemachine::Lumi, statemachine::AnotherLumi, boost::statechart::detail::no_context<statemachine::Lumi>, &boost::statechart::detail::no_context<statemachine::Lumi>::no_function>, boost::statechart::custom_reaction<statemachine::File>, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, boost::statechart::simple_state<statemachine::HandleEvent, statemachine::HandleLumis, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> >(boost::statechart::simple_state<statemachine::HandleEvent, statemachine::HandleLumis, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>&, boost::statechart::event_base const&, void const*) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/lib/slc6_amd64_gcc472/libFWCoreFramework.so
#19 0x00007f9ad9941a51 in boost::statechart::simple_state<statemachine::HandleEvent, statemachine::HandleLumis, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/lib/slc6_amd64_gcc472/libFWCoreFramework.so
#20 0x00007f9ad996267b in boost::statechart::state_machine<statemachine::Machine, statemachine::Starting, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/lib/slc6_amd64_gcc472/libFWCoreFramework.so
#21 0x00007f9ad9956726 in edm::EventProcessor::runCommon(bool, int) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/lib/slc6_amd64_gcc472/libFWCoreFramework.so
#22 0x00007f9ad995778b in edm::EventProcessor::runToCompletion(bool) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc472/cms/cmssw/CMSSW_5_3_32/lib/slc6_amd64_gcc472/libFWCoreFramework.so
#23 0x000000000040a477 in main ()

A fatal system signal has occurred: segmentation violation

jmhogan commented 3 years ago

@caredg See PR #45 for a fix. Something about the continue statement (rather than an if) was messing up the indexing of the btags, and this particular event seems to have ended up with more btag values in that vector than there were jets. I was able to run over the whole file with this fix.
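
For anyone reading this later, here is a minimal sketch of the kind of pattern that can produce this symptom. It is not the actual JetAnalyzer code from PR #45. The `Jet` struct, the `Output` struct, and the functions `fill_with_continue` and `fill_with_if` are all hypothetical names made up for illustration, and the exact placement of the `continue` in the real analyzer is an assumption. The idea is only that a `continue` placed after one per-jet vector has been filled, but before the others, leaves the vectors with different lengths, whereas wrapping all the fills in a single `if` keeps them aligned.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a jet; NOT the CMSSW jet class.
struct Jet {
    double pt;
};

// Hypothetical per-event output: two vectors meant to stay index-aligned.
struct Output {
    std::vector<double> jet_pt;
    std::vector<double> jet_btag;
};

// Problematic pattern (assumed for illustration): the btag value is stored
// before the pt cut, and `continue` then skips the remaining fills.
Output fill_with_continue(const std::vector<Jet>& jets,
                          const std::vector<double>& btags,
                          double min_pt) {
    Output out;
    for (std::size_t i = 0; i < jets.size(); ++i) {
        out.jet_btag.push_back(btags[i]);  // stored for every jet
        if (jets[i].pt < min_pt) continue; // soft jets skip the rest
        out.jet_pt.push_back(jets[i].pt);  // stored only for selected jets
    }
    return out;
}

// Safer pattern: every fill for a jet sits inside the same `if`,
// so the vectors always end up with the same length.
Output fill_with_if(const std::vector<Jet>& jets,
                    const std::vector<double>& btags,
                    double min_pt) {
    Output out;
    for (std::size_t i = 0; i < jets.size(); ++i) {
        if (jets[i].pt >= min_pt) {
            out.jet_pt.push_back(jets[i].pt);
            out.jet_btag.push_back(btags[i]);
        }
    }
    return out;
}
```

With three jets of which one fails the pt cut, `fill_with_continue` returns two pt values but three btag values, while `fill_with_if` returns two of each. Any later code that assumes the vectors have equal length, or that copies them into fixed-size buffers sized by the number of selected jets, can then read or write out of bounds.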

caredg commented 3 years ago

Fixed with 65bf99b.