cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0

25202.0, step2_DIGI_L1_DIGI2RAW_HLT_PU.py, large stack allocation (or worse)? #18070

Closed davidlt closed 4 years ago

davidlt commented 7 years ago

I looked at the valgrind report for 25202.0 step2_DIGI_L1_DIGI2RAW_HLT_PU.py using CMSSW_9_1_X_2017-03-22-2300 + slc6_amd64_gcc630.

I noticed the following warnings:

==4213== Warning: client switching stacks?  SP change: 0xffeff4308 --> 0xffed9d8d0
==4213== Warning: client switching stacks?  SP change: 0xffed9d8d0 --> 0xffeff4308
==4213== Warning: client switching stacks?  SP change: 0xffeff4308 --> 0xffed9d8d0

Once valgrind has noticed a large change in the stack pointer, its results are no longer reliable:

==4213== Warning: client switching stacks?  SP change: 0xffeff4308 --> 0xffed9d8d0
==4213==          to suppress, use: --max-stackframe=2452024 or greater
==4213== Invalid write of size 8
==4213==    at 0x2D7253D8: L1TMuonEndCapTrackProducer::produce(edm::Event&, edm::EventSetup const&) (L1TMuonEndCapTrackProducer.cc:61)
==4213==    by 0x4BBF8C8: edm::EDProducer::doEvent(edm::EventPrincipal const&, edm::EventSetup const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) (in /cvmfs/cms-ib.cern.ch/nweek-02464/slc6_amd64_gcc630/cms/cmssw/CMSSW_9_1_X_2017-03-22-2300/lib/slc6_amd64_gcc630/libFWCoreFramework.so)
==4213==    by 0x4B8C421: edm::WorkerT<edm::EDProducer>::implDo(edm::EventPrincipal const&, edm::EventSetup const&, edm::ModuleCallingContext const*) (in /cvmfs/cms-ib.cern.ch/nweek-02464/slc6_amd64_gcc630/cms/cmssw/CMSSW_9_1_X_2017-03-22-2300/lib/slc6_amd64_gcc630/libFWCoreFramework.so)
==4213==    by 0x4B40024: bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) [clone .isra.82] (in /cvmfs/cms-ib.cern.ch/nweek-02464/slc6_amd64_gcc630/cms/cmssw/CMSSW_9_1_X_2017-03-22-2300/lib/slc6_amd64_gcc630/libFWCoreFramework.so)
==4213==    by 0x4B4B16E: decltype ({parm#1}()) edm::convertException::wrap<void edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(void edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) (in /cvmfs/cms-ib.cern.ch/nweek-02464/slc6_amd64_gcc630/cms/cmssw/CMSSW_9_1_X_2017-03-22-2300/lib/slc6_amd64_gcc630/libFWCoreFramework.so)
==4213==    by 0x4B4B2BB: void edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) (in /cvmfs/cms-ib.cern.ch/nweek-02464/slc6_amd64_gcc630/cms/cmssw/CMSSW_9_1_X_2017-03-22-2300/lib/slc6_amd64_gcc630/libFWCoreFramework.so)
==4213==    by 0x4B4BD22: void edm::SerialTaskQueueChain::actionToRun<edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute()::{lambda()#1}>(edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute()::{lambda()#1} const&) (in /cvmfs/cms-ib.cern.ch/nweek-02464/slc6_amd64_gcc630/cms/cmssw/CMSSW_9_1_X_2017-03-22-2300/lib/slc6_amd64_gcc630/libFWCoreFramework.so)
==4213==    by 0x4B4BE80: edm::SerialTaskQueue::QueuedTask<void edm::SerialTaskQueueChain::passDownChain<edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute()::{lambda()#1}>(unsigned int, edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute()::{lambda()#1} const&)::{lambda()#1}>::execute() (in /cvmfs/cms-ib.cern.ch/nweek-02464/slc6_amd64_gcc630/cms/cmssw/CMSSW_9_1_X_2017-03-22-2300/lib/slc6_amd64_gcc630/libFWCoreFramework.so)
==4213==    by 0x5F26952: tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) (custom_scheduler.h:501)
==4213==    by 0x4C04AC9: edm::EventProcessor::readAndProcessEvent() (in /cvmfs/cms-ib.cern.ch/nweek-02464/slc6_amd64_gcc630/cms/cmssw/CMSSW_9_1_X_2017-03-22-2300/lib/slc6_amd64_gcc630/libFWCoreFramework.so)
==4213==    by 0x4B4E7CE: statemachine::HandleEvent::readAndProcessEvent() (in /cvmfs/cms-ib.cern.ch/nweek-02464/slc6_amd64_gcc630/cms/cmssw/CMSSW_9_1_X_2017-03-22-2300/lib/slc6_amd64_gcc630/libFWCoreFramework.so)
==4213==    by 0x4B501E6: statemachine::HandleEvent::HandleEvent(boost::statechart::state<statemachine::HandleEvent, statemachine::HandleLumis, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::my_context) (in /cvmfs/cms-ib.cern.ch/nweek-02464/slc6_amd64_gcc630/cms/cmssw/CMSSW_9_1_X_2017-03-22-2300/lib/slc6_amd64_gcc630/libFWCoreFramework.so)
==4213==  Address 0xffed9d8f8 is on thread 1's stack
==4213==  in frame #0, created by L1TMuonEndCapTrackProducer::produce(edm::Event&, edm::EventSetup const&) (L1TMuonEndCapTrackProducer.cc:61)

Looks like there could be a frame with roughly 2.3 MiB of data allocated on the stack (2452024 bytes, going by the --max-stackframe hint above).
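The frame size can be checked directly from the two stack-pointer values in the warning. The sketch below is only an illustration: the helper name `spDelta` and the constants are made up here, the addresses are copied from the report, and 2000000 is the default that the valgrind manual documents for --max-stackframe.

```cpp
#include <cstdint>

// Absolute distance in bytes between two stack-pointer values.
constexpr std::uint64_t spDelta(std::uint64_t a, std::uint64_t b) {
  return a > b ? a - b : b - a;
}

// Stack-pointer values copied from the valgrind warning.
constexpr std::uint64_t kSpBefore = 0xffeff4308ULL;
constexpr std::uint64_t kSpAfter = 0xffed9d8d0ULL;

// Size of the jump valgrind complained about.
constexpr std::uint64_t kFrameBytes = spDelta(kSpBefore, kSpAfter);

// Default value of --max-stackframe according to the valgrind manual.
constexpr std::uint64_t kDefaultMaxStackframe = 2000000;

// The jump equals the value valgrind suggests for --max-stackframe ...
static_assert(kFrameBytes == 2452024, "SP change matches the hint");
// ... and it is larger than the default threshold, hence the warning.
static_assert(kFrameBytes > kDefaultMaxStackframe, "exceeds default");
```

2452024 bytes is about 2.34 MiB, so a single stack-pointer move of that size is enough to trip the heuristic.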

The default stack size on my particular system is 10MiB (some of which probably goes for TLS).
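For anyone who wants to compare against their own machine, the soft stack limit can be queried with POSIX `getrlimit`. This is a minimal sketch, assuming a POSIX system; the function names `softStackLimit` and `frameFits` are invented for illustration, and the limit reported is the one that applies to the main thread's stack.

```cpp
#include <sys/resource.h>

#include <cstdint>

// Stores the soft stack limit in bytes and returns true.
// Returns false if the query fails or the limit is unlimited.
bool softStackLimit(std::uint64_t& bytes) {
  struct rlimit lim {};
  if (getrlimit(RLIMIT_STACK, &lim) != 0) {
    return false;
  }
  if (lim.rlim_cur == RLIM_INFINITY) {
    return false;
  }
  bytes = static_cast<std::uint64_t>(lim.rlim_cur);
  return true;
}

// True if a frame of frameBytes is smaller than the given limit.
// This ignores everything else already on the stack.
bool frameFits(std::uint64_t frameBytes, std::uint64_t limitBytes) {
  return frameBytes < limitBytes;
}
```

With a 10 MiB limit, `frameFits(2452024, 10ULL * 1024 * 1024)` is true, so the frame alone does not overflow the stack; it is only larger than valgrind's threshold.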

These warnings (and the errors following them) could also be caused by coroutine libraries or lightweight threads, which we don't have in CMSSW AFAIK.

cmsbuild commented 7 years ago

A new Issue was created by @davidlt .

@davidlange6, @Dr15Jones, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here: cms-sw/cmssw#13029

davidlange6 commented 7 years ago

maybe related

https://github.com/cms-sw/cmssw/issues/16850#issuecomment-266019072

davidlange6 commented 7 years ago

assign l1

cmsbuild commented 7 years ago

New categories assigned: l1

@mulhearn,@rekovic you have been requested to review this Pull request/Issue and eventually sign? Thanks

davidlt commented 7 years ago

Yeah, looks like #16850 talks about the same issue. Valgrind has a threshold for stack growth. If something allocates more than that in one step, valgrind's stack tracking will not work correctly, and one has to set a higher threshold manually with --max-stackframe.

I guess we have one function which allocates MiBs on the stack, and that is beyond the default valgrind threshold.
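To illustrate the kind of pattern that would produce such a frame, here is a hypothetical sketch. It is not the actual L1TMuonEndCapTrackProducer code; `BigScratch`, `fillAndSum`, `produceOnStack` and `produceOnHeap` are invented names, and the buffer size is chosen only to land near the reported figure.

```cpp
#include <cstddef>
#include <memory>

// 300000 doubles = 2400000 bytes, roughly 2.3 MiB.
constexpr std::size_t kEntries = 300000;

// Hypothetical stand-in for a large per-event working buffer.
struct BigScratch {
  double data[kEntries];
};

// Fills the buffer with ones and returns their sum.
double fillAndSum(BigScratch& scratch) {
  double sum = 0.0;
  for (std::size_t i = 0; i < kEntries; ++i) {
    scratch.data[i] = 1.0;
    sum += scratch.data[i];
  }
  return sum;
}

// Pattern that triggers the warning: the whole buffer is a local
// variable, so entering the function moves SP by about 2.3 MiB.
double produceOnStack() {
  BigScratch scratch;
  return fillAndSum(scratch);
}

// Alternative: the buffer lives on the heap and the frame only
// holds a pointer, so the stack pointer barely moves.
double produceOnHeap() {
  std::unique_ptr<BigScratch> scratch(new BigScratch);
  return fillAndSum(*scratch);
}
```

Both functions compute the same result; only where the buffer lives differs. Moving the buffer to the heap (or to a member that is allocated once) would avoid the need to raise --max-stackframe.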

smuzaffar commented 4 years ago

closing in favor of #16850