cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0

Crash in L1FPGATrackProducer when using glibc::malloc #30675

Closed tommasoboccali closed 4 years ago

tommasoboccali commented 4 years ago

Dear all,

we are experiencing a crash when running L1FPGATrackProducer, in a very uncommon setup (spotted since AARCH64 and PPC IBs do not use JEmalloc).

In a nutshell, when using CMSSW_11_1_0_patch2 + merge 30657 30660 30666 (this will be soon patch3) and a production like PSET (the one we are testing for HLT rereco) /afs/cern.ch/user/t/tboccali/public/pset_crash.py

and running with glibc::malloc:

cmsRunGlibC pset_crash.py

you get (almost every time) a crash, like

#3  0x00007fbae6a904c9 in sig_dostack_then_abort () from /cvmfs/cms.cern.ch/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_1_0/lib/slc7_amd64_gcc820/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00007fbaea0091f7 in raise () from /lib64/libc.so.6
#6  0x00007fbaea00a8e8 in abort () from /lib64/libc.so.6
#7  0x00007fbaea048f47 in __libc_message () from /lib64/libc.so.6
#8  0x00007fbaea051d7d in _int_malloc () from /lib64/libc.so.6
#9  0x00007fbaea05410c in malloc () from /lib64/libc.so.6
#10 0x00007fbaea977218 in operator new (sz=sz@entry=74208) at ../../../../libstdc++-v3/libsupc++/new_op.cc:50
#11 0x00007fbaa2045e6f in std::make_unique<trklet::IMATH_TrackletCalculator, trklet::Settings const&, trklet::imathGlobals*&, int, int> () at /cvmfs/cms.cern.ch/slc7_amd64_gcc820/external/gcc/8.2.0-bcolbf/include/c++/8.4.0/bits/unique_ptr.h:834
#12 trklet::Globals::Globals(trklet::Settings const&) () at /scratch/tom/CMSSW_11_1_0_patch2/src/L1Trigger/TrackFindingTracklet/src/Globals.cc:22
#13 0x00007fbaa21101c7 in std::make_unique<trklet::Globals, trklet::Settings const&> () at /cvmfs/cms.cern.ch/slc7_amd64_gcc820/external/gcc/8.2.0-bcolbf/include/c++/8.4.0/bits/unique_ptr.h:834
#14 trklet::TrackletEventProcessor::init(trklet::Settings const&) () at /scratch/tom/CMSSW_11_1_0_patch2/src/L1Trigger/TrackFindingTracklet/src/TrackletEventProcessor.cc:28
#15 0x00007fbaa218e2c3 in L1FPGATrackProducer::beginRun(edm::Run const&, edm::EventSetup const&) () at /scratch/tom/CMSSW_11_1_0_patch2/src/L1Trigger/TrackFindingTracklet/plugins/L1FPGATrackProducer.cc:310

which originates from src/L1Trigger/TrackFindingTracklet/src/Globals.cc:22 (but we have also seen :21, which is indeed an identical line...).

@davidlange6 and I have been looking, but nothing striking stands out.

To be clear again: with standard cmsRun (JEmalloc) it does not happen, but this leaves us quite uneasy about the solidity of the results, and it is polluting all the non-Intel builds (*)

Can you help us here?

The people most likely to be interested are @fwyzard @skinnari @tomalin @rekovic @Dr15Jones @makortel @silviodonato

thanks a lot

*: for example

https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/slc7_ppc64le_gcc820/CMSSW_11_2_X_2020-07-13-2300/pyRelValMatrixLogs/run/20434.0_TTbar_14TeV+TTbar_14TeV_TuneCP5_2026D41_GenSimHLBeamSpotFull14INPUT+DigiFullTrigger_2026D41+RecoFullGlobal_2026D41+HARVESTFullGlobal_2026D41/step2_TTbar_14TeV+TTbar_14TeV_TuneCP5_2026D41_GenSimHLBeamSpotFull14INPUT+DigiFullTrigger_2026D41+RecoFullGlobal_2026D41+HARVESTFullGlobal_2026D41.log

cmsbuild commented 4 years ago

A new Issue was created by @tommasoboccali Tommaso Boccali.

@Dr15Jones, @silviodonato, @dpiparo, @smuzaffar, @makortel can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

silviodonato commented 4 years ago

assign core

silviodonato commented 4 years ago

assign l1

cmsbuild commented 4 years ago

New categories assigned: core,l1

@Dr15Jones,@smuzaffar,@benkrikler,@rekovic,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

silviodonato commented 4 years ago

During the build of CMSSW_11_1_0_patch3 I got errors in cc8:

14-Jul-2020 12:05:03 CEST  Initiating request to open file file:step1.root
14-Jul-2020 12:05:04 CEST  Successfully opened file file:step1.root
create data set info Default
create data set info Default
create data set info Default
create data set info Default
free(): invalid pointer

A fatal system signal has occurred: abort signal
The following is the call stack containing the origin of the signal.

Tue Jul 14 12:05:32 CEST 2020
Thread 2 (Thread 0x2b9353548700 (LWP 11176)):
#0  0x00002b933c770752 in waitpid () from /lib64/libpthread.so.0
#1  0x00002b933fc4ba37 in edm::service::cmssw_stacktrace_fork() () from /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_11_1_0_patch3-cc8_amd64_gcc8/build/CMSSW_11_1_0_patch3-build/cc8_amd64_gcc8/cms/cmssw/CMSSW_11_1_0/lib/cc8_amd64_gcc8/pluginFWCoreServicesPlugins.so
#2  0x00002b933fc4c4fa in edm::service::InitRootHandlers::stacktraceHelperThread() () from /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_11_1_0_patch3-cc8_amd64_gcc8/build/CMSSW_11_1_0_patch3-build/cc8_amd64_gcc8/cms/cmssw/CMSSW_11_1_0/lib/cc8_amd64_gcc8/pluginFWCoreServicesPlugins.so
#3  0x00002b933c2faacf in execute_native_thread_routine () at ../../../../../libstdc++-v3/src/c++11/thread.cc:80
#4  0x00002b933c7662de in start_thread () from /lib64/libpthread.so.0
#5  0x00002b933ca7be83 in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2b933d999980 (LWP 11137)):
#0  0x00002b933ca70f21 in poll () from /lib64/libc.so.6
#1  0x00002b933fc4be9f in full_read.constprop () from /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_11_1_0_patch3-cc8_amd64_gcc8/build/CMSSW_11_1_0_patch3-build/cc8_amd64_gcc8/cms/cmssw/CMSSW_11_1_0/lib/cc8_amd64_gcc8/pluginFWCoreServicesPlugins.so
#2  0x00002b933fc4c5dc in edm::service::InitRootHandlers::stacktraceFromThread() () from /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_11_1_0_patch3-cc8_amd64_gcc8/build/CMSSW_11_1_0_patch3-build/cc8_amd64_gcc8/cms/cmssw/CMSSW_11_1_0/lib/cc8_amd64_gcc8/pluginFWCoreServicesPlugins.so
#3  0x00002b933fc4d4b9 in sig_dostack_then_abort () from /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_11_1_0_patch3-cc8_amd64_gcc8/build/CMSSW_11_1_0_patch3-build/cc8_amd64_gcc8/cms/cmssw/CMSSW_11_1_0/lib/cc8_amd64_gcc8/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00002b933c9b770f in raise () from /lib64/libc.so.6
#6  0x00002b933c9a1b25 in abort () from /lib64/libc.so.6
#7  0x00002b933c9fa897 in __libc_message () from /lib64/libc.so.6
#8  0x00002b933ca00fdc in malloc_printerr () from /lib64/libc.so.6
#9  0x00002b933ca028dc in _int_free () from /lib64/libc.so.6
#10 0x00002b935f812af8 in void std::vector<trklet::MemoryBase*, std::allocator<trklet::MemoryBase*> >::_M_realloc_insert<trklet::MemoryBase*>(__gnu_cxx::__normal_iterator<trklet::MemoryBase**, std::vector<trklet::MemoryBase*, std::allocator<trklet::MemoryBase*> > >, trklet::MemoryBase*&&) () from /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_11_1_0_patch3-cc8_amd64_gcc8/build/CMSSW_11_1_0_patch3-build/cc8_amd64_gcc8/cms/cmssw-patch/CMSSW_11_1_0_patch3/lib/cc8_amd64_gcc8/libL1TriggerTrackFindingTracklet.so
#11 0x00002b935f812b7f in trklet::MemoryBase*& std::vector<trklet::MemoryBase*, std::allocator<trklet::MemoryBase*> >::emplace_back<trklet::MemoryBase*>(trklet::MemoryBase*&&) () from /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_11_1_0_patch3-cc8_amd64_gcc8/build/CMSSW_11_1_0_patch3-build/cc8_amd64_gcc8/cms/cmssw-patch/CMSSW_11_1_0_patch3/lib/cc8_amd64_gcc8/libL1TriggerTrackFindingTracklet.so
#12 0x00002b935f80eabf in trklet::Sector::addMem(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) () from /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_11_1_0_patch3-cc8_amd64_gcc8/build/CMSSW_11_1_0_patch3-build/cc8_amd64_gcc8/cms/cmssw-patch/CMSSW_11_1_0_patch3/lib/cc8_amd64_gcc8/libL1TriggerTrackFindingTracklet.so
#13 0x00002b935f85bbb1 in trklet::TrackletEventProcessor::init(trklet::Settings const&) () from /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_11_1_0_patch3-cc8_amd64_gcc8/build/CMSSW_11_1_0_patch3-build/cc8_amd64_gcc8/cms/cmssw-patch/CMSSW_11_1_0_patch3/lib/cc8_amd64_gcc8/libL1TriggerTrackFindingTracklet.so
#14 0x00002b935f72f2c3 in L1FPGATrackProducer::beginRun(edm::Run const&, edm::EventSetup const&) () from /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_11_1_0_patch3-cc8_amd64_gcc8/build/CMSSW_11_1_0_patch3-build/cc8_amd64_gcc8/cms/cmssw-patch/CMSSW_11_1_0_patch3/lib/cc8_amd64_gcc8/pluginTrackFindingTrackletPlugins.so
#15 0x00002b933a6efa75 in edm::one::EDProducerBase::doBeginRun(edm::RunPrincipal const&, edm::EventSetupImpl const&, edm::ModuleCallingContext const*) () from /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_11_1_0_patch3-cc8_amd64_gcc8/build/CMSSW_11_1_0_patch3-build/cc8_amd64_gcc8/cms/cmssw/CMSSW_11_1_0/lib/cc8_amd64_gcc8/libFWCoreFramework.so
#16 0x00002b933a6d0dc0 in edm::WorkerT<edm::one::EDProducerBase>::implDoBegin(edm::RunPrincipal const&, edm::EventSetupImpl const&, edm::ModuleCallingContext const*) () from /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_11_1_0_patch3-cc8_amd64_gcc8/build/CMSSW_11_1_0_patch3-build/cc8_amd64_gcc8/cms/cmssw/CMSSW_11_1_0/lib/cc8_amd64_gcc8/libFWCoreFramework.so
#17 0x00002b933a5f25e5 in decltype ({parm#1}()) edm::convertException::wrap<bool edm::Worker::runModule<edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0>::MyPrincipal const&, edm::EventSetupImpl const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0>::Context const*)::{lambda()#1}>(bool edm::Worker::runModule<edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0>::MyPrincipal const&, edm::EventSetupImpl const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0>::Context const*)::{lambda()#1}) () from /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_11_1_0_patch3-cc8_amd64_gcc8/build/CMSSW_11_1_0_patch3-build/cc8_amd64_gcc8/cms/cmssw/CMSSW_11_1_0/lib/cc8_amd64_gcc8/libFWCoreFramework.so
#18 0x00002b933a5f2809 in bool edm::Worker::runModule<edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0>::MyPrincipal const&, edm::EventSetupImpl const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0>::Context const*) () from /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_11_1_0_patch3-cc8_amd64_gcc8/build/CMSSW_11_1_0_patch3-build/cc8_amd64_gcc8/cms/cmssw/CMSSW_11_1_0/lib/cc8_amd64_gcc8/libFWCoreFramework.so
#19 0x00002b933a5f2aa3 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0>::MyPrincipal const&, edm::EventSetupImpl const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0>::Context const*) () from /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_11_1_0_patch3-cc8_amd64_gcc8/build/CMSSW_11_1_0_patch3-build/cc8_amd64_gcc8/cms/cmssw/CMSSW_11_1_0/lib/cc8_amd64_gcc8/libFWCoreFramework.so
#20 0x00002b933a5f2f45 in void edm::SerialTaskQueueChain::actionToRun<edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}&>(edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}&) () from /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_11_1_0_patch3-cc8_amd64_gcc8/build/CMSSW_11_1_0_patch3-build/cc8_amd64_gcc8/cms/cmssw/CMSSW_11_1_0/lib/cc8_amd64_gcc8/libFWCoreFramework.so
#21 0x00002b933a5f2ff1 in edm::SerialTaskQueue::QueuedTask<void edm::SerialTaskQueueChain::push<edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}&>(edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}&)::{lambda()#1}>::execute() () from /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_11_1_0_patch3-cc8_amd64_gcc8/build/CMSSW_11_1_0_patch3-build/cc8_amd64_gcc8/cms/cmssw/CMSSW_11_1_0/lib/cc8_amd64_gcc8/libFWCoreFramework.so
#22 0x00002b933bb7d24d in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::process_bypass_loop (this=this@entry=0x2b933de33200, context_guard=..., t=t@entry=0x2b933dc97c40, isolation=isolation@entry=0) at ../../src/tbb/custom_scheduler.h:388
#23 0x00002b933bb7d545 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all (this=0x2b933de33200, parent=..., child=<optimized out>) at ../../include/tbb/task.h:992
#24 0x00002b933a5d61bb in edm::EventProcessor::beginRun(edm::Hash<2> const&, unsigned int, bool&, bool&) () from /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_11_1_0_patch3-cc8_amd64_gcc8/build/CMSSW_11_1_0_patch3-build/cc8_amd64_gcc8/cms/cmssw/CMSSW_11_1_0/lib/cc8_amd64_gcc8/libFWCoreFramework.so
#25 0x00002b933a5d6af8 in edm::EventProcessor::runToCompletion() () from /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_11_1_0_patch3-cc8_amd64_gcc8/build/CMSSW_11_1_0_patch3-build/cc8_amd64_gcc8/cms/cmssw/CMSSW_11_1_0/lib/cc8_amd64_gcc8/libFWCoreFramework.so
#26 0x0000000000412d9b in main::{lambda()#1}::operator()() const ()
#27 0x0000000000411352 in main ()

Current Modules:

Module: L1FPGATrackProducer:TTTracksFromTrackletEmulation (crashed)

A fatal system signal has occurred: abort signal

https://cmssdt.cern.ch/SDT/jenkins-artifacts/auto-build-release/CMSSW_11_1_0_patch3-cc8_amd64_gcc8/2510/matrixTests/23234.0_TTbar_14TeV+TTbar_14TeV_TuneCP5_2026D49_GenSimHLBeamSpotFull14+DigiFullTrigger_2026D49+RecoFullGlobal_2026D49+HARVESTFullGlobal_2026D49/step2_TTbar_14TeV+TTbar_14TeV_TuneCP5_2026D49_GenSimHLBeamSpotFull14+DigiFullTrigger_2026D49+RecoFullGlobal_2026D49+HARVESTFullGlobal_2026D49.log

The origin of the error is https://github.com/cms-sw/cmssw/blob/master//L1Trigger/TrackFindingTracklet/plugins/L1FPGATrackProducer.cc#L310 -> https://github.com/cms-sw/cmssw/blob/master//L1Trigger/TrackFindingTracklet/src/TrackletEventProcessor.cc#L108 -> https://github.com/cms-sw/cmssw/blob/master/L1Trigger/TrackFindingTracklet/src/Sector.cc#L93 -> https://github.com/cms-sw/cmssw/blob/master/L1Trigger/TrackFindingTracklet/interface/Sector.h#L121

makortel commented 4 years ago

Has anyone tried to run valgrind yet?

tommasoboccali commented 4 years ago

I think @davidlange6 did, but it stopped before that ....

makortel commented 4 years ago

assign upgrade

cmsbuild commented 4 years ago

New categories assigned: upgrade

@kpedro88 you have been requested to review this Pull request/Issue and eventually sign? Thanks

davidlange6 commented 4 years ago

from the patch1 version of code

==2888== Invalid write of size 8
==2888==    at 0x517E248C: std::_Rb_tree_header::_M_reset() (stl_tree.h:210)
==2888==    by 0x517E245F: std::_Rb_tree_header::_Rb_tree_header() (stl_tree.h:176)
==2888==    by 0x5183440B: std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > >::_Rb_tree_impl<std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, true>::_Rb_tree_impl() (stl_tree.h:692)
==2888==    by 0x518083A3: std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > >::_Rb_tree() (stl_tree.h:936)
==2888==    by 0x518083BF: std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > >::map() (stl_map.h:183)
==2888==    by 0x51807557: trklet::Globals::Globals(trklet::Settings const&) (Globals.cc:14)
==2888==    by 0x518DF0EA: std::_MakeUniq<trklet::Globals>::__single_object std::make_unique<trklet::Globals, trklet::Settings const&>(trklet::Settings const&) (unique_ptr.h:835)
==2888==    by 0x518D9084: trklet::TrackletEventProcessor::init(trklet::Settings const*) (TrackletEventProcessor.cc:28)
==2888==    by 0x515FD02A: L1FPGATrackProducer::beginRun(edm::Run const&, edm::EventSetup const&) (L1FPGATrackProducer.cc:310)
==2888==    by 0x4CA7A74: edm::one::EDProducerBase::doBeginRun(edm::RunPrincipal const&, edm::EventSetupImpl const&, edm::ModuleCallingContext const*) (in /cvmfs/cms.cern.ch/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_1_0/lib/slc7_amd64_gcc820/libFWCoreFramework.so)
==2888==    by 0x4C88DBF: edm::WorkerT<edm::one::EDProducerBase>::implDoBegin(edm::RunPrincipal const&, edm::EventSetupImpl const&, edm::ModuleCallingContext const*) (in /cvmfs/cms.cern.ch/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_1_0/lib/slc7_amd64_gcc820/libFWCoreFramework.so)
==2888==    by 0x4BAA5E4: decltype ({parm#1}()) edm::convertException::wrap<bool edm::Worker::runModule<edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0>::MyPrincipal const&, edm::EventSetupImpl const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0>::Context const*)::{lambda()#1}>(bool edm::Worker::runModule<edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0>::MyPrincipal const&, edm::EventSetupImpl const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0>::Context const*)::{lambda()#1}) (in /cvmfs/cms.cern.ch/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_1_0/lib/slc7_amd64_gcc820/libFWCoreFramework.so)
==2888==  Address 0x2339bbe90 is 0 bytes after a block of size 336 alloc'd
==2888==    at 0x402DDB2: operator new(unsigned long) (in /cvmfs/cms.cern.ch/slc7_amd64_gcc820/external/valgrind/3.15.0-bcolbf2/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2888==    by 0x518DF0DC: std::_MakeUniq<trklet::Globals>::__single_object std::make_unique<trklet::Globals, trklet::Settings const&>(trklet::Settings const&) (unique_ptr.h:835)
==2888==    by 0x518D9084: trklet::TrackletEventProcessor::init(trklet::Settings const*) (TrackletEventProcessor.cc:28)
==2888==    by 0x515FD02A: L1FPGATrackProducer::beginRun(edm::Run const&, edm::EventSetup const&) (L1FPGATrackProducer.cc:310)
==2888==    by 0x4CA7A74: edm::one::EDProducerBase::doBeginRun(edm::RunPrincipal const&, edm::EventSetupImpl const&, edm::ModuleCallingContext const*) (in /cvmfs/cms.cern.ch/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_1_0/lib/slc7_amd64_gcc820/libFWCoreFramework.so)
==2888==    by 0x4C88DBF: edm::WorkerT<edm::one::EDProducerBase>::implDoBegin(edm::RunPrincipal const&, edm::EventSetupImpl const&, edm::ModuleCallingContext const*) (in /cvmfs/cms.cern.ch/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_1_0/lib/slc7_amd64_gcc820/libFWCoreFramework.so)
==2888==    by 0x4BAA5E4: decltype ({parm#1}()) edm::convertException::wrap<bool edm::Worker::runModule<edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0>::MyPrincipal const&, edm::EventSetupImpl const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0>::Context const*)::{lambda()#1}>(bool edm::Worker::runModule<edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0>::MyPrincipal const&, edm::EventSetupImpl const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0>::Context const*)::{lambda()#1}) (in /cvmfs/cms.cern.ch/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_1_0/lib/slc7_amd64_gcc820/libFWCoreFramework.so)
==2888==    by 0x4BAA808: bool edm::Worker::runModule<edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0>::MyPrincipal const&, edm::EventSetupImpl const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0>::Context const*) (in /cvmfs/cms.cern.ch/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_1_0/lib/slc7_amd64_gcc820/libFWCoreFramework.so)
==2888==    by 0x4BAAAA2: std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0>::MyPrincipal const&, edm::EventSetupImpl const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0>::Context const*) (in /cvmfs/cms.cern.ch/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_1_0/lib/slc7_amd64_gcc820/libFWCoreFramework.so)
==2888==    by 0x4BAAF44: void edm::SerialTaskQueueChain::actionToRun<edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}&>(edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}&) (in /cvmfs/cms.cern.ch/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_1_0/lib/slc7_amd64_gcc820/libFWCoreFramework.so)
==2888==    by 0x4BAAFF0: edm::SerialTaskQueue::QueuedTask<void edm::SerialTaskQueueChain::push<edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}&>(edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::RunPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}&)::{lambda()#1}>::execute() (in /cvmfs/cms.cern.ch/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_1_0/lib/slc7_amd64_gcc820/libFWCoreFramework.so)
==2888==    by 0x66BC25C: tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::process_bypass_loop(tbb::internal::context_guard_helper<false>&, tbb::task*, long) (custom_scheduler.h:469)
==2888==

makortel commented 4 years ago

@tommasoboccali Where could I find the input file used by /afs/cern.ch/user/t/tboccali/public/pset_crash.py?

tommasoboccali commented 4 years ago

should be this, apologies (i had it in local to be faster, and then I forgot)

fileNames = cms.untracked.vstring('/store/relval/CMSSW_11_0_0/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/PU25ns_110X_mcRun4_realistic_v3_2026D49PU200-v2/10000/5FCE3C90-998F-F246-AD88-23C0F54CAA21.root'),

Dr15Jones commented 4 years ago

The most likely problem seems to be a one definition rule violation. The code has a compile time switch to change the layout of the class

https://github.com/cms-sw/cmssw/blob/master/L1Trigger/TrackFindingTracklet/interface/Globals.h#L103-L106

If one piece of code has that define set and another has it unset, you'd get such a case where the constructor would write past the memory the allocator has assigned to it. This is particularly true since the constructor is not defined in the header file and therefore is built in a separate .cc file from the file which is using the class.
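To make the failure mode concrete, here is a minimal single-file sketch (not taken from the CMSSW sources). The two namespaces stand in for two translation units that see different layouts of the same class; the names caller_view, ctor_view, the extra member, and the helper functions are all assumptions made up for illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-ins, not the real trklet::Globals.
// "caller_view" is the layout seen by a file compiled without the macro,
// "ctor_view" is the layout seen by the file that defines the constructor.
namespace caller_view {
  struct Globals {
    int counter = 0;
  };
}  // namespace caller_view

namespace ctor_view {
  struct Globals {
    int counter = 0;
    std::map<std::string, int> extra;  // member that only exists with the macro set
  };
}  // namespace ctor_view

// Bytes that operator new would reserve on behalf of the caller.
constexpr std::size_t allocatedBytes() { return sizeof(caller_view::Globals); }

// Bytes that the separately compiled constructor believes it owns.
constexpr std::size_t constructedBytes() { return sizeof(ctor_view::Globals); }

// Number of bytes the constructor would touch beyond the allocation.
constexpr std::size_t overrunBytes() { return constructedBytes() - allocatedBytes(); }

static_assert(overrunBytes() > 0, "the two views disagree on the object size");
```

In a real build both definitions carry the same name, so the compiler typically does not notice the mismatch; the overrun only shows up at run time, for example as a heap corruption abort or as an invalid write reported by valgrind.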

davidlange6 commented 4 years ago

Does cmssw set “CMSSW_GIT_HASH” when compiling?

https://github.com/cms-sw/cmssw/blob/9010c72dccae06e78cb0ec045bc1c829cf1afd54/L1Trigger/TrackFindingTracklet/interface/Settings.h#L31


Dr15Jones commented 4 years ago

It is a one definition rule violation. The CPP macro is set here

https://github.com/cms-sw/cmssw/blob/9010c72dccae06e78cb0ec045bc1c829cf1afd54/L1Trigger/TrackFindingTracklet/interface/Settings.h#L30-L32

That means Settings.h must be included before any use of Globals. In Globals.h that is not enforced, since the inclusion of Settings.h is itself behind that macro

https://github.com/cms-sw/cmssw/blob/9010c72dccae06e78cb0ec045bc1c829cf1afd54/L1Trigger/TrackFindingTracklet/interface/Settings.h#L30-L32

Settings.h is first in Globals.cc

https://github.com/cms-sw/cmssw/blob/master/L1Trigger/TrackFindingTracklet/src/Globals.cc#L2

But it is not even included in TrackletEventProcessor.cc where only a forward declaration exists.

Therefore Globals.cc sees a different layout for the class Globals than TrackletEventProcessor.cc does, and it is the latter which asks std::make_unique<Globals> to allocate space.
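The include-order dependence can be sketched in one file as follows. This is only an illustration under stated assumptions: DEMO_USEHYBRID is a made-up macro name, the #define line plays the role of including the header that sets the macro, and the two namespaces model what a file sees before and after that point.

```cpp
#include <cstddef>

// Layout seen by a file that uses the class WITHOUT having included the
// header that defines the macro (DEMO_USEHYBRID is a made-up name).
namespace seen_before_define {
#ifdef DEMO_USEHYBRID
  struct Globals { int n; double table[8]; };
#else
  struct Globals { int n; };
#endif
}  // namespace seen_before_define

// This line plays the role of including the header that defines the macro.
#define DEMO_USEHYBRID

// Layout seen by a file that included that header first.
namespace seen_after_define {
#ifdef DEMO_USEHYBRID
  struct Globals { int n; double table[8]; };
#else
  struct Globals { int n; };
#endif
}  // namespace seen_after_define

constexpr std::size_t sizeBeforeDefine() { return sizeof(seen_before_define::Globals); }
constexpr std::size_t sizeAfterDefine() { return sizeof(seen_after_define::Globals); }

// The same source text yields two different layouts, depending only on
// whether the macro was already defined when the class was parsed.
static_assert(sizeBeforeDefine() < sizeAfterDefine(), "layout depends on include order");
```

The two class bodies are textually identical; only the position of the #define relative to them differs, which is why the problem is easy to miss in review.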

fwyzard commented 4 years ago

Indeed, if I add

#else
#error USEHYBRID is not defined

after the #ifdef I get plenty of errors :-(
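Spelled out in full, the guard pattern looks roughly like the sketch below. DEMO_HYBRID_FLAG, kHybridEnabled, and hybridEnabled are hypothetical names chosen for this example; the macro is defined inside the file only so that the sketch compiles on its own, whereas in a real build it would be expected to come from the compiler command line.

```cpp
// Normally supplied by the build system, e.g. as -DDEMO_HYBRID_FLAG on the
// compiler command line; defined here only so the sketch compiles alone.
#define DEMO_HYBRID_FLAG

#ifdef DEMO_HYBRID_FLAG
constexpr bool kHybridEnabled = true;
#else
// Any file that reaches this branch stops the build instead of silently
// compiling the other variant of the code.
#error DEMO_HYBRID_FLAG is not defined
#endif

// Small accessor so that other code can query the configuration.
constexpr bool hybridEnabled() { return kHybridEnabled; }
```

Removing the #define line makes every file that includes this fragment fail to compile, which is how the guard exposes the compilation units that do not see the macro.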

fwyzard commented 4 years ago

minimal fix ?

diff --git a/L1Trigger/TrackFindingTracklet/BuildFile.xml b/L1Trigger/TrackFindingTracklet/BuildFile.xml
index 0d82ff2..93be87c 100644
--- a/L1Trigger/TrackFindingTracklet/BuildFile.xml
+++ b/L1Trigger/TrackFindingTracklet/BuildFile.xml
@@ -38,6 +38,7 @@
 <use name="TrackingTools/TrajectoryParametrization"/>
 <use name="TrackingTools/TrajectoryState"/>
 <use name="TrackingTools/TransientTrack"/>
+<flags CXXFLAGS="-DUSEHYBRID"/>

 <export>
   <lib   name="1"/>
diff --git a/L1Trigger/TrackFindingTracklet/interface/Settings.h b/L1Trigger/TrackFindingTracklet/interface/Settings.h
index e0509a2..6516fa7 100644
--- a/L1Trigger/TrackFindingTracklet/interface/Settings.h
+++ b/L1Trigger/TrackFindingTracklet/interface/Settings.h
@@ -28,7 +28,9 @@ namespace trklet {
     Settings() {
       //Comment out to run tracklet-only algorithm
 #ifdef CMSSW_GIT_HASH
-#define USEHYBRID
+#ifndef USEHYBRID
+#error USEHYBRID is not defined
+#endif
 #endif
     }

diff --git a/L1Trigger/TrackFindingTracklet/plugins/BuildFile.xml b/L1Trigger/TrackFindingTracklet/plugins/BuildFile.xml
index 866e2d6..14554c4 100644
--- a/L1Trigger/TrackFindingTracklet/plugins/BuildFile.xml
+++ b/L1Trigger/TrackFindingTracklet/plugins/BuildFile.xml
@@ -10,4 +10,5 @@
   <use name="SimDataFormats/GeneratorProducts"/>
   <use name="SimGeneral/HepPDTRecord"/>
   <flags EDM_PLUGIN="1"/>
+  <flags CXXFLAGS="-DUSEHYBRID"/>
 </library>
diff --git a/L1Trigger/TrackFindingTracklet/test/BuildFile.xml b/L1Trigger/TrackFindingTracklet/test/BuildFile.xml
index 8879ac8..aee5c63 100644
--- a/L1Trigger/TrackFindingTracklet/test/BuildFile.xml
+++ b/L1Trigger/TrackFindingTracklet/test/BuildFile.xml
@@ -2,6 +2,7 @@
   <library   file="*.cc" name="TrackFindingTrackletTests">
     <flags   EDM_PLUGIN="1"/>
     <flags   SKIP_FILES="fpga.cc"/>
+    <flags   CXXFLAGS="-DUSEHYBRID"/>
     <use   name="clhep"/>
     <use   name="root"/>
     <use   name="heppdt"/>

tommasoboccali commented 4 years ago

confirmed both by me (I removed the ifdefs by hand in a consistent way) and @fwyzard (he forced the symbol).

Now it runs ... this does not mean results are ok.

we need input on this by @skinnari @tomalin @rekovic : is forcing Hybrid everywhere a solution????

thanks @Dr15Jones, (it would have taken me a couple of geological eras to find ...)

fwyzard commented 4 years ago

https://github.com/cms-sw/cmssw/pull/30683 has this minimal fix, i.e. it just defines the flag in the BuildFiles to force it to be defined in all compilation units. @tommasoboccali is testing a more aggressive fix that removes the code that should have been #ifdef'ed away.

skinnari commented 4 years ago

sorry just returning from some days offline and seeing this thread (and PR 30683), it seems it is understood? please let me know otherwise if any input is required from my side...

fwyzard commented 4 years ago

@skinnari can you confirm that the intent was that USEHYBRID should always be defined ?

skinnari commented 4 years ago

@fwyzard yes. (if it is not, it compiles the "tracklet only" algorithm, instead of the "hybrid" algorithm).

tomalin commented 4 years ago

The fact that Globals.h checks "#ifdef USEHYBRID" before it includes Settings.h does look like a bug, since it is Settings.h that defines USEHYBRID. (And temporarily adding "#define USEHYBRID" to the top of Globals.h fixes the crash, which confirms this.)

I like your proposed solution of defining USEHYBRID in BuildFile.xml instead of Settings.h or Globals.h. Thanks for making the PR.

silviodonato commented 4 years ago

Fixed by https://github.com/cms-sw/cmssw/pull/30683