cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0

SIGSEGV in HGCalImagingAlgo present in RelVals for slc7_aarch64_gcc530 & slc7_aarch64_gcc700 (aarch64 only) #19179

Closed mrodozov closed 7 years ago

mrodozov commented 7 years ago

We have been tracking release validation errors that appear only in aarch64 builds (here http://goo.gl/bhxlJE and here http://goo.gl/wPUz5C, 270 and 274 SIGSEGV failures) and found that they started after this PR: https://github.com/cms-sw/cmssw/pull/18236. Before that, we manually ran the first test, 27034.0, which failed with the following:

(gdb) where
#0  0x000003ff87439bc8 in _int_free () from /lib64/libc.so.6
#1  0x000003ff8907ebf4 in std::vector<double, std::allocator<double> >::~vector (this=<optimized out>, __in_chrg=<optimized out>)
    at /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc530/external/gcc/5.3.0/include/c++/5.3.0/bits/stl_vector.h:425
#2  std::_Destroy<std::vector<double, std::allocator<double> > > (__pointer=<optimized out>) at /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc530/external/gcc/5.3.0/include/c++/5.3.0/bits/stl_construct.h:93
#3  std::_Destroy_aux<false>::__destroy<std::vector<double, std::allocator<double> >*> (__first=0x5a13e108, __last=0x5a13e2a0)
    at /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc530/external/gcc/5.3.0/include/c++/5.3.0/bits/stl_construct.h:103
#4  0x000003ff8907ec20 in std::_Destroy<std::vector<double, std::allocator<double> >*> (__last=<optimized out>, __first=<optimized out>)
    at /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc530/external/gcc/5.3.0/include/c++/5.3.0/bits/stl_construct.h:126
#5  std::_Destroy<std::vector<double, std::allocator<double> >*, std::vector<double, std::allocator<double> > > (__last=<optimized out>, __first=<optimized out>)
    at /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc530/external/gcc/5.3.0/include/c++/5.3.0/bits/stl_construct.h:151
#6  std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > >::~vector (this=0x2f972db0, __in_chrg=<optimized out>)
    at /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc530/external/gcc/5.3.0/include/c++/5.3.0/bits/stl_vector.h:424
#7  0x000003ff61095e00 in HGCalImagingAlgo::~HGCalImagingAlgo (this=0x2f972cc0, __in_chrg=<optimized out>)
    at /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/src/RecoLocalCalo/HGCalRecAlgos/interface/HGCalImagingAlgo.h:123
#8  0x000003ff61095e94 in HGCalImagingAlgo::~HGCalImagingAlgo (this=0x2f972cc0, __in_chrg=<optimized out>)
    at /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/src/RecoLocalCalo/HGCalRecAlgos/interface/HGCalImagingAlgo.h:124
#9  0x000003ff6109a274 in std::default_delete<HGCalImagingAlgo>::operator() (this=0x2f9725d8, __ptr=0x2f972cc0)
    at /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc530/external/gcc/5.3.0/include/c++/5.3.0/bits/unique_ptr.h:76
#10 0x000003ff610975cc in std::unique_ptr<HGCalImagingAlgo, std::default_delete<HGCalImagingAlgo> >::~unique_ptr (this=0x2f9725d8, __in_chrg=<optimized out>)
    at /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc530/external/gcc/5.3.0/include/c++/5.3.0/bits/unique_ptr.h:236
#11 0x000003ff61096354 in HGCalClusterProducer::~HGCalClusterProducer (this=0x2f972480, __in_chrg=<optimized out>)
    at /build/cmsbld/x/CMSSW_9_2_X_2017-06-06-2300/src/RecoLocalCalo/HGCalRecProducers/plugins/HGCalClusterProducer.cc:35
#12 0x000003ff61096398 in HGCalClusterProducer::~HGCalClusterProducer (this=0x2f972480, __in_chrg=<optimized out>)
    at /build/cmsbld/x/CMSSW_9_2_X_2017-06-06-2300/src/RecoLocalCalo/HGCalRecProducers/plugins/HGCalClusterProducer.cc:35
#13 0x000003ff896cacec in edm::stream::ProducingModuleAdaptorBase<edm::stream::EDProducerBase>::~ProducingModuleAdaptorBase() ()
   from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/lib/slc7_aarch64_gcc530/libFWCoreFramework.so
#14 0x000003ff610912f8 in edm::stream::EDProducerAdaptorBase::~EDProducerAdaptorBase (this=0x2f96b5e0, __in_chrg=<optimized out>)
    at /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/src/FWCore/Framework/interface/stream/EDProducerAdaptorBase.h:47
#15 0x000003ff610a2178 in edm::stream::ProducingModuleAdaptor<HGCalClusterProducer, edm::stream::EDProducerBase, edm::stream::EDProducerAdaptorBase>::~ProducingModuleAdaptor (this=0x2f96b5e0, __in_chrg=<optimized out>)
    at /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/src/FWCore/Framework/interface/stream/ProducingModuleAdaptor.h:53
#16 0x000003ff610a21ac in edm::stream::ProducingModuleAdaptor<HGCalClusterProducer, edm::stream::EDProducerBase, edm::stream::EDProducerAdaptorBase>::~ProducingModuleAdaptor (this=0x2f96b5e0, __in_chrg=<optimized out>)
    at /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/src/FWCore/Framework/interface/stream/ProducingModuleAdaptor.h:53
#17 0x000003ff717bbfc0 in std::_Sp_counted_ptr_inplace<edm::maker::ModuleHolderT<edm::stream::EDProducerAdaptorBase>, std::allocator<edm::maker::ModuleHolderT<edm::stream::EDProducerAdaptorBase> >, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/lib/slc7_aarch64_gcc530/pluginRecoBTagCombinedPlugins.so
#18 0x000003ff895c1250 in std::_Rb_tree<std::string, std::pair<std::string const, edm::propagate_const<std::shared_ptr<edm::maker::ModuleHolder> > >, std::_Select1st<std::pair<std::string const, edm::propagate_const<std::shared_ptr<edm::maker::ModuleHolder> > > >, std::less<std::string>, std::allocator<std::pair<std::string const, edm::propagate_const<std::shared_ptr<edm::maker::ModuleHolder> > > > >::_M_erase(std::_Rb_tree_node<std::pair<std::string const, edm::propagate_const<std::shared_ptr<edm::maker::ModuleHolder> > > >*) ()
   from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/lib/slc7_aarch64_gcc530/libFWCoreFramework.so
#19 0x000003ff895c117c in std::_Rb_tree<std::string, std::pair<std::string const, edm::propagate_const<std::shared_ptr<edm::maker::ModuleHolder> > >, std::_Select1st<std::pair<std::string const, edm::propagate_const<std::shared_ptr<edm::maker::ModuleHolder> > > >, std::less<std::string>, std::allocator<std::pair<std::string const, edm::propagate_const<std::shared_ptr<edm::maker::ModuleHolder> > > > >::_M_erase(std::_Rb_tree_node<std::pair<std::string const, edm::propagate_const<std::shared_ptr<edm::maker::ModuleHolder> > > >*) ()
   from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/lib/slc7_aarch64_gcc530/libFWCoreFramework.so
#20 0x000003ff895c12b8 in std::_Sp_counted_ptr<edm::ModuleRegistry*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() ()
   from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/lib/slc7_aarch64_gcc530/libFWCoreFramework.so
#21 0x00000000004120c8 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() ()
#22 0x000003ff896623b8 in std::default_delete<edm::Schedule>::operator()(edm::Schedule*) const [clone .isra.814] ()
   from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/lib/slc7_aarch64_gcc530/libFWCoreFramework.so
#23 0x000003ff89669e28 in edm::EventProcessor::~EventProcessor() () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/lib/slc7_aarch64_gcc530/libFWCoreFramework.so
#24 0x000003ff8966a27c in edm::EventProcessor::~EventProcessor() () from /cvmfs/cms-ib.cern.ch/week1/slc7_aarch64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/lib/slc7_aarch64_gcc530/libFWCoreFramework.so
#25 0x000000000040d3a8 in (anonymous namespace)::EventProcessorWithSentry::~EventProcessorWithSentry() () 

The failure appears to start in the destructor of HGCalClusterProducer (which is empty), but going further down the stack, something goes wrong when disposing of https://github.com/cms-sw/cmssw/blob/master/RecoLocalCalo/HGCalRecProducers/plugins/HGCalClusterProducer.cc#L47, which indicates that a nested vector structure is not being deleted properly. @Dr15Jones @clelange

cmsbuild commented 7 years ago

A new Issue was created by @mrodozov .

@davidlange6, @Dr15Jones, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

Dr15Jones commented 7 years ago

assign reconstruction, upgrade

cmsbuild commented 7 years ago

New categories assigned: upgrade,reconstruction

@kpedro88,@slava77,@perrotta you have been requested to review this Pull request/Issue and eventually sign? Thanks

Dr15Jones commented 7 years ago

I'm running valgrind on step 3 of 27034.0. The job isn't finished yet, but it has already found:

==808920== Invalid read of size 8
==808920==    at 0x6479DA60: HGCalImagingAlgo::populate(edm::SortedCollection<HGCRecHit, edm::StrictWeakOrdering<HGCRecHit> > const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/lib/slc6_amd64_gcc530/libRecoLocalCaloHGCalRecAlgos.so)
==808920==    by 0x64776093: HGCalClusterProducer::produce(edm::Event&, edm::EventSetup const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/lib/slc6_amd64_gcc530/pluginRecoLocalCaloHGCalRecProducersPlugins.so)

==808920==  Address 0x13c7cc900 is 0 bytes after a block of size 6,352 alloc'd
==808920==    at 0x40271C6: operator new(unsigned long) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc530/external/valgrind/3.12.0-oenich/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==808920==    by 0x5E0F5D50: std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > >::_M_fill_insert(__gnu_cxx::__normal_iterator<std::vector<double, std::allocator<double> >*, std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > > >, unsigned long, std::vector<double, std::allocator<double> > const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/lib/slc6_amd64_gcc530/pluginDQMOfflineMuon.so)
==808920==    by 0x6479D60A: HGCalImagingAlgo::computeThreshold() (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/lib/slc6_amd64_gcc530/libRecoLocalCaloHGCalRecAlgos.so)
==808920==    by 0x6479DAEC: HGCalImagingAlgo::populate(edm::SortedCollection<HGCRecHit, edm::StrictWeakOrdering<HGCRecHit> > const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/lib/slc6_amd64_gcc530/libRecoLocalCaloHGCalRecAlgos.so)
==808920==    by 0x64776060: HGCalClusterProducer::produce(edm::Event&, edm::EventSetup const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/lib/slc6_amd64_gcc530/pluginRecoLocalCaloHGCalRecProducersPlugins.so)

mrodozov commented 7 years ago

If this helps: they replaced makeClusters with populate here https://github.com/cms-sw/cmssw/pull/18236/files#diff-b09c179fedcd894f76956c03c04d943cR132. It seems that the call to populate inside produce is where something goes wrong.

slava77 commented 7 years ago

@rovere @felicepantaleo @clelange please follow up for this HGCal issue. Thank you.

@mrodozov please change the title of this issue to be more descriptive of the problem (e.g. "SIGSEGV in HGCalImagingAlgo").

slava77 commented 7 years ago

Also, for the record, we need some instructions to reproduce. I suspect that the shortened links to IBs will go dead in a week or so.

Dr15Jones commented 7 years ago

The valgrind log has

==808920== Invalid write of size 8
==808920==    at 0x6479D36F: HGCalImagingAlgo::computeThreshold() (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/lib/slc6_amd64_gcc530/libRecoLocalCaloHGCalRecAlgos.so)
==808920==    by 0x6479DAEC: HGCalImagingAlgo::populate(edm::SortedCollection<HGCRecHit, edm::StrictWeakOrdering<HGCRecHit> > const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/lib/slc6_amd64_gcc530/libRecoLocalCaloHGCalRecAlgos.so)
==808920==    by 0x64776060: HGCalClusterProducer::produce(edm::Event&, edm::EventSetup const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/lib/slc6_amd64_gcc530/pluginRecoLocalCaloHGCalRecProducersPlugins.so)

==808920==  Address 0x13656fa20 is 0 bytes after a block of size 6,352 alloc'd
==808920==    at 0x40271C6: operator new(unsigned long) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc530/external/valgrind/3.12.0-oenich/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==808920==    by 0x5E0F5D50: std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > >::_M_fill_insert(__gnu_cxx::__normal_iterator<std::vector<double, std::allocator<double> >*, std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > > >, unsigned long, std::vector<double, std::allocator<double> > const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/lib/slc6_amd64_gcc530/pluginDQMOfflineMuon.so)
==808920==    by 0x6479D60A: HGCalImagingAlgo::computeThreshold() (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/lib/slc6_amd64_gcc530/libRecoLocalCaloHGCalRecAlgos.so)
==808920==    by 0x6479DAEC: HGCalImagingAlgo::populate(edm::SortedCollection<HGCRecHit, edm::StrictWeakOrdering<HGCRecHit> > const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/lib/slc6_amd64_gcc530/libRecoLocalCaloHGCalRecAlgos.so)
==808920==    by 0x64776060: HGCalClusterProducer::produce(edm::Event&, edm::EventSetup const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/lib/slc6_amd64_gcc530/pluginRecoLocalCaloHGCalRecProducersPlugins.so)

and

==808920== Invalid write of size 8
==808920==    at 0x6479D380: HGCalImagingAlgo::computeThreshold() (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/lib/slc6_amd64_gcc530/libRecoLocalCaloHGCalRecAlgos.so)
==808920==    by 0x6479DAEC: HGCalImagingAlgo::populate(edm::SortedCollection<HGCRecHit, edm::StrictWeakOrdering<HGCRecHit> > const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/lib/slc6_amd64_gcc530/libRecoLocalCaloHGCalRecAlgos.so)
==808920==    by 0x64776060: HGCalClusterProducer::produce(edm::Event&, edm::EventSetup const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/lib/slc6_amd64_gcc530/pluginRecoLocalCaloHGCalRecProducersPlugins.so)

==808920==  Address 0x136a64e10 is 0 bytes after a block of size 6,352 alloc'd
==808920==    at 0x40271C6: operator new(unsigned long) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc530/external/valgrind/3.12.0-oenich/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==808920==    by 0x5E0F5D50: std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > >::_M_fill_insert(__gnu_cxx::__normal_iterator<std::vector<double, std::allocator<double> >*, std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > > >, unsigned long, std::vector<double, std::allocator<double> > const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/lib/slc6_amd64_gcc530/pluginDQMOfflineMuon.so)
==808920==    by 0x6479D5EA: HGCalImagingAlgo::computeThreshold() (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/lib/slc6_amd64_gcc530/libRecoLocalCaloHGCalRecAlgos.so)
==808920==    by 0x6479DAEC: HGCalImagingAlgo::populate(edm::SortedCollection<HGCRecHit, edm::StrictWeakOrdering<HGCRecHit> > const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/lib/slc6_amd64_gcc530/libRecoLocalCaloHGCalRecAlgos.so)
==808920==    by 0x64776060: HGCalClusterProducer::produce(edm::Event&, edm::EventSetup const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc530/cms/cmssw/CMSSW_9_2_X_2017-06-06-2300/lib/slc6_amd64_gcc530/pluginRecoLocalCaloHGCalRecProducersPlugins.so)

Dr15Jones commented 7 years ago

To reproduce, I created a work area for CMSSW_9_2_X_2017-06-06-2300 on a standard amd64 machine (slc6_amd64_gcc530) and then ran step 3 of workflow 27034.0. This does not crash, but valgrind does show problems.

Dr15Jones commented 7 years ago

As a first guess, I think the problem is probably line 571 and/or 572

https://github.com/cms-sw/cmssw/blob/298e2fbf7cac6264065305e6bdc530aeb77b78cf/RecoLocalCalo/HGCalRecAlgos/src/HGCalImagingAlgo.cc#L571

probably because of an off-by-one error in the wafer numbering.

Dr15Jones commented 7 years ago

As a test I added the following to HGCalImagingAlgo.cc

assert(layer > 0);
assert( (layer -1) < static_cast<long>(thresholds.size()));
assert(layer -1 < static_cast<long>(v_sigmaNoise.size()));
assert(wafer < static_cast<long>(thresholds[layer-1].size()));
assert(wafer < static_cast<long>(v_sigmaNoise[layer-1].size()));

I then ran the job and it failed with

cmsRun: /uscms_data/d2/cdj/build/temp/crash/CMSSW_9_2_X_2017-06-06-2300/src/RecoLocalCalo/HGCalRecAlgos/src/HGCalImagingAlgo.cc:567: void HGCalImagingAlgo::computeThreshold(): Assertion `wafer < static_cast<long>(thresholds[layer-1].size())' failed.

So it does look like an off-by-one error with wafer.
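The same checks can be tried outside CMSSW. The following is a minimal sketch, not the CMSSW code: `indicesInRange` and `checkedRef` are hypothetical helpers, and the table is assumed to be a nested `std::vector<double>` indexed as `table[layer-1][wafer]`, as in the asserts above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of CMSSW: returns true only when
// table[layer - 1][wafer] refers to an existing element.
// Layers are 1-based and wafers are used directly as 0-based indices,
// mirroring the asserts quoted above.
inline bool indicesInRange(const std::vector<std::vector<double>>& table,
                           long layer, long wafer) {
  if (layer <= 0 || wafer < 0) return false;
  const auto l = static_cast<std::size_t>(layer - 1);
  if (l >= table.size()) return false;
  return static_cast<std::size_t>(wafer) < table[l].size();
}

// Hypothetical checked accessor: aborts in debug builds instead of
// silently writing past the end of the inner vector.
inline double& checkedRef(std::vector<std::vector<double>>& table,
                          long layer, long wafer) {
  assert(indicesInRange(table, layer, wafer));
  return table[static_cast<std::size_t>(layer - 1)]
              [static_cast<std::size_t>(wafer)];
}
```

With an inner vector of size N, `indicesInRange(table, 1, N)` is false: the index equal to the size is exactly the off-by-one case.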

clelange commented 7 years ago

Hi @Dr15Jones @mrodozov - sorry about the hassle. If it's just an off-by-one error for wafer, then changing https://github.com/cms-sw/cmssw/blob/298e2fbf7cac6264065305e6bdc530aeb77b78cf/RecoLocalCalo/HGCalRecAlgos/src/HGCalImagingAlgo.cc#L547 to dummy.resize(maxNumberOfWafersPerLayer+1, 0); should fix it. If not, then the magic number in https://github.com/cms-sw/cmssw/blob/298e2fbf7cac6264065305e6bdc530aeb77b78cf/RecoLocalCalo/HGCalRecAlgos/interface/HGCalImagingAlgo.h#L172 is wrong and we need to ask @bsunanda if anything changed. I can have a look at that tomorrow; too many other things are going on today.
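The effect of the proposed resize can be sketched in isolation. This is an illustration under assumptions, not the CMSSW implementation: `makeTable` and its `nLayers` parameter are hypothetical, while `dummy` and `maxNumberOfWafersPerLayer` are the names used in the linked source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds a per-layer table with one extra slot per
// layer, as in the proposed dummy.resize(maxNumberOfWafersPerLayer+1, 0).
inline std::vector<std::vector<double>> makeTable(
    std::size_t nLayers, std::size_t maxNumberOfWafersPerLayer) {
  std::vector<double> dummy;
  // Valid wafer indices are now 0 .. maxNumberOfWafersPerLayer inclusive.
  dummy.resize(maxNumberOfWafersPerLayer + 1, 0.0);

  std::vector<std::vector<double>> table;
  table.resize(nLayers, dummy);  // every layer gets a copy of dummy
  assert(table.empty() ||
         table.front().size() == maxNumberOfWafersPerLayer + 1);
  return table;
}
```

The extra slot only helps if the largest wafer index that occurs equals `maxNumberOfWafersPerLayer`; any larger index is still out of range.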

kpedro88 commented 7 years ago

I ran valgrind on 27434.0 after compiling RecoLocalCalo/HGCalRecAlgos with debug symbols, and got this:

==3155149== Invalid write of size 8
==3155149==    at 0x64D3B32F: HGCalImagingAlgo::computeThreshold() (HGCalImagingAlgo.cc:571)
==3155149==    by 0x64D3BAAC: HGCalImagingAlgo::populate(edm::SortedCollection<HGCRecHit, edm::StrictWeakOrdering<HGCRecHit> > const&) (HGCalImagingAlgo.cc:18)
==3155149==  Address 0xcda84350 is 0 bytes after a block of size 6,352 alloc'd
==3155149==    at 0x40271C6: operator new(unsigned long) (in /cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/valgrind/3.12.0-oenich/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==3155149==    by 0x5E690970: std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > >::_M_fill_insert(__gnu_cxx::__normal_iterator<std::vector<double, std::allocator<double> >*, std::vector<std::vector<double, std::allocator<double> >, std::allocator<st
==3155149==    by 0x64D3B5CA: insert (stl_vector.h:1054)
==3155149==    by 0x64D3B5CA: resize (stl_vector.h:696)
==3155149==    by 0x64D3B5CA: HGCalImagingAlgo::computeThreshold() (HGCalImagingAlgo.cc:548)
==3155149==    by 0x64D3BAAC: HGCalImagingAlgo::populate(edm::SortedCollection<HGCRecHit, edm::StrictWeakOrdering<HGCRecHit> > const&) (HGCalImagingAlgo.cc:18)

==3155149== Invalid write of size 8
==3155149==    at 0x64D3B340: HGCalImagingAlgo::computeThreshold() (HGCalImagingAlgo.cc:572)
==3155149==    by 0x64D3BAAC: HGCalImagingAlgo::populate(edm::SortedCollection<HGCRecHit, edm::StrictWeakOrdering<HGCRecHit> > const&) (HGCalImagingAlgo.cc:18)
==3155149==  Address 0xcffc30d0 is 0 bytes after a block of size 6,352 alloc'd
==3155149==    at 0x40271C6: operator new(unsigned long) (in /cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/valgrind/3.12.0-oenich/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==3155149==    by 0x5E690970: std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > >::_M_fill_insert(__gnu_cxx::__normal_iterator<std::vector<double, std::allocator<double> >*, std::vector<std::vector<double, std::allocator<double> >, std::allocator<st
==3155149==    by 0x64D3B5AA: insert (stl_vector.h:1054)
==3155149==    by 0x64D3B5AA: resize (stl_vector.h:696)
==3155149==    by 0x64D3B5AA: HGCalImagingAlgo::computeThreshold() (HGCalImagingAlgo.cc:549)
==3155149==    by 0x64D3BAAC: HGCalImagingAlgo::populate(edm::SortedCollection<HGCRecHit, edm::StrictWeakOrdering<HGCRecHit> > const&) (HGCalImagingAlgo.cc:18)

So it looks like @Dr15Jones had the right idea.
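The block size in the report is consistent with that reading. As a worked check, assuming the 6,352-byte block is the storage of one inner `std::vector<double>` (an inference from the allocation stack, not something valgrind states), the arithmetic can be sketched as follows; `elementsInBlock` and `indexPastEnd` are hypothetical helper names.

```cpp
#include <cassert>  // for the runtime checks shown in the usage note
#include <cstddef>

// Number of elements of type T that fit in a heap block of blockBytes bytes.
template <typename T>
constexpr std::size_t elementsInBlock(std::size_t blockBytes) {
  return blockBytes / sizeof(T);
}

// Index of the element that starts bytesAfter bytes past the end of the block.
template <typename T>
constexpr std::size_t indexPastEnd(std::size_t blockBytes,
                                   std::size_t bytesAfter) {
  return (blockBytes + bytesAfter) / sizeof(T);
}

// The sketch assumes 8-byte doubles, matching "Invalid write of size 8".
static_assert(sizeof(double) == 8, "sketch assumes 8-byte double");
static_assert(6352 % sizeof(double) == 0, "block holds whole doubles");
```

Under these assumptions `elementsInBlock<double>(6352)` is 794, and a write "0 bytes after" the block lands on index 794, one past the last valid index 793.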

smuzaffar commented 7 years ago

@clelange, using dummy.resize(maxNumberOfWafersPerLayer+1, 0) did not work. It still fails with the same core dump.

smuzaffar commented 7 years ago

@slava77, the crash is only visible on aarch64. To reproduce it, you need to log in to one of our arm64 build machines, create a 92X dev area, and run workflow 27034.0. I have created a cmsuser account on moonshot-arm64-13.cern.ch (I can send you the password by email).

kpedro88 commented 7 years ago

@clelange I think you're correct that we need to ask @bsunanda for the correct "magic number" maxNumberOfWafersPerLayer*. I replaced the vector index operator [] with .at(), so that an out-of-range access throws an exception that can be caught in gdb. When I initialize the vectors to a size of maxNumberOfWafersPerLayer+1, the exception is still thrown, with wafer = maxNumberOfWafersPerLayer + 1 = 795.

* It would be great to be able to get this number directly from the HGCal geometry/topology in a way that would enforce its correctness...
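The `.at()` technique can be shown on its own. The sketch below is not the CMSSW code: `atThrows` is a hypothetical helper, and the value 794 for `maxNumberOfWafersPerLayer` in the usage note is inferred from the "= 795" quoted above.

```cpp
#include <cassert>  // for the runtime checks shown in the usage note
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: reports whether table.at(layer - 1).at(wafer)
// throws std::out_of_range. Unlike operator[], at() checks the index
// on every call instead of reading or writing past the end.
inline bool atThrows(const std::vector<std::vector<double>>& table,
                     std::size_t layer, std::size_t wafer) {
  try {
    (void)table.at(layer - 1).at(wafer);
    return false;
  } catch (const std::out_of_range&) {
    return true;
  }
}
```

With inner vectors of size 794 + 1 = 795, `atThrows(table, 1, 794)` is false and `atThrows(table, 1, 795)` is true, which matches the behaviour described above. In gdb, `catch throw` stops at the point where such an exception is raised.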

davidlt commented 7 years ago

I looked at workflow 27034.0 in CMSSW_9_2_ROOT6_X_2017-06-08-2300 (slc6_amd64_gcc700):

==30338== Invalid read of size 8
==30338== Invalid read of size 8
==30338== Invalid write of size 8
==30338== Invalid write of size 8
==30338== Invalid read of size 8
==30338==    at 0x717432D: __dynamic_cast (dyncast.cc:50)
==30338==    by 0x1B1EA38E: TMVA::Reader::~Reader() (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/lcg/root/6.09.04-opkfni/lib/libTMVA.so)
==30338==    by 0x60E49B75: PhotonMVAEstimatorRun2Spring16NonTrig::createSingleReader(int, edm::FileInPath const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/pluginRecoEgammaPhotonIdentificationPlugins.so)
==30338==    by 0x60E49FC6: PhotonMVAEstimatorRun2Spring16NonTrig::PhotonMVAEstimatorRun2Spring16NonTrig(edm::ParameterSet const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/pluginRecoEgammaPhotonIdentificationPlugins.so)
==30338==    by 0x60E4A3B0: edmplugin::PluginFactory<AnyMVAEstimatorRun2Base* (edm::ParameterSet const&)>::PMaker<PhotonMVAEstimatorRun2Spring16NonTrig>::create(edm::ParameterSet const&) const (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/pluginRecoEgammaPhotonIdentificationPlugins.so)
==30338==    by 0x2CFAE740: egamma::MVAObjectCache::MVAObjectCache(edm::ParameterSet const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libRecoEgammaEgammaTools.so)
==30338==    by 0x60E446CE: edm::WorkerMaker<MVAValueMapProducer<reco::Photon> >::makeModule(edm::ParameterSet const&) const (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/pluginRecoEgammaPhotonIdentificationPlugins.so)
==30338==    by 0x4BABFA6: edm::Maker::makeModule(edm::MakeModuleParams const&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&) const (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libFWCoreFramework.so)
==30338==    by 0x4B3DD06: edm::Factory::makeModule(edm::MakeModuleParams const&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&) const (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libFWCoreFramework.so)
==30338==    by 0x4B52D8C: edm::ModuleRegistry::getModule(edm::MakeModuleParams const&, std::string const&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libFWCoreFramework.so)
==30338==    by 0x4C08624: edm::WorkerRegistry::getWorker(edm::WorkerParams const&, std::string const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libFWCoreFramework.so)
==30338==    by 0x4BC7605: edm::WorkerManager::getWorker(edm::ParameterSet&, edm::ProductRegistry&, edm::PreallocationConfiguration const*, std::shared_ptr<edm::ProcessConfiguration const>, std::string const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libFWCoreFramework.so)

==30338== Invalid read of size 8
==30338==    at 0x7174363: __dynamic_cast (dyncast.cc:68)
==30338==    by 0x1B1EA38E: TMVA::Reader::~Reader() (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/lcg/root/6.09.04-opkfni/lib/libTMVA.so)
==30338==    by 0x60E49B75: PhotonMVAEstimatorRun2Spring16NonTrig::createSingleReader(int, edm::FileInPath const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/pluginRecoEgammaPhotonIdentificationPlugins.so)
==30338==    by 0x60E49FC6: PhotonMVAEstimatorRun2Spring16NonTrig::PhotonMVAEstimatorRun2Spring16NonTrig(edm::ParameterSet const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/pluginRecoEgammaPhotonIdentificationPlugins.so)
==30338==    by 0x60E4A3B0: edmplugin::PluginFactory<AnyMVAEstimatorRun2Base* (edm::ParameterSet const&)>::PMaker<PhotonMVAEstimatorRun2Spring16NonTrig>::create(edm::ParameterSet const&) const (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/pluginRecoEgammaPhotonIdentificationPlugins.so)
==30338==    by 0x2CFAE740: egamma::MVAObjectCache::MVAObjectCache(edm::ParameterSet const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libRecoEgammaEgammaTools.so)
==30338==    by 0x60E446CE: edm::WorkerMaker<MVAValueMapProducer<reco::Photon> >::makeModule(edm::ParameterSet const&) const (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/pluginRecoEgammaPhotonIdentificationPlugins.so)
==30338==    by 0x4BABFA6: edm::Maker::makeModule(edm::MakeModuleParams const&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&) const (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libFWCoreFramework.so)
==30338==    by 0x4B3DD06: edm::Factory::makeModule(edm::MakeModuleParams const&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&) const (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libFWCoreFramework.so)
==30338==    by 0x4B52D8C: edm::ModuleRegistry::getModule(edm::MakeModuleParams const&, std::string const&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libFWCoreFramework.so)
==30338==    by 0x4C08624: edm::WorkerRegistry::getWorker(edm::WorkerParams const&, std::string const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libFWCoreFramework.so)
==30338==    by 0x4BC7605: edm::WorkerManager::getWorker(edm::ParameterSet&, edm::ProductRegistry&, edm::PreallocationConfiguration const*, std::shared_ptr<edm::ProcessConfiguration const>, std::string const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libFWCoreFramework.so)

==30338== Invalid write of size 8
==30338==    at 0x60B454CA: HGCalImagingAlgo::computeThreshold() (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libRecoLocalCaloHGCalRecAlgos.so)
==30338==    by 0x60B45E9C: HGCalImagingAlgo::populate(edm::SortedCollection<HGCRecHit, edm::StrictWeakOrdering<HGCRecHit> > const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libRecoLocalCaloHGCalRecAlgos.so)
==30338==    by 0x60B1AD0D: HGCalClusterProducer::produce(edm::Event&, edm::EventSetup const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/pluginRecoLocalCaloHGCalRecProducersPlugins.so)
==30338==    by 0x4C5C202: edm::stream::EDProducerAdaptorBase::doEvent(edm::EventPrincipal const&, edm::EventSetup const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libFWCoreFramework.so)
==30338==    by 0x4B90021: edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventPrincipal const&, edm::EventSetup const&, edm::ModuleCallingContext const*) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libFWCoreFramework.so)
==30338==    by 0x4B356C6: decltype ({parm#1}()) edm::convertException::wrap<bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libFWCoreFramework.so)
==30338==    by 0x4B3587C: bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libFWCoreFramework.so)
==30338==    by 0x4B37265: void edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libFWCoreFramework.so)
==30338==    by 0x4B37730: edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libFWCoreFramework.so)
==30338==    by 0x64508AD: tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) (custom_scheduler.h:501)
==30338==    by 0x4BECA03: edm::EventProcessor::readAndProcessEvent() (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libFWCoreFramework.so)
==30338==    by 0x4B406EC: statemachine::HandleEvent::readAndProcessEvent() (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libFWCoreFramework.so)

==30338== Invalid write of size 8
==30338==    at 0x60B454DA: HGCalImagingAlgo::computeThreshold() (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libRecoLocalCaloHGCalRecAlgos.so)
==30338==    by 0x60B45E9C: HGCalImagingAlgo::populate(edm::SortedCollection<HGCRecHit, edm::StrictWeakOrdering<HGCRecHit> > const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libRecoLocalCaloHGCalRecAlgos.so)
==30338==    by 0x60B1AD0D: HGCalClusterProducer::produce(edm::Event&, edm::EventSetup const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/pluginRecoLocalCaloHGCalRecProducersPlugins.so)
==30338==    by 0x4C5C202: edm::stream::EDProducerAdaptorBase::doEvent(edm::EventPrincipal const&, edm::EventSetup const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libFWCoreFramework.so)
==30338==    by 0x4B90021: edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventPrincipal const&, edm::EventSetup const&, edm::ModuleCallingContext const*) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libFWCoreFramework.so)
==30338==    by 0x4B356C6: decltype ({parm#1}()) edm::convertException::wrap<bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libFWCoreFramework.so)
==30338==    by 0x4B3587C: bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libFWCoreFramework.so)
==30338==    by 0x4B37265: void edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libFWCoreFramework.so)
==30338==    by 0x4B37730: edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libFWCoreFramework.so)
==30338==    by 0x64508AD: tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) (custom_scheduler.h:501)
==30338==    by 0x4BECA03: edm::EventProcessor::readAndProcessEvent() (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libFWCoreFramework.so)
==30338==    by 0x4B406EC: statemachine::HandleEvent::readAndProcessEvent() (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc6_amd64_gcc700/cms/cmssw-patch/CMSSW_9_2_ROOT6_X_2017-06-08-2300/lib/slc6_amd64_gcc700/libFWCoreFramework.so)

The same workflow on AArch64 for CMSSW_9_2_ROOT6_X_2017-06-05-2300 produces 40 invalid writes/reads. Some of them are shown here:

==19069== Invalid read of size 8
==19069==    at 0x6DDF0FC: __dynamic_cast (dyncast.cc:50)
==19069==    by 0x1A2CB053: TMVA::Reader::~Reader() (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/lcg/root/6.09.04-opkfni/lib/libTMVA.so)
==19069==    by 0x619562EF: PhotonMVAEstimatorRun2Spring16NonTrig::createSingleReader(int, edm::FileInPath const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/pluginRecoEgammaPhotonIdentificationPlugins.so)
==19069==    by 0x61956757: PhotonMVAEstimatorRun2Spring16NonTrig::PhotonMVAEstimatorRun2Spring16NonTrig(edm::ParameterSet const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/pluginRecoEgammaPhotonIdentificationPlugins.so)
==19069==    by 0x61956B9B: edmplugin::PluginFactory<AnyMVAEstimatorRun2Base* (edm::ParameterSet const&)>::PMaker<PhotonMVAEstimatorRun2Spring16NonTrig>::create(edm::ParameterSet const&) const (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/pluginRecoEgammaPhotonIdentificationPlugins.so)
==19069==    by 0x2BAE79AB: egamma::MVAObjectCache::MVAObjectCache(edm::ParameterSet const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libRecoEgammaEgammaTools.so)
==19069==    by 0x6195187F: edm::WorkerMaker<MVAValueMapProducer<reco::Photon> >::makeModule(edm::ParameterSet const&) const (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/pluginRecoEgammaPhotonIdentificationPlugins.so)
==19069==    by 0x4A0F20F: edm::Maker::makeModule(edm::MakeModuleParams const&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&) const (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x49BE85F: edm::Factory::makeModule(edm::MakeModuleParams const&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&) const (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x4AA560B: edm::ModuleRegistry::getModule(edm::MakeModuleParams const&, std::string const&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x4A6783B: edm::WorkerRegistry::getWorker(edm::WorkerParams const&, std::string const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x4A2B447: edm::WorkerManager::getWorker(edm::ParameterSet&, edm::ProductRegistry&, edm::PreallocationConfiguration const*, std::shared_ptr<edm::ProcessConfiguration const>, std::string const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)

==19069== Invalid read of size 8
==19069==    at 0x6DDF114: __dynamic_cast (dyncast.cc:68)
==19069==    by 0x1A2CB053: TMVA::Reader::~Reader() (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/lcg/root/6.09.04-opkfni/lib/libTMVA.so)
==19069==    by 0x619562EF: PhotonMVAEstimatorRun2Spring16NonTrig::createSingleReader(int, edm::FileInPath const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/pluginRecoEgammaPhotonIdentificationPlugins.so)
==19069==    by 0x61956757: PhotonMVAEstimatorRun2Spring16NonTrig::PhotonMVAEstimatorRun2Spring16NonTrig(edm::ParameterSet const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/pluginRecoEgammaPhotonIdentificationPlugins.so)
==19069==    by 0x61956B9B: edmplugin::PluginFactory<AnyMVAEstimatorRun2Base* (edm::ParameterSet const&)>::PMaker<PhotonMVAEstimatorRun2Spring16NonTrig>::create(edm::ParameterSet const&) const (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/pluginRecoEgammaPhotonIdentificationPlugins.so)
==19069==    by 0x2BAE79AB: egamma::MVAObjectCache::MVAObjectCache(edm::ParameterSet const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libRecoEgammaEgammaTools.so)
==19069==    by 0x6195187F: edm::WorkerMaker<MVAValueMapProducer<reco::Photon> >::makeModule(edm::ParameterSet const&) const (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/pluginRecoEgammaPhotonIdentificationPlugins.so)
==19069==    by 0x4A0F20F: edm::Maker::makeModule(edm::MakeModuleParams const&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&) const (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x49BE85F: edm::Factory::makeModule(edm::MakeModuleParams const&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&) const (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x4AA560B: edm::ModuleRegistry::getModule(edm::MakeModuleParams const&, std::string const&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x4A6783B: edm::WorkerRegistry::getWorker(edm::WorkerParams const&, std::string const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x4A2B447: edm::WorkerManager::getWorker(edm::ParameterSet&, edm::ProductRegistry&, edm::PreallocationConfiguration const*, std::shared_ptr<edm::ProcessConfiguration const>, std::string const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)

==19069== Invalid write of size 8
==19069==    at 0x5A95CC90: ROOT::Math::SVector<double, 6u>& ROOT::Math::SVector<double, 6u>::operator=<ROOT::Math::VectorMatrixRowOp<ROOT::Math::SMatrix<double, 6u, 6u, ROOT::Math::MatRepSym<double, 6u> >, ROOT::Math::SVector<double, 6u>, 6u> >(ROOT::Math::VecExpr<ROOT::Math::VectorMatrixRowOp<ROOT::Math::SMatrix<double, 6u, 6u, ROOT::Math::MatRepSym<double, 6u> >, ROOT::Math::SVector<double, 6u>, 6u>, double, 6u> const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libRecoEgammaEgammaPhotonAlgos.so)
==19069==    by 0x5A96188B: KinematicConstrainedVertexUpdatorT<2, 2>::update(ROOT::Math::SVector<double, 17u> const&, ROOT::Math::SMatrix<double, 17u, 17u, ROOT::Math::MatRepSym<double, 17u> >&, std::vector<KinematicState, std::allocator<KinematicState> >&, Point3DBase<float, GlobalTag> const&, Vector3DBase<float, GlobalTag> const&, MultiTrackKinematicConstraintT<2, 2>*) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libRecoEgammaEgammaPhotonAlgos.so)
==19069==    by 0x5A96397F: KinematicConstrainedVertexFitterT<2, 2>::fit(std::vector<ReferenceCountingPointer<KinematicParticle>, std::allocator<ReferenceCountingPointer<KinematicParticle> > > const&, MultiTrackKinematicConstraintT<2, 2>*, Point3DBase<float, GlobalTag>*) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libRecoEgammaEgammaPhotonAlgos.so)
==19069==    by 0x5A959FAB: ConversionVertexFinder::run(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, reco::Vertex&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libRecoEgammaEgammaPhotonAlgos.so)
==19069==    by 0x61DF460B: ConversionProducer::checkVertex(reco::TransientTrack const&, reco::TransientTrack const&, MagneticField const*, reco::Vertex&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/pluginRecoEgammaEgammaPhotonProducers.so)
==19069==    by 0x61DF6A37: ConversionProducer::buildCollection(edm::Event&, edm::EventSetup const&, std::multimap<float, edm::Ptr<reco::ConversionTrack>, std::less<float>, std::allocator<std::pair<float const, edm::Ptr<reco::ConversionTrack> > > > const&, std::multimap<double, edm::Ptr<reco::CaloCluster>, std::less<double>, std::allocator<std::pair<double const, edm::Ptr<reco::CaloCluster> > > > const&, std::multimap<double, edm::Ptr<reco::CaloCluster>, std::less<double>, std::allocator<std::pair<double const, edm::Ptr<reco::CaloCluster> > > > const&, reco::Vertex const&, std::vector<reco::Conversion, std::allocator<reco::Conversion> >&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/pluginRecoEgammaEgammaPhotonProducers.so)
==19069==    by 0x61DF8DE7: ConversionProducer::produce(edm::Event&, edm::EventSetup const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/pluginRecoEgammaEgammaPhotonProducers.so)
==19069==    by 0x4ABA123: edm::stream::EDProducerAdaptorBase::doEvent(edm::EventPrincipal const&, edm::EventSetup const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x4A9F063: edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventPrincipal const&, edm::EventSetup const&, edm::ModuleCallingContext const*) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x498B8F7: decltype ({parm#1}()) edm::convertException::wrap<bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x498BAE3: bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x498BEBB: void edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==  Address 0x1fff000ee0 is on thread 1's stack
==19069==  64 bytes below stack pointer
==19069==
==19069== Invalid write of size 8
==19069==    at 0x7DE2006C: ???
==19069==  Address 0x1fff002d10 is on thread 1's stack
==19069==  32 bytes below stack pointer
==19069==
==19069== Invalid write of size 8
==19069==    at 0x76C400E0: ???
==19069==  Address 0x1fff002d10 is on thread 1's stack
==19069==  32 bytes below stack pointer
==19069==
==19069== Invalid write of size 8
==19069==    at 0x7DE0006C: ???
==19069==  Address 0x1fff002d10 is on thread 1's stack
==19069==  32 bytes below stack pointer
==19069==

// Maybe stack was damaged here

==19069== Invalid write of size 8
==19069==    at 0x5BA9D5F0: HcalSimHitStudy::analyzeHits(std::vector<PCaloHit, std::allocator<PCaloHit> >&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/biglib/slc7_aarch64_gcc700/pluginSimulation.so)
==19069==    by 0x5BA9E45B: HcalSimHitStudy::analyze(edm::Event const&, edm::EventSetup const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/biglib/slc7_aarch64_gcc700/pluginSimulation.so)
==19069==    by 0x4ABDCEB: edm::stream::EDAnalyzerAdaptorBase::doEvent(edm::EventPrincipal const&, edm::EventSetup const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x4A9F333: edm::WorkerT<edm::stream::EDAnalyzerAdaptorBase>::implDo(edm::EventPrincipal const&, edm::EventSetup const&, edm::ModuleCallingContext const*) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x498B8F7: decltype ({parm#1}()) edm::convertException::wrap<bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x498BAE3: bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x498BEBB: void edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x498C3DB: edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x67AF6B7: tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) (custom_scheduler.h:501)
==19069==    by 0x4A541C3: edm::EventProcessor::readAndProcessEvent() (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x49A760F: statemachine::HandleEvent::readAndProcessEvent() (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x49AA243: statemachine::HandleEvent::HandleEvent(boost::statechart::state<statemachine::HandleEvent, statemachine::HandleLumis, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::my_context) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)

==19069== Invalid write of size 8
==19069==    at 0x5BA9D684: HcalSimHitStudy::analyzeHits(std::vector<PCaloHit, std::allocator<PCaloHit> >&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/biglib/slc7_aarch64_gcc700/pluginSimulation.so)
==19069==    by 0x5BA9E45B: HcalSimHitStudy::analyze(edm::Event const&, edm::EventSetup const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/biglib/slc7_aarch64_gcc700/pluginSimulation.so)
==19069==    by 0x4ABDCEB: edm::stream::EDAnalyzerAdaptorBase::doEvent(edm::EventPrincipal const&, edm::EventSetup const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x4A9F333: edm::WorkerT<edm::stream::EDAnalyzerAdaptorBase>::implDo(edm::EventPrincipal const&, edm::EventSetup const&, edm::ModuleCallingContext const*) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x498B8F7: decltype ({parm#1}()) edm::convertException::wrap<bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x498BAE3: bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x498BEBB: void edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x498C3DB: edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x67AF6B7: tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) (custom_scheduler.h:501)
==19069==    by 0x4A541C3: edm::EventProcessor::readAndProcessEvent() (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x49A760F: statemachine::HandleEvent::readAndProcessEvent() (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x49AA243: statemachine::HandleEvent::HandleEvent(boost::statechart::state<statemachine::HandleEvent, statemachine::HandleLumis, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::my_context) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)

// HcalSimHitStudy::analyzeHits repeats multiple times (for reads and writes)

==19069== Invalid read of size 8
==19069==    at 0x5BA9DB2C: HcalSimHitStudy::analyzeHits(std::vector<PCaloHit, std::allocator<PCaloHit> >&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/biglib/slc7_aarch64_gcc700/pluginSimulation.so)
==19069==    by 0x5BA9E45B: HcalSimHitStudy::analyze(edm::Event const&, edm::EventSetup const&) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/biglib/slc7_aarch64_gcc700/pluginSimulation.so)
==19069==    by 0x4ABDCEB: edm::stream::EDAnalyzerAdaptorBase::doEvent(edm::EventPrincipal const&, edm::EventSetup const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x4A9F333: edm::WorkerT<edm::stream::EDAnalyzerAdaptorBase>::implDo(edm::EventPrincipal const&, edm::EventSetup const&, edm::ModuleCallingContext const*) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x498B8F7: decltype ({parm#1}()) edm::convertException::wrap<bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x498BAE3: bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x498BEBB: void edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x498C3DB: edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x67AF6B7: tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) (custom_scheduler.h:501)
==19069==    by 0x4A541C3: edm::EventProcessor::readAndProcessEvent() (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x49A760F: statemachine::HandleEvent::readAndProcessEvent() (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
==19069==    by 0x49AA243: statemachine::HandleEvent::HandleEvent(boost::statechart::state<statemachine::HandleEvent, statemachine::HandleLumis, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::my_context) (in /cvmfs/cms-ib.cern.ch/nweek-02475/slc7_aarch64_gcc700/cms/cmssw/CMSSW_9_2_ROOT6_X_2017-06-05-2300/lib/slc7_aarch64_gcc700/libFWCoreFramework.so)
bsunanda commented 7 years ago

Wafer numbers 0..795 are present for FH, so setting the maximum to 796 should help. I am trying to add a helper function in the geometry that can provide the maximum wafer number for a given configuration.
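As a rough illustration of why 796 is the right bound for wafer numbers 0..795, here is a minimal sketch that derives the number of slots from the largest wafer number instead of hardwiring it. The function name `waferSlots` and its input are hypothetical and are not part of the CMSSW geometry code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not CMSSW code): given the wafer numbers that exist
// in a configuration, return how many slots a container needs so that every
// wafer number is a valid index. Numbers 0..795 need 796 slots.
std::size_t waferSlots(const std::vector<unsigned int>& waferNumbers) {
  if (waferNumbers.empty()) {
    return 0;
  }
  const unsigned int maxWafer =
      *std::max_element(waferNumbers.begin(), waferNumbers.end());
  return static_cast<std::size_t>(maxWafer) + 1;
}
```

A container sized with `waferSlots(...)` can then be indexed directly by wafer number without running past its end.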

bsunanda commented 7 years ago

Submitted a PR with a hardwired number, soon to be replaced by a number derived from the geometry.

kpedro88 commented 7 years ago

I confirm that #19198 from @bsunanda does not have any out-of-range exceptions when I run workflow 27434.0 replacing []s with .at()s.
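For readers unfamiliar with the check described above, here is a minimal sketch of the difference between the two accessors. `std::vector::at()` throws `std::out_of_range` for an invalid index, whereas `operator[]` performs no check and an out-of-range access is undefined behavior. The helper `indexIsInRange` is hypothetical and only illustrates the technique; it is not the code used in the test.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not CMSSW code): returns true if index i can be read
// through the bounds-checked accessor at(), false if at() throws.
bool indexIsInRange(const std::vector<double>& values, std::size_t i) {
  try {
    (void)values.at(i);  // throws std::out_of_range when i >= values.size()
    return true;
  } catch (const std::out_of_range&) {
    return false;
  }
}
```

With a vector of 796 entries, index 795 is accepted and index 796 is rejected, which is the kind of exception a temporary switch from `[]` to `.at()` brings to the surface.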

rovere commented 7 years ago

Good. Maybe we should move back to using operator[], which is faster since it skips the bounds check.

smuzaffar commented 7 years ago

I confirm that 4 failing workflows on aarch64 run without crashing with #19198

27034.0_TTbar_14TeV+TTbar_14TeV_TuneCUETP8M1_2023D16_GenSimHLBeamSpotFull14+DigiFullTrigger_2023D16+RecoFullGlobal_2023D16+HARVESTFullGlobal_2023D16 Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Tue Jun 13 08:51:06 2017-date Tue Jun 13 08:03:32 2017; exit: 0 0 0 0
27034.2_TTbar_14TeV_Timing+TTbar_14TeV_TuneCUETP8M1_2023D16_GenSimHLBeamSpotFull14_Timing+DigiFullTrigger_Timing_2023D16+RecoFullGlobal_Timing_2023D16+HARVESTFullGlobal_Timing_2023D16 Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Tue Jun 13 08:51:06 2017-date Tue Jun 13 08:03:37 2017; exit: 0 0 0 0
27434.0_TTbar_14TeV+TTbar_14TeV_TuneCUETP8M1_2023D17_GenSimHLBeamSpotFull14+DigiFullTrigger_2023D17+RecoFullGlobal_2023D17+HARVESTFullGlobal_2023D17 Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Tue Jun 13 08:51:05 2017-date Tue Jun 13 08:03:40 2017; exit: 0 0 0 0
3 3 3 3 tests passed, 0 0 0 0 failed
davidlt commented 7 years ago

But does that fix all the issues from valgrind?

kpedro88 commented 7 years ago

@rovere the official code still uses []s - the change to .at()s was just in my work area.

@davidlt if it solves the observed crash, I think it satisfies this issue - another issue could be opened for the other potential problems you found.

smuzaffar commented 7 years ago

PR https://github.com/cms-sw/cmssw/pull/19207 should fix the TMVA::Reader invalid read issue we have seen here https://github.com/cms-sw/cmssw/issues/19179#issuecomment-307687417

kpedro88 commented 7 years ago

+1

Resolved by #19198, as previously stated. Other valgrind issues that haven't been causing segfaults should get their own issue(s).

slava77 commented 7 years ago

+1

fixed in #19198 as noted above already

cmsbuild commented 7 years ago

This issue is fully signed and ready to be closed.