cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0
1.09k stars 4.33k forks source link

[ASAN] Unit Test `testTriggerEventAnalyzers` failing with heap-buffer-overflow #41045

Closed aandvalenzuela closed 1 year ago

aandvalenzuela commented 1 year ago

Hello,

Unit Test testTriggerEventAnalyzers (from module HLTrigger/HLTcore) is failing with heap-buffer-overflow. Here the summary:

SUMMARY: AddressSanitizer: heap-buffer-overflow (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02775/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-10-2300/lib/el8_amd64_gcc11/pluginHLTriggerHLTcorePlugins.so+0xf1638) in void HLTEventAnalyzerRAW::showObjects<std::vector<int, std::allocator<int> >, std::vector<edm::Ref<std::vector<reco::PFJet,
std::allocator<reco::PFJet> >, reco::PFJet, edm::refhelper::FindUsingAdvance<std::vector<reco::PFJet,
std::allocator<reco::PFJet> >, reco::PFJet> >, std::allocator<edm::Ref<std::vector<reco::PFJet,
std::allocator<reco::PFJet> >, reco::PFJet, edm::refhelper::FindUsingAdvance<std::vector<reco::PFJet,
std::allocator<reco::PFJet> >, reco::PFJet> > > > >(std::vector<int, std::allocator<int> > const&, std::vector<edm::Ref<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::PFJet,
edm::refhelper::FindUsingAdvance<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::PFJet> >,
std::allocator<edm::Ref<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::P

Find the full log in this link.

Thanks!

FYI, @missirol (I see you added this test to CMSSW)

cmsbuild commented 1 year ago

A new Issue was created by @aandvalenzuela Andrea Valenzuela.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

aandvalenzuela commented 1 year ago

assign hlt

cmsbuild commented 1 year ago

New categories assigned: hlt

@missirol,@Martin-Grunewald you have been requested to review this Pull request/Issue and eventually sign? Thanks

makortel commented 1 year ago

Here is the full ASAN report from the log

==17415==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x619001ff7640 at pc 0x2af7c6fed639 bp 0x7ffcdc01fed0 sp 0x7ffcdc01fec8
READ of size 8 at 0x619001ff7640 thread T0
    #0 0x2af7c6fed638 in void HLTEventAnalyzerRAW::showObjects<std::vector<int, std::allocator<int> >, std::vector<edm::Ref<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::PFJet, edm::refhelper::FindUsingAdvance<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::PFJet> >, std::allocator<edm::Ref<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::PFJet, edm::refhelper::FindUsingAdvance<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::PFJet> > > > >(std::vector<int, std::allocator<int> > const&, std::vector<edm::Ref<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::PFJet, edm::refhelper::FindUsingAdvance<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::PFJet> >, std::allocator<edm::Ref<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::PFJet, edm::refhelper::FindUsingAdvance<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::PFJet> > > > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02775/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-10-2300/lib/el8_amd64_gcc11/pluginHLTriggerHLTcorePlugins.so+0xf1638)
    #1 0x2af7c6fc9af8 in HLTEventAnalyzerRAW::analyzeTrigger(edm::Event const&, edm::EventSetup const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02775/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-10-2300/lib/el8_amd64_gcc11/pluginHLTriggerHLTcorePlugins.so+0xcdaf8)
    #2 0x2af7c6fc332e in HLTEventAnalyzerRAW::analyze(edm::Event const&, edm::EventSetup const&) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02775/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-10-2300/lib/el8_amd64_gcc11/pluginHLTriggerHLTcorePlugins.so+0xc732e)
    #3 0x2af7854a8243 in edm::stream::EDAnalyzerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02775/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-10-2300/lib/el8_amd64_gcc11/libFWCoreFramework.so+0x7a5243)
    #4 0x2af785443150 in edm::WorkerT<edm::stream::EDAnalyzerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02775/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-10-2300/lib/el8_amd64_gcc11/libFWCoreFramework.so+0x740150)
    #5 0x2af7851e795c in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02775/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-10-2300/lib/el8_amd64_gcc11/libFWCoreFramework.so+0x4e495c)
    #6 0x2af78520a06a in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02775/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-10-2300/lib/el8_amd64_gcc11/libFWCoreFramework.so+0x50706a)
    #7 0x2af785ebd6cb in tbb::detail::d1::function_task<edm::WaitingTaskList::announce()::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02775/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-10-2300/lib/el8_amd64_gcc11/libFWCoreConcurrency.so+0x96cb)
    #8 0x2af7875259cc in tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter>(tbb::detail::d1::task*, tbb::detail::r1::external_waiter&) /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-d98ea1c603fa3368d06780eab09b1de9/tbb-v2021.8.0/src/tbb/task_dispatcher.h:322
    #9 0x2af7875259cc in tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter>(tbb::detail::d1::task*, tbb::detail::r1::external_waiter&) /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-d98ea1c603fa3368d06780eab09b1de9/tbb-v2021.8.0/src/tbb/task_dispatcher.h:458
    #10 0x2af7875259cc in tbb::detail::r1::task_dispatcher::execute_and_wait(tbb::detail::d1::task*, tbb::detail::d1::wait_context&, tbb::detail::d1::task_group_context&) /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-d98ea1c603fa3368d06780eab09b1de9/tbb-v2021.8.0/src/tbb/task_dispatcher.cpp:168
    #11 0x2af784f4d139 in edm::FinalWaitingTask::wait() (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02775/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-10-2300/lib/el8_amd64_gcc11/libFWCoreFramework.so+0x24a139)
    #12 0x2af784fad987 in edm::EventProcessor::processRuns() (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02775/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-10-2300/lib/el8_amd64_gcc11/libFWCoreFramework.so+0x2aa987)
    #13 0x2af784f8ba0d in edm::EventProcessor::runToCompletion() (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02775/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-10-2300/lib/el8_amd64_gcc11/libFWCoreFramework.so+0x288a0d)
    #14 0x4193a1 in tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02775/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-10-2300/bin/el8_amd64_gcc11/cmsRun+0x4193a1)
    #15 0x2af787513846 in tbb::detail::r1::task_arena_impl::execute(tbb::detail::d1::task_arena_base&, tbb::detail::d1::delegate_base&) /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-d98ea1c603fa3368d06780eab09b1de9/tbb-v2021.8.0/src/tbb/arena.cpp:694
    #16 0x42e3f8 in main::{lambda()#1}::operator()() const (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02775/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-10-2300/bin/el8_amd64_gcc11/cmsRun+0x42e3f8)
    #17 0x40be95 in main (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02775/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-10-2300/bin/el8_amd64_gcc11/cmsRun+0x40be95)
    #18 0x2af78841dd84 in __libc_start_main (/lib64/libc.so.6+0x3ad84)
    #19 0x40c57d in _start (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02775/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-10-2300/bin/el8_amd64_gcc11/cmsRun+0x40c57d)

0x619001ff7640 is located 32 bytes to the right of 928-byte region [0x619001ff7280,0x619001ff7620)
allocated by thread T0 here:
    #0 0x2af7843e1f37 in operator new(unsigned long) ../../../../libsanitizer/asan/asan_new_delete.cpp:99
    #1 0x2af7b7cf3c88 in ROOT::Detail::TCollectionProxyInfo::Pushback<std::vector<reco::PFJet, std::allocator<reco::PFJet> > >::resize(void*, unsigned long) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02775/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-10-2300/lib/el8_amd64_gcc11/libDataFormatsJetReco.so+0x213c88)
    #2 0x2af7866c3824 in TGenCollectionProxy::Allocate(unsigned int, bool) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02775/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-10-2300/external/el8_amd64_gcc11/lib/libRIO.so+0x17d824)

SUMMARY: AddressSanitizer: heap-buffer-overflow (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02775/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-10-2300/lib/el8_amd64_gcc11/pluginHLTriggerHLTcorePlugins.so+0xf1638) in void HLTEventAnalyzerRAW::showObjects<std::vector<int, std::allocator<int> >, std::vector<edm::Ref<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::PFJet, edm::refhelper::FindUsingAdvance<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::PFJet> >, std::allocator<edm::Ref<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::PFJet, edm::refhelper::FindUsingAdvance<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::PFJet> > > > >(std::vector<int, std::allocator<int> > const&, std::vector<edm::Ref<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::PFJet, edm::refhelper::FindUsingAdvance<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::PFJet> >, std::allocator<edm::Ref<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::P
Shadow bytes around the buggy address:
  0x0c32803f6e70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c32803f6e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c32803f6e90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c32803f6ea0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c32803f6eb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c32803f6ec0: 00 00 00 00 fa fa fa fa[fa]fa fa fa fa fa fa fa
  0x0c32803f6ed0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c32803f6ee0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c32803f6ef0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c32803f6f00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c32803f6f10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==17415==ABORTING
iarspider commented 1 year ago

Full ASAN report with line numbers:

=================================================================
==24190==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x619002a0c540 at pc 0x7fd8d7dbf639 bp 0x7ffed4d12fd0 sp 0x7ffed4d12fc8
READ of size 8 at 0x619002a0c540 thread T0
    #0 0x7fd8d7dbf638 in ROOT::Math::PtEtaPhiM4D<double>::Pt() const /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02776/el8_amd64_gcc11/lcg/root/6.26.11-352ff7105472dc3a16fb5cfd35093c4a/include/Math/GenVector/PtEtaPhiM4D.h:138
    #1 0x7fd8d7dbf638 in ROOT::Math::LorentzVector<ROOT::Math::PtEtaPhiM4D<double> >::pt() const /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02776/el8_amd64_gcc11/lcg/root/6.26.11-352ff7105472dc3a16fb5cfd35093c4a/include/Math/GenVector/LorentzVector.h:639
    #2 0x7fd8d7dbf638 in reco::ParticleState::pt() const /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-15-2300/src/DataFormats/Candidate/interface/ParticleState.h:137
    #3 0x7fd8d7dbf638 in reco::LeafCandidate::pt() const /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-15-2300/src/DataFormats/Candidate/interface/LeafCandidate.h:146
    #4 0x7fd8d7dbf638 in void HLTEventAnalyzerRAW::showObject<edm::Ref<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::PFJet, edm::refhelper::FindUsingAdvance<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::PFJet> > >(edm::Log<edm::level::Info, true>&, edm::Ref<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::PFJet, edm::refhelper::FindUsingAdvance<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::PFJet> > const&) const /build/razumov/CMSSW_13_1_ASAN_X_2023-03-15-2300/src/HLTrigger/HLTcore/interface/HLTEventAnalyzerRAW.h:160
    #5 0x7fd8d7dbf638 in void HLTEventAnalyzerRAW::showObjects<std::vector<int, std::allocator<int> >, std::vector<edm::Ref<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::PFJet, edm::refhelper::FindUsingAdvance<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::PFJet> >, std::allocator<edm::Ref<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::PFJet, edm::refhelper::FindUsingAdvance<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::PFJet> > > > >(std::vector<int, std::allocator<int> > const&, std::vector<edm::Ref<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::PFJet, edm::refhelper::FindUsingAdvance<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::PFJet> >, std::allocator<edm::Ref<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::PFJet, edm::refhelper::FindUsingAdvance<std::vector<reco::PFJet, std::allocator<reco::PFJet> >, reco::PFJet> > > > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const /build/razumov/CMSSW_13_1_ASAN_X_2023-03-15-2300/src/HLTrigger/HLTcore/interface/HLTEventAnalyzerRAW.h:153
    #6 0x7fd8d7d9baf8 in HLTEventAnalyzerRAW::analyzeTrigger(edm::Event const&, edm::EventSetup const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /build/razumov/CMSSW_13_1_ASAN_X_2023-03-15-2300/src/HLTrigger/HLTcore/plugins/HLTEventAnalyzerRAW.cc:371
    #7 0x7fd8d7d9532e in HLTEventAnalyzerRAW::analyze(edm::Event const&, edm::EventSetup const&) /build/razumov/CMSSW_13_1_ASAN_X_2023-03-15-2300/src/HLTrigger/HLTcore/plugins/HLTEventAnalyzerRAW.cc:137
    #8 0x7fd91b9dde53 in edm::stream::EDAnalyzerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) (/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-15-2300/lib/el8_amd64_gcc11/libFWCoreFramework.so+0x7a4e53)
    #9 0x7fd91b978d60 in edm::WorkerT<edm::stream::EDAnalyzerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) (/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-15-2300/lib/el8_amd64_gcc11/libFWCoreFramework.so+0x73fd60)
    #10 0x7fd91b71d89c in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) (/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-15-2300/lib/el8_amd64_gcc11/libFWCoreFramework.so+0x4e489c)
    #11 0x7fd91b73ffaa in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() (/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-15-2300/lib/el8_amd64_gcc11/libFWCoreFramework.so+0x506faa)
    #12 0x7fd91aac86cb in tbb::detail::d1::function_task<edm::WaitingTaskList::announce()::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) (/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-15-2300/lib/el8_amd64_gcc11/libFWCoreConcurrency.so+0x96cb)
    #13 0x7fd91946d9cc in tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter>(tbb::detail::d1::task*, tbb::detail::r1::external_waiter&) /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-d98ea1c603fa3368d06780eab09b1de9/tbb-v2021.8.0/src/tbb/task_dispatcher.h:322
    #14 0x7fd91946d9cc in tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter>(tbb::detail::d1::task*, tbb::detail::r1::external_waiter&) /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-d98ea1c603fa3368d06780eab09b1de9/tbb-v2021.8.0/src/tbb/task_dispatcher.h:458
    #15 0x7fd91946d9cc in tbb::detail::r1::task_dispatcher::execute_and_wait(tbb::detail::d1::task*, tbb::detail::d1::wait_context&, tbb::detail::d1::task_group_context&) /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-d98ea1c603fa3368d06780eab09b1de9/tbb-v2021.8.0/src/tbb/task_dispatcher.cpp:168
    #16 0x7fd91b482969 in edm::FinalWaitingTask::wait() (/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-15-2300/lib/el8_amd64_gcc11/libFWCoreFramework.so+0x249969)
    #17 0x7fd91b4e3137 in edm::EventProcessor::processRuns() (/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-15-2300/lib/el8_amd64_gcc11/libFWCoreFramework.so+0x2aa137)
    #18 0x7fd91b4c11bd in edm::EventProcessor::runToCompletion() (/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-15-2300/lib/el8_amd64_gcc11/libFWCoreFramework.so+0x2881bd)
    #19 0x4193a1 in tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02776/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-15-2300/bin/el8_amd64_gcc11/cmsRun+0x4193a1)
    #20 0x7fd91945b846 in tbb::detail::r1::task_arena_impl::execute(tbb::detail::d1::task_arena_base&, tbb::detail::d1::delegate_base&) /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-d98ea1c603fa3368d06780eab09b1de9/tbb-v2021.8.0/src/tbb/arena.cpp:694
    #21 0x42e3f8 in main::{lambda()#1}::operator()() const (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02776/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-15-2300/bin/el8_amd64_gcc11/cmsRun+0x42e3f8)
    #22 0x40be95 in main (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02776/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-15-2300/bin/el8_amd64_gcc11/cmsRun+0x40be95)
    #23 0x7fd918225d84 in __libc_start_main (/lib64/libc.so.6+0x3ad84)
    #24 0x40c57d in _start (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02776/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-15-2300/bin/el8_amd64_gcc11/cmsRun+0x40c57d)

0x619002a0c540 is located 32 bytes to the right of 928-byte region [0x619002a0c180,0x619002a0c520)
allocated by thread T0 here:
    #0 0x7fd91bd40f37 in operator new(unsigned long) ../../../../libsanitizer/asan/asan_new_delete.cpp:99
    #1 0x7fd8e8788c88 in ROOT::Detail::TCollectionProxyInfo::Pushback<std::vector<reco::PFJet, std::allocator<reco::PFJet> > >::resize(void*, unsigned long) (/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-15-2300/lib/el8_amd64_gcc11/libDataFormatsJetReco.so+0x213c88)
    #2 0x7fd91a1f6824 in TGenCollectionProxy::Allocate(unsigned int, bool) (/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_ASAN_X_2023-03-15-2300/external/el8_amd64_gcc11/lib/libRIO.so+0x17d824)

SUMMARY: AddressSanitizer: heap-buffer-overflow /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02776/el8_amd64_gcc11/lcg/root/6.26.11-352ff7105472dc3a16fb5cfd35093c4a/include/Math/GenVector/PtEtaPhiM4D.h:138 in ROOT::Math::PtEtaPhiM4D<double>::Pt() const
Shadow bytes around the buggy address:
  0x0c3280539850: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c3280539860: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c3280539870: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c3280539880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c3280539890: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c32805398a0: 00 00 00 00 fa fa fa fa[fa]fa fa fa fa fa fa fa
  0x0c32805398b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c32805398c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c32805398d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c32805398e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c32805398f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==24190==ABORTING
iarspider commented 1 year ago

The unit test log also contains the following error message, which could be related to the error reported by ASAN:

%MSG-e HLTPrescaleProvider:  HLTEventAnalyzerAOD:triggerEventAnalyzer  17-Mar-2023 14:56:56 CET Run: 1 Event: 702
 Error in determining L1T prescale for HLT path: 'HLT_AK8PFHT750_TrimMass50_v13' with L1T seed: 'L1_HTT120er OR L1_HTT160er OR L1_HTT200er OR L1_HTT255er OR L1_HTT280er OR L1_HTT280er_QuadJet_70_55_40_35_er2p5 OR L1_HTT320er_QuadJet_80_60_er2p1_45_40_er2p3 OR L1_HTT320er_QuadJet_80_60_er2p1_50_45_er2p3 OR L1_HTT320er OR L1_HTT360er OR L1_ETT2000 OR L1_HTT400er OR L1_HTT450er' using L1TGlobalUtil: error cond = 1. Note: for this method ('prescaleValues'), only a single L1T name (and not a bit number) is allowed as seed! For seeds being complex logical expressions, try the new method 'prescaleValuesInDetail'.
%MSG
%MSG-e HLTPrescaleProvider:  HLTEventAnalyzerAOD:triggerEventAnalyzer  17-Mar-2023 14:56:56 CET Run: 1 Event: 702
 Error in determining L1T prescale for HLT path: 'HLT_AK8PFHT800_TrimMass50_v13' with L1T seed: 'L1_HTT120er OR L1_HTT160er OR L1_HTT200er OR L1_HTT255er OR L1_HTT280er OR L1_HTT280er_QuadJet_70_55_40_35_er2p5 OR L1_HTT320er_QuadJet_80_60_er2p1_45_40_er2p3 OR L1_HTT320er_QuadJet_80_60_er2p1_50_45_er2p3 OR L1_HTT320er OR L1_HTT360er OR L1_ETT2000 OR L1_HTT400er OR L1_HTT450er' using L1TGlobalUtil: error cond = 1. Note: for this method ('prescaleValues'), only a single L1T name (and not a bit number) is allowed as seed! For seeds being complex logical expressions, try the new method 'prescaleValuesInDetail'.
missirol commented 1 year ago

(Couldn't get to this issue yet, but I plan to have a look in the next days.)

missirol commented 1 year ago

I started to have a look. The following is just my current understanding (might be incorrect/incomplete).

If you think I'm missing something, please let me know.

[1] https://github.com/cms-sw/cmssw/blob/357677c1b73631055bf2c3afdbf0f5dd2885426e/HLTrigger/btau/plugins/HLTJetTag.cc#L109-L110

[2]

diff --git a/HLTrigger/btau/plugins/HLTJetTag.cc b/HLTrigger/btau/plugins/HLTJetTag.cc
index 61f15b886dc..04c9f1fb8fc 100644
--- a/HLTrigger/btau/plugins/HLTJetTag.cc
+++ b/HLTrigger/btau/plugins/HLTJetTag.cc
@@ -107,6 +107,7 @@ bool HLTJetTag<T>::hltFilter(edm::Event& event,
   int nJet = 0;
   int nTag = 0;
   for (auto const& jet : *h_JetTags) {
+if (jet.first.key() >= h_Jets->size()) throw cms::Exception("OutOfRange") << jet.first.key() << " >= " << h_Jets->size();
     jetRef = TRef(h_Jets, jet.first.key());
     LogTrace("") << "Jet " << nJet << " : Et = " << jet.first->et() << " , tag value = " << jet.second;
     ++nJet;
Begin processing the 5th record. Run 1, Event 5, LumiSection 1 on stream 0 at 28-Mar-2023 14:14:26.763 CEST
----- Begin Fatal Exception 28-Mar-2023 14:14:29 CEST-----------------------
An exception of category 'OutOfRange' occurred while
   [0] Processing  Event run: 1 lumi: 1 event: 5 stream: 0
   [1] Running path 'HLT_DoublePFJets116MaxDeta1p6_DoublePFBTagDeepJet_p71_v3'
   [2] Calling method for module HLTPFJetTag/'hltBTagPFDeepJet0p71Double6Jets80'
Exception Message:
5 >= 5
----- End Fatal Exception -------------------------------------------------
makortel commented 1 year ago

It seems to me the HLTJetTag<T> assumes the std::vector<T> is the "key" of JetTagCollection (which is edm::AssociationVector<edm::RefToBaseProd<reco::Jet>, std::vector<float>>). Or, otherwise the line construction of edm::Ref<std::vector<T>> from the AssociationVector element in https://github.com/cms-sw/cmssw/blob/357677c1b73631055bf2c3afdbf0f5dd2885426e/HLTrigger/btau/plugins/HLTJetTag.cc#L109-L110 would not make much sense.

I suppose a minimal improvement would be for the module to check that the ProductID of the h_Jets is consistent with the dependent, e.g. along

if (dependent.isNonnull() and dependent.id() != h_Jets.id()) {
  // deal with the inconsistency in some way, e.g. by throwing an exception
}
missirol commented 1 year ago

It seems to me the HLTJetTag<T> assumes the std::vector<T> is the "key" of JetTagCollection [..]

Okay, thanks for the insight.

I think it's also clear that this assumption ends up being wrong. Just adding some printouts, one can see that h_Jets and h_JetTags don't have the same jets, at least for some HLT modules [1].

[1] Printout from step2 of wf 11634.0; JJJ1 is h_Jets, JJJ2 is from h_JetTags.

Begin processing the 3rd record. Run 1, Event 3, LumiSection 1 on stream 0 at 28-Mar-2023 15:05:51.466 CEST
Begin processing the 4th record. Run 1, Event 4, LumiSection 1 on stream 0 at 28-Mar-2023 15:05:54.015 CEST
Begin processing the 5th record. Run 1, Event 5, LumiSection 1 on stream 0 at 28-Mar-2023 15:05:56.008 CEST
***************
JJJ1 Jet 0 : pt=135.114 eta=0.173114 phi=-1.75153
JJJ1 Jet 1 : pt=127.453 eta=1.03853 phi=0.812131
JJJ1 Jet 2 : pt=93.7376 eta=-0.585623 phi=-1.8102
JJJ1 Jet 3 : pt=81.8635 eta=0.488342 phi=-2.78908
JJJ1 Jet 4 : pt=81.1659 eta=1.03291 phi=1.48506
---
JJJ2 Jet 0 : pt=135.114 eta=0.173114 phi=-1.75153
JJJ2 Jet 1 : pt=127.453 eta=1.03853 phi=0.812131
JJJ2 Jet 2 : pt=93.7376 eta=-0.585623 phi=-1.8102
JJJ2 Jet 3 : pt=81.8635 eta=0.488342 phi=-2.78908
JJJ2 Jet 4 : pt=81.1659 eta=1.03291 phi=1.48506
JJJ2 Jet 5 : pt=75.2862 eta=0.693766 phi=2.29357
JJJ2 Jet 6 : pt=50.5069 eta=-2.51323 phi=1.06846
JJJ2 Jet 7 : pt=35.3843 eta=1.77344 phi=-0.792386
***************
***************
JJJ1 Jet 0 : pt=135.114 eta=0.173114 phi=-1.75153
JJJ1 Jet 1 : pt=127.453 eta=1.03853 phi=0.812131
JJJ1 Jet 2 : pt=93.7376 eta=-0.585623 phi=-1.8102
JJJ1 Jet 3 : pt=81.8635 eta=0.488342 phi=-2.78908
JJJ1 Jet 4 : pt=81.1659 eta=1.03291 phi=1.48506
---
JJJ2 Jet 0 : pt=135.114 eta=0.173114 phi=-1.75153
JJJ2 Jet 1 : pt=127.453 eta=1.03853 phi=0.812131
JJJ2 Jet 2 : pt=93.7376 eta=-0.585623 phi=-1.8102
JJJ2 Jet 3 : pt=81.8635 eta=0.488342 phi=-2.78908
JJJ2 Jet 4 : pt=81.1659 eta=1.03291 phi=1.48506
JJJ2 Jet 5 : pt=75.2862 eta=0.693766 phi=2.29357
JJJ2 Jet 6 : pt=50.5069 eta=-2.51323 phi=1.06846
JJJ2 Jet 7 : pt=35.3843 eta=1.77344 phi=-0.792386
***************
***************
JJJ1 Jet 0 : pt=135.114 eta=0.173114 phi=-1.75153
JJJ1 Jet 1 : pt=127.453 eta=1.03853 phi=0.812131
JJJ1 Jet 2 : pt=93.7376 eta=-0.585623 phi=-1.8102
JJJ1 Jet 3 : pt=81.8635 eta=0.488342 phi=-2.78908
JJJ1 Jet 4 : pt=81.1659 eta=1.03291 phi=1.48506
JJJ1 Jet 5 : pt=75.2862 eta=0.693766 phi=2.29357
---
JJJ2 Jet 0 : pt=135.114 eta=0.173114 phi=-1.75153
JJJ2 Jet 1 : pt=127.453 eta=1.03853 phi=0.812131
JJJ2 Jet 2 : pt=93.7376 eta=-0.585623 phi=-1.8102
JJJ2 Jet 3 : pt=81.8635 eta=0.488342 phi=-2.78908
***************
Begin processing the 6th record. Run 1, Event 6, LumiSection 1 on stream 0 at 28-Mar-2023 15:05:58.739 CEST
Begin processing the 7th record. Run 1, Event 7, LumiSection 1 on stream 0 at 28-Mar-2023 15:06:00.797 CEST
***************
JJJ1 Jet 0 : pt=93.4168 eta=0.749616 phi=-2.58951
JJJ1 Jet 1 : pt=70.0789 eta=2.66334 phi=-1.2127
JJJ1 Jet 2 : pt=60.5941 eta=3.1 phi=2.06209
JJJ1 Jet 3 : pt=45.9469 eta=0.0571585 phi=2.7399
---
JJJ2 Jet 0 : pt=93.4168 eta=0.749616 phi=-2.58951
JJJ2 Jet 1 : pt=45.9469 eta=0.0571585 phi=2.7399
***************
***************
JJJ1 Jet 0 : pt=93.4168 eta=0.749616 phi=-2.58951
JJJ1 Jet 1 : pt=70.0789 eta=2.66334 phi=-1.2127
JJJ1 Jet 2 : pt=60.5941 eta=3.1 phi=2.06209
JJJ1 Jet 3 : pt=45.9469 eta=0.0571585 phi=2.7399
---
JJJ2 Jet 0 : pt=93.4168 eta=0.749616 phi=-2.58951
JJJ2 Jet 1 : pt=45.9469 eta=0.0571585 phi=2.7399
***************
makortel commented 1 year ago

From the printout it seems to me that in some cases the jets in h_Jets would be a subset of the jets in h_JetTags, and in other cases the h_JetTags would be a subset of h_Jets. Could this be just an impact of different module instances having different inputs? Or can the direction of subset vary between events in the same module?

missirol commented 1 year ago

Could this be just an impact of different module instances having different inputs? Or can the direction of subset vary between events in the same module?

I think it's the former. Apologies, I was sloppy with the printout in https://github.com/cms-sw/cmssw/issues/41045#issuecomment-1487231282, a clearer one is in [1] (in both cases, I'm printing from any instance of HLTJetTag<T> at HLT, and only when the sizes of the two collections differ).

I don't claim to understand this in full yet, but I figured out what happens for one case (hltBTagPFDeepJet0p71Single8Jets30): the original jet collection is one (hltAK4JetsCorrected, call it "the jets"), then various 'selectors' skim this collection, to the point where h_Jets is "the jets with pT>30 GeV", and the key of h_JetsTags is "the jets with pT>30 GeV, and |eta|<2.6".

[1]

Begin processing the 2nd record. Run 1, Event 2, LumiSection 1 on stream 0 at 29-Mar-2023 21:10:07.749 CEST
*************** hltBTagPFDeepJet0p71Single8Jets30
h_Jets    | Jet 0 : pt=150.867 eta=-0.421884 phi=2.22497
h_Jets    | Jet 1 : pt=122.066 eta=-2.33867 phi=-0.532825
h_Jets    | Jet 2 : pt=119.872 eta=-2.15417 phi=-1.27154
h_Jets    | Jet 3 : pt=60.3565 eta=0.0812744 phi=-3.08803
h_Jets    | Jet 4 : pt=35.8921 eta=-3.2305 phi=0.0366149
---
h_JetTags | Jet 0 : pt=150.867 eta=-0.421884 phi=2.22497
h_JetTags | Jet 1 : pt=122.066 eta=-2.33867 phi=-0.532825
h_JetTags | Jet 2 : pt=119.872 eta=-2.15417 phi=-1.27154
h_JetTags | Jet 3 : pt=60.3565 eta=0.0812744 phi=-3.08803
***************
*************** hltBTagPFDeepCSV0p71Single8Jets30
h_Jets    | Jet 0 : pt=150.867 eta=-0.421884 phi=2.22497
h_Jets    | Jet 1 : pt=122.066 eta=-2.33867 phi=-0.532825
h_Jets    | Jet 2 : pt=119.872 eta=-2.15417 phi=-1.27154
h_Jets    | Jet 3 : pt=60.3565 eta=0.0812744 phi=-3.08803
h_Jets    | Jet 4 : pt=35.8921 eta=-3.2305 phi=0.0366149
---
h_JetTags | Jet 0 : pt=150.867 eta=-0.421884 phi=2.22497
h_JetTags | Jet 1 : pt=122.066 eta=-2.33867 phi=-0.532825
h_JetTags | Jet 2 : pt=119.872 eta=-2.15417 phi=-1.27154
h_JetTags | Jet 3 : pt=60.3565 eta=0.0812744 phi=-3.08803
***************
*************** hltBTagPFDeepJet0p71Double8Jets30
h_Jets    | Jet 0 : pt=150.867 eta=-0.421884 phi=2.22497
h_Jets    | Jet 1 : pt=122.066 eta=-2.33867 phi=-0.532825
h_Jets    | Jet 2 : pt=119.872 eta=-2.15417 phi=-1.27154
h_Jets    | Jet 3 : pt=60.3565 eta=0.0812744 phi=-3.08803
h_Jets    | Jet 4 : pt=35.8921 eta=-3.2305 phi=0.0366149
---
h_JetTags | Jet 0 : pt=150.867 eta=-0.421884 phi=2.22497
h_JetTags | Jet 1 : pt=122.066 eta=-2.33867 phi=-0.532825
h_JetTags | Jet 2 : pt=119.872 eta=-2.15417 phi=-1.27154
h_JetTags | Jet 3 : pt=60.3565 eta=0.0812744 phi=-3.08803
***************
*************** hltBTagPFDeepCSV0p71Double8Jets30
h_Jets    | Jet 0 : pt=150.867 eta=-0.421884 phi=2.22497
h_Jets    | Jet 1 : pt=122.066 eta=-2.33867 phi=-0.532825
h_Jets    | Jet 2 : pt=119.872 eta=-2.15417 phi=-1.27154
h_Jets    | Jet 3 : pt=60.3565 eta=0.0812744 phi=-3.08803
h_Jets    | Jet 4 : pt=35.8921 eta=-3.2305 phi=0.0366149
---
h_JetTags | Jet 0 : pt=150.867 eta=-0.421884 phi=2.22497
h_JetTags | Jet 1 : pt=122.066 eta=-2.33867 phi=-0.532825
h_JetTags | Jet 2 : pt=119.872 eta=-2.15417 phi=-1.27154
h_JetTags | Jet 3 : pt=60.3565 eta=0.0812744 phi=-3.08803
***************
Begin processing the 3rd record. Run 1, Event 3, LumiSection 1 on stream 0 at 29-Mar-2023 21:10:11.859 CEST
Begin processing the 4th record. Run 1, Event 4, LumiSection 1 on stream 0 at 29-Mar-2023 21:10:14.745 CEST
Begin processing the 5th record. Run 1, Event 5, LumiSection 1 on stream 0 at 29-Mar-2023 21:10:16.940 CEST
*************** hltBTagPFDeepJet0p71Double6Jets80
h_Jets    | Jet 0 : pt=135.114 eta=0.173114 phi=-1.75153
h_Jets    | Jet 1 : pt=127.453 eta=1.03853 phi=0.812131
h_Jets    | Jet 2 : pt=93.7376 eta=-0.585623 phi=-1.8102
h_Jets    | Jet 3 : pt=81.8635 eta=0.488342 phi=-2.78908
h_Jets    | Jet 4 : pt=81.1659 eta=1.03291 phi=1.48506
---
h_JetTags | Jet 0 : pt=135.114 eta=0.173114 phi=-1.75153
h_JetTags | Jet 1 : pt=127.453 eta=1.03853 phi=0.812131
h_JetTags | Jet 2 : pt=93.7376 eta=-0.585623 phi=-1.8102
h_JetTags | Jet 3 : pt=81.8635 eta=0.488342 phi=-2.78908
h_JetTags | Jet 4 : pt=81.1659 eta=1.03291 phi=1.48506
h_JetTags | Jet 5 : pt=75.2862 eta=0.693766 phi=2.29357
h_JetTags | Jet 6 : pt=50.5069 eta=-2.51323 phi=1.06846
h_JetTags | Jet 7 : pt=35.3843 eta=1.77344 phi=-0.792386
***************
*************** hltBTagPFDeepCSV0p71Double6Jets80
h_Jets    | Jet 0 : pt=135.114 eta=0.173114 phi=-1.75153
h_Jets    | Jet 1 : pt=127.453 eta=1.03853 phi=0.812131
h_Jets    | Jet 2 : pt=93.7376 eta=-0.585623 phi=-1.8102
h_Jets    | Jet 3 : pt=81.8635 eta=0.488342 phi=-2.78908
h_Jets    | Jet 4 : pt=81.1659 eta=1.03291 phi=1.48506
---
h_JetTags | Jet 0 : pt=135.114 eta=0.173114 phi=-1.75153
h_JetTags | Jet 1 : pt=127.453 eta=1.03853 phi=0.812131
h_JetTags | Jet 2 : pt=93.7376 eta=-0.585623 phi=-1.8102
h_JetTags | Jet 3 : pt=81.8635 eta=0.488342 phi=-2.78908
h_JetTags | Jet 4 : pt=81.1659 eta=1.03291 phi=1.48506
h_JetTags | Jet 5 : pt=75.2862 eta=0.693766 phi=2.29357
h_JetTags | Jet 6 : pt=50.5069 eta=-2.51323 phi=1.06846
h_JetTags | Jet 7 : pt=35.3843 eta=1.77344 phi=-0.792386
***************
*************** hltBSoftMuonJet40Mu12L3FilterByDR
h_Jets    | Jet 0 : pt=135.114 eta=0.173114 phi=-1.75153
h_Jets    | Jet 1 : pt=127.453 eta=1.03853 phi=0.812131
h_Jets    | Jet 2 : pt=93.7376 eta=-0.585623 phi=-1.8102
h_Jets    | Jet 3 : pt=81.8635 eta=0.488342 phi=-2.78908
h_Jets    | Jet 4 : pt=81.1659 eta=1.03291 phi=1.48506
h_Jets    | Jet 5 : pt=75.2862 eta=0.693766 phi=2.29357
---
h_JetTags | Jet 0 : pt=135.114 eta=0.173114 phi=-1.75153
h_JetTags | Jet 1 : pt=127.453 eta=1.03853 phi=0.812131
h_JetTags | Jet 2 : pt=93.7376 eta=-0.585623 phi=-1.8102
h_JetTags | Jet 3 : pt=81.8635 eta=0.488342 phi=-2.78908
***************
Begin processing the 6th record. Run 1, Event 6, LumiSection 1 on stream 0 at 29-Mar-2023 21:10:19.988 CEST
Begin processing the 7th record. Run 1, Event 7, LumiSection 1 on stream 0 at 29-Mar-2023 21:10:22.289 CEST
*************** hltBTagPFDeepJet0p71Single8Jets30
h_Jets    | Jet 0 : pt=93.4168 eta=0.749616 phi=-2.58951
h_Jets    | Jet 1 : pt=70.0789 eta=2.66334 phi=-1.2127
h_Jets    | Jet 2 : pt=60.5941 eta=3.1 phi=2.06209
h_Jets    | Jet 3 : pt=45.9469 eta=0.0571585 phi=2.7399
---
h_JetTags | Jet 0 : pt=93.4168 eta=0.749616 phi=-2.58951
h_JetTags | Jet 1 : pt=45.9469 eta=0.0571585 phi=2.7399
***************
*************** hltBTagPFDeepCSV0p71Single8Jets30
h_Jets    | Jet 0 : pt=93.4168 eta=0.749616 phi=-2.58951
h_Jets    | Jet 1 : pt=70.0789 eta=2.66334 phi=-1.2127
h_Jets    | Jet 2 : pt=60.5941 eta=3.1 phi=2.06209
h_Jets    | Jet 3 : pt=45.9469 eta=0.0571585 phi=2.7399
---
h_JetTags | Jet 0 : pt=93.4168 eta=0.749616 phi=-2.58951
h_JetTags | Jet 1 : pt=45.9469 eta=0.0571585 phi=2.7399
***************
Begin processing the 8th record. Run 1, Event 8, LumiSection 1 on stream 0 at 29-Mar-2023 21:10:24.278 CEST
Begin processing the 9th record. Run 1, Event 9, LumiSection 1 on stream 0 at 29-Mar-2023 21:10:26.133 CEST
Begin processing the 10th record. Run 1, Event 10, LumiSection 1 on stream 0 at 29-Mar-2023 21:10:28.917 CEST
makortel commented 1 year ago

Thanks @missirol. I took a look of the jet consumption chain of hltBTagPFDeepJet0p71Single8Jets30 from HLTrigger/Configuration/python/HLT_GRun_cff.py (menu /dev/CMSSW_13_0_0/GRun/V47). I see

hltBTagPFDeepJet0p71Single8Jets30.Jets = hltSelector8PFJets30
  that consumes hltSelectorJets30
  that consumes hltAK4PFJetsCorrected

hltBTagPFDeepJet0p71Single8Jets30.JetTags = hltDeepJetDiscriminatorsJetTags
  that consumes hltPFDeepFlavourJetTags
  that consumes hltPFDeepFlavourTagInfos
  that consumes hltDeepCombinedSecondaryVertexBJetTagsInfos
  that consumes hltDeepBLifetimeTagInfosPF
  that consumes hltPFJetForBtag
  that consumes hltPFJetForBtagSelector
  that consumes hltAK4PFJetsCorrected

It seems to me both of the Jets and JetTags chains go through many potential steps of selection, and therefore there is no prior guarantee that the jets in one would be a subset of the other for all events.

In addition, I see the jet selectors (LargestEtPFJetSelector and EtMinPFJetSelector) produce copies of the concrete collection (std::vector<PFJet>) in contrast to e.g. edm::RefVector<PFjet>, edm::RefToBaseVector<Jet> or edm::PtrVector<Jet>, so the framework-level identity of a jet object (to be used as a key when accessing the JetTagCollection) gets lost.

missirol commented 1 year ago

Slowly converging on this issue.

Once there is data/MC processed with an updated HLT menu, the input file of the unit test could be updated again (with a proper input, more is tested, as this issue shows).

By the way, thanks a lot for opening this issue (@aandvalenzuela). I doubt we would have caught this bug otherwise (and clearly, the bug went unnoticed for a long time).

One naive question: why didn't 'anything' catch this earlier? It was sort-of caught by accident (in a unit test of something pretty unrelated, in ASAN). Was this issue triggering 'alarms' anywhere else (e.g. other tests in ASAN, or else)? If not, why was it going unnoticed?

missirol commented 1 year ago

(according to my checks, there are 7 affected modules)

For reference, this check was done by adding a printout [1] in HLTJetTag<T>, following the suggestion in https://github.com/cms-sw/cmssw/issues/41045#issuecomment-1487177288, to see which modules were misconfigured (makesSense == 0). I noticed that when size == 0, one can have dependent.isNonnull() == 0 (and thus makesSense == 0), so I ignored those cases (I ran until I could find, for every module, a case with size != 0).

[1]

diff --git a/HLTrigger/btau/plugins/HLTJetTag.cc b/HLTrigger/btau/plugins/HLTJetTag.cc
index 61f15b886dc..556a7acd578 100644
--- a/HLTrigger/btau/plugins/HLTJetTag.cc
+++ b/HLTrigger/btau/plugins/HLTJetTag.cc
@@ -101,6 +101,9 @@ bool HLTJetTag<T>::hltFilter(edm::Event& event,
           << dependent_provenance.branchName() << ", which has been dropped";
   }

+edm::LogPrint("HLTJetTag") << moduleDescription().moduleLabel() << " : dep.isNonNull = " << dependent.isNonnull() << ", makesSense = "
+  << (dependent.isNonnull() and dependent.id() == h_Jets.id()) << " (" << h_Jets->size() << ", " << h_JetTags->size() << ")";
+
   TRef jetRef;

   // Look at all jets in decreasing order of Et.
makortel commented 1 year ago

One naive question: why didn't 'anything' catch this earlier? It was sort-of caught by accident (in a unit test of something pretty unrelated, in ASAN). Was this issue triggering 'alarms' anywhere else (e.g. other tests in ASAN, or else)? If not, why was it going unnoticed?

I don't remember seeing these symptoms elsewhere (but could easily just not remember). I can only guess along

missirol commented 1 year ago

(I'll try to answer, but my understanding is a bit limited here.)

  • Do we run HLTEventAnalyzerRAW in any other tests (including runTheMatrix.py workflows)?

I don't think so (and that was probably one of the reasons to introduce the unit test).

  • Are there any other code using the impacted collections in our tests? How do they use them?

I don't think so. The products of the misconfigured modules are not consumed by anything at HLT, other than by instances of TriggerSummaryProducer{AOD,RAW} (or at least, that's my understanding; iirc, this could be checked with the Tracer service?). Afaiu, this leads to the product hltTriggerSummaryRAW containing incorrect references. Then, in the unit test this problematic product is accessed by HLTEventAnalyzerRAW. Several modules in CMSSW tests consume hltTriggerSummaryRAW, but it is conceivable that none of them tries to access the products of the 7 problematic filters.

  • Maybe the problem has been subtle-enough that it hasn't caused crashes, and the values read from incorrect memory addresses reasonable-enough (even if undefined) to not make anyone to notice in DQM plots or similar?

I would be surprised if DQM was using/checking the products of the 7 filters in question (meaning, indirectly via hltTriggerSummaryRAW, or directly using the products of type trigger::TriggerFilterObjectWithRefs).

I guess my question was even more naive, and is about this line in HLTJetTag. There, the key can be larger than the size of *h_Jets. When this happens, is it possible that this could crash at runtime? And why don't we see ASAN errors from HLTJetTag itself? (sorry if these questions are a bit too basic)

missirol commented 1 year ago

The items in https://github.com/cms-sw/cmssw/issues/41045#issuecomment-1510408616 are progressing, and the unit test itself is working in recent ASAN IBs.

I guess my question was even more naive, and is about this line in HLTJetTag. There, the key can be larger than the size of *h_Jets. When this happens, is it possible that this could crash at runtime? And why don't we see ASAN errors from HLTJetTag itself? (sorry if these questions are a bit too basic)

@makortel , would you mind explaining this ?

I did try to re-run an old HLT menu with ASAN using events where problematic modules like hltBTagPFDeepJet0p71Single8Jets30 are executed, and those bugged producers themselves do not cause ASAN failures. It's not obvious to me why that is.

makortel commented 1 year ago

I guess my question was even more naive, and is about this line in HLTJetTag. There, the key can be larger than the size of *h_Jets. When this happens, is it possible that this could crash at runtime? And why don't we see ASAN errors from HLTJetTag itself? (sorry if these questions are a bit too basic)

@makortel , would you mind explaining this ?

The edm::Ref constructor does not try to access the element pointed by the index. The invalid memory access can therefore only occur when the Ref is de-referenced.

missirol commented 1 year ago

+hlt

The action items in https://github.com/cms-sw/cmssw/issues/41045#issuecomment-1510408616 have been tackled, modulo the completion of CMSHLT-2863.

To do (for HLT): add a similar test which uses the GRun menu (as opposed to a fixed EDM file), to test in ASAN builds the TriggerFilterObjectWithRefs products created by the latest HLT menus.

cmsbuild commented 1 year ago

This issue is fully signed and ready to be closed.

missirol commented 1 year ago

please close