cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0

HLT crashes in `HLTMuonL1TFilter::hltFilter` #44940

Open mmusich opened 2 months ago

mmusich commented 2 months ago

This issue is to document several crashes related to HLTMuonL1TFilter::hltFilter that happened during:

In all occurrences there is a segmentation fault whose stack trace mentions HLTMuonL1TFilter::hltFilter, e.g.:

A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.

Tue May 7 12:54:35 CEST 2024
Thread 10 (Thread 0x7f8cbabfd700 (LWP 674105) "cmsRun"):
#0 0x00007f8d3e57b0e1 in poll () from /lib64/libc.so.6
#1 0x00007f8d348fd2ff in full_read.constprop () from /opt/offline/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/pluginFWCoreServicesPlugins.so
#2 0x00007f8d348b0afc in edm::service::InitRootHandlers::stacktraceFromThread() () from /opt/offline/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/pluginFWCoreServicesPlugins.so
#3 0x00007f8d348b1460 in sig_dostack_then_abort () from /opt/offline/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/pluginFWCoreServicesPlugins.so
#4 <signal handler called>
#5 0x00007f8c56b4608d in HLTMuonL1TFilter::hltFilter(edm::Event&, edm::EventSetup const&, trigger::TriggerFilterObjectWithRefs&) const () from /opt/offline/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/pluginHLTriggerMuonAuto.so
#6 0x00007f8cbc6fde4c in HLTFilter::filter(edm::StreamID, edm::Event&, edm::EventSetup const&) const () from /opt/offline/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/libHLTriggerHLTcore.so
#7 0x00007f8d40fc5040 in edm::global::EDFilterBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /opt/offline/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/libFWCoreFramework.so
#8 0x00007f8d40fbd83c in edm::WorkerT::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /opt/offline/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/libFWCoreFramework.so
#9 0x00007f8d40f4bf59 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch::execute(tbb::detail::d1::execution_data&) () from /opt/offline/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_6_MULTIARCHS/lib/el8_amd64_gcc12/scram_x86-64-v3/libFWCoreFramework.so
#12 0x00007f8d3f6ee95b in tbb::detail::r1::task_dispatcher::local_wait_for_all (t=0x7f8be3d5ef00, waiter=..., this=0x7f8d39e92700) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre1-el8_amd64_gcc12/build/CMSSW_14_1_0_pre1-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-c3903c50b52342174dbd3a52854a6e6d/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
#13 tbb::detail::r1::task_dispatcher::local_wait_for_all (t=0x0, waiter=..., this=0x7f8d39e92700) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre1-el8_amd64_gcc12/build/CMSSW_14_1_0_pre1-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-c3903c50b52342174dbd3a52854a6e6d/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#14 tbb::detail::r1::arena::process (tls=..., this=) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre1-el8_amd64_gcc12/build/CMSSW_14_1_0_pre1-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-c3903c50b52342174dbd3a52854a6e6d/tbb-v2021.9.0/src/tbb/arena.cpp:137
#15 tbb::detail::r1::market::process (this=, j=...) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre1-el8_amd64_gcc12/build/CMSSW_14_1_0_pre1-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-c3903c50b52342174dbd3a52854a6e6d/tbb-v2021.9.0/src/tbb/market.cpp:599
#16 0x00007f8d3f6f0b0e in tbb::detail::r1::rml::private_worker::run (this=0x7f8d39e87000) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre1-el8_amd64_gcc12/build/CMSSW_14_1_0_pre1-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-c3903c50b52342174dbd3a52854a6e6d/tbb-v2021.9.0/src/tbb/private_server.cpp:271
#17 tbb::detail::r1::rml::private_worker::thread_routine (arg=0x7f8d39e87000) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre1-el8_amd64_gcc12/build/CMSSW_14_1_0_pre1-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-c3903c50b52342174dbd3a52854a6e6d/tbb-v2021.9.0/src/tbb/private_server.cpp:221
#18 0x00007f8d3e8241ca in start_thread () from /lib64/libpthread.so.0
#19 0x00007f8d3e48fe73 in clone () from /lib64/libc.so.6
[ message truncated - showing only crashed thread ]

We have tried (unsuccessfully) to reproduce these crashes offline using the following scripts [1], [2]. For the record, I am attaching the full stack trace from F3 mon for the runs in question:

[1]

Script to check 380115

```bash
#!/bin/bash -ex

scram p CMSSW CMSSW_14_0_5_patch1
cd CMSSW_14_0_5_patch1/src
eval `scramv1 runtime -sh`

https_proxy=http://cmsproxy.cms:3128 hltConfigFromDB --runNumber 380115 > hlt_run380115.py

cat <<@EOF >> hlt_run380115.py
from EventFilter.Utilities.EvFDaqDirector_cfi import EvFDaqDirector as _EvFDaqDirector
process.EvFDaqDirector = _EvFDaqDirector.clone(
    buBaseDir = '/eos/cms/store/group/tsg/FOG/error_stream/',
    runNumber = 380115
)
from EventFilter.Utilities.FedRawDataInputSource_cfi import source as _source
process.source = _source.clone(
    fileListMode = True,
    fileNames = (
        '/eos/cms/store/group/tsg/FOG/error_stream/run380115/run380115_ls0338_index000079_fu-c2b03-28-01_pid1451372.raw',
        '/eos/cms/store/group/tsg/FOG/error_stream/run380115/run380115_ls0338_index000104_fu-c2b03-28-01_pid1451372.raw'
    )
)
process.options.wantSummary = True
process.options.numberOfThreads = 32
process.options.numberOfStreams = 24
@EOF

mkdir run380115
cmsRun hlt_run380115.py &> crash_run380115.log
```

[2]

Script to check 380466

```bash
#!/bin/bash -ex

scram p CMSSW CMSSW_14_0_6_MULTIARCHS
cd CMSSW_14_0_6_MULTIARCHS/src
eval `scramv1 runtime -sh`

https_proxy=http://cmsproxy.cms:3128 hltConfigFromDB --runNumber 380466 > hlt_run380466.py

cat <<@EOF >> hlt_run380466.py
from EventFilter.Utilities.EvFDaqDirector_cfi import EvFDaqDirector as _EvFDaqDirector
process.EvFDaqDirector = _EvFDaqDirector.clone(
    buBaseDir = '/eos/cms/store/group/tsg/FOG/error_stream/',
    runNumber = 380466
)
from EventFilter.Utilities.FedRawDataInputSource_cfi import source as _source
process.source = _source.clone(
    fileListMode = True,
    fileNames = (
        '/eos/cms/store/group/tsg/FOG/error_stream/run380466/run380466_ls0276_index000212_fu-c2b03-09-01_pid672001.raw',
        '/eos/cms/store/group/tsg/FOG/error_stream/run380466/run380466_ls0276_index000232_fu-c2b03-09-01_pid672001.raw',
        '/eos/cms/store/group/tsg/FOG/error_stream/run380466/run380466_ls0276_index000246_fu-c2b03-09-01_pid672001.raw'
    )
)
process.options.wantSummary = True
process.options.numberOfThreads = 32
process.options.numberOfStreams = 24
@EOF

mkdir run380466
cmsRun hlt_run380466.py &> crash_run380466.log
```

Cc: @cms-sw/hlt-l2 @trtomei @mzarucki @trocino

cmsbuild commented 2 months ago

A new Issue was created by @mmusich.

@smuzaffar, @rappoccio, @makortel, @Dr15Jones, @sextonkennedy, @antoniovilela can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

makortel commented 2 months ago

assign hlt

cmsbuild commented 2 months ago

New categories assigned: hlt

@Martin-Grunewald,@mmusich you have been requested to review this Pull request/Issue and eventually sign? Thanks

mmusich commented 2 months ago

One more instance in run380531:

VinInn commented 2 months ago

I ran 380466 on an HLT machine with a GPU and various thread/stream configurations, without any crash.

mmusich commented 2 months ago

I ran 380466 on an HLT machine with a GPU and various thread/stream configurations, without any crash.

indeed quoting myself:

We have tried (unsuccessfully) to reproduce these crashes offline using the following scripts [1], [2].

VinInn commented 2 months ago

@mmusich OK, sorry. Reading "eos" and "offline", I thought you ran on an lxplus-like machine without a GPU.

mmusich commented 2 months ago

Reading "eos" and "offline", I thought you ran on an lxplus-like machine without a GPU.

I did run on lxplus-gpu using FRD files copied from the error stream (from the SM people). This is standard procedure from the FOG instructions.

makortel commented 2 months ago

Would running valgrind be feasible?

VinInn commented 2 months ago

Running multi-arch with a GPU, I got:

==756999== valgrind: Unrecognised instruction at address 0x57f5fd19.
==756999==    at 0x57F5FD19: void riemannFit::transformToPerigeePlane<Eigen::Matrix<double, 5, 1, 0, 5, 1>, Eigen::Matrix<double, 5, 5, 0, 5, 5>, Eigen::Matrix<double, 5, 1, 0, 5, 1>, Eigen::Matrix<double, 5, 5, 0, 5, 5> >(Eigen::Matrix<double, 5, 1, 0, 5, 1> const&, Eigen::Matrix<double, 5, 5, 0, 5, 5> const&, Eigen::Matrix<double, 5, 1, 0, 5, 1>&, Eigen::Matrix<double, 5, 5, 0, 5, 5>&) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02835/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_MULTIARCHS_X_2024-05-02-2300/lib/el9_amd64_gcc12/scram_x86-64-v3/pluginRecoPixelVertexingPixelTrackFittingPlugins.so)
==756999==    by 0x57F6E8F2: PixelTrackProducerFromSoAAlpaka<pixelTopology::Phase1>::produce(edm::StreamID, edm::Event&, edm::EventSetup const&) const (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02835/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_MULTIARCHS_X_2024-05-02-2300/lib/el9_amd64_gcc12/scram_x86-64-v3/pluginRecoPixelVertexingPixelTrackFittingPlugins.so)
==756999==    by 0x4A96411: edm::global::EDProducerBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02835/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_MULTIARCHS_X_2024-05-02-2300/lib/el9_amd64_gcc12/scram_x86-64-v3/libFWCoreFramework.so)
==756999==    by 0x4A8FABB: edm::WorkerT<edm::global::EDProducerBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02835/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_MULTIARCHS_X_2024-05-02-2300/lib/el9_amd64_gcc12/scram_x86-64-v3/libFWCoreFramework.so)
==756999==    by 0x4A19F48: std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02835/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_MULTIARCHS_X_2024-05-02-2300/lib/el9_amd64_gcc12/scram_x86-64-v3/libFWCoreFramework.so)
==756999==    by 0x4A247B7: edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02835/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_MULTIARCHS_X_2024-05-02-2300/lib/el9_amd64_gcc12/scram_x86-64-v3/libFWCoreFramework.so)
==756999==    by 0x4E89F77: tbb::detail::d1::function_task<edm::WaitingTaskList::announce()::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02835/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_MULTIARCHS_X_2024-05-02-2300/lib/el9_amd64_gcc12/scram_x86-64-v3/libFWCoreConcurrency.so)
==756999==    by 0x641C91A: UnknownInlinedFun (task_dispatcher.h:322)
==756999==    by 0x641C91A: UnknownInlinedFun (task_dispatcher.h:458)
==756999==    by 0x641C91A: UnknownInlinedFun (arena.cpp:137)
==756999==    by 0x641C91A: tbb::detail::r1::market::process(rml::job&) (market.cpp:599)
==756999==    by 0x641EACD: UnknownInlinedFun (private_server.cpp:271)
==756999==    by 0x641EACD: tbb::detail::r1::rml::private_worker::thread_routine(void*) (private_server.cpp:221)
==756999==    by 0x68D9801: start_thread (in /usr/lib64/libc.so.6)
==756999== Your program just tried to execute an instruction that Valgrind
==756999== did not recognise.  There are two possible reasons for this.
==756999== 1. Your program has a bug and erroneously jumped to a non-code
==756999==    location.  If you are running Memcheck and you just saw a
==756999==    warning about a bad jump, it's probably your program's fault.
==756999== 2. The instruction is legitimate but Valgrind doesn't handle it,
==756999==    i.e. it's Valgrind's fault.  If you think this is the case or
==756999==    you are not sure, please let us know and we'll try to fix it.
==756999== Either way, Valgrind will now raise a SIGILL signal which will
==756999== probably kill your program.
==756999== Warning: ignored attempt to set SIGRT32 handler in sigaction();
==756999==          the SIGRT32 signal is used internally by Valgrind

A fatal system signal has occurred: illegal instruction
The following is the call stack containing the origin of the signal.

==756999== Unsupported clone() flags: 0x311
==756999==
==756999== The only supported clone() uses are:
==756999==  - via a threads library (LinuxThreads or NPTL)
==756999==  - via the implementation of fork or vfork
==756999==
==756999== Valgrind detected that your program requires
==756999== the following unimplemented functionality:
==756999==    Valgrind does not support general clone().
==756999== This may be because the functionality is hard to implement,
==756999== or because no reasonable program would behave this way,
==756999== or because nobody has yet needed it.  In any case, let us know at
==756999== www.valgrind.org and/or try to work around the problem, if you can.
==756999==
==756999== Valgrind has to exit now.  Sorry.  Bye!
==756999==

makortel commented 2 months ago

Hmh, according to https://valgrind.org/info/platforms.html, the amd64/linux target should support instructions "up to and including AVX2". OK, I found this in the release notes of 3.23 (we use 3.22):

AMD64 better supports code build with -march=x86-64-v3. fused-multiple-add instructions (fma) are now emulated more accurately. And memcheck now handles __builtin_strcmp using 128/256 bit vectors with sse4.1, avx/avx2.

https://valgrind.org/docs/manual/dist.news.html

@smuzaffar Could we update valgrind to 3.23 (at least in 14_1_X, 14_0_X could be useful too)?

smuzaffar commented 2 months ago

@makortel , https://github.com/cms-sw/cmsdist/pull/9185 updates valgrind to 3.23.0 for 14.1.X

VinInn commented 2 months ago

valgrind managed to run with the standard release and 1 GPU. I found this:

==790020== Thread 13:
==790020== Invalid read of size 8
==790020==    at 0xA58A3FB2: HLTMuonL1TFilter::hltFilter(edm::Event&, edm::EventSetup const&, trigger::TriggerFilterObjectWithRefs&) const (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02836/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-05-09-2300/lib/el9_amd64_gcc12/pluginHLTriggerMuonAuto.so)
==790020==    by 0x9210DBCA: HLTFilter::filter(edm::StreamID, edm::Event&, edm::EventSetup const&) const (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02836/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-05-09-2300/lib/el9_amd64_gcc12/libHLTriggerHLTcore.so)
==790020==    by 0x4A8AE6D: edm::global::EDFilterBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02836/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-05-09-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so)
==790020==    by 0x4A84D3B: edm::WorkerT<edm::global::EDFilterBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02836/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-05-09-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so)
==790020==    by 0x4A11528: std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02836/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-05-09-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so)
==790020==    by 0x4A1B997: edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02836/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-05-09-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so)
==790020==    by 0x498877D: tbb::detail::d1::function_task<edm::WaitingTaskHolder::doneWaiting(std::__exception_ptr::exception_ptr)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02836/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-05-09-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so)
==790020==    by 0x640991A: UnknownInlinedFun (task_dispatcher.h:322)
==790020==    by 0x640991A: UnknownInlinedFun (task_dispatcher.h:458)
==790020==    by 0x640991A: UnknownInlinedFun (arena.cpp:137)
==790020==    by 0x640991A: tbb::detail::r1::market::process(rml::job&) (market.cpp:599)
==790020==    by 0x640BACD: UnknownInlinedFun (private_server.cpp:271)
==790020==    by 0x640BACD: tbb::detail::r1::rml::private_worker::thread_routine(void*) (private_server.cpp:221)
==790020== Invalid read of size 8
==790020==    at 0xA58A3FC7: HLTMuonL1TFilter::hltFilter(edm::Event&, edm::EventSetup const&, trigger::TriggerFilterObjectWithRefs&) const (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02836/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-05-09-2300/lib/el9_amd64_gcc12/pluginHLTriggerMuonAuto.so)
==790020==    by 0x9210DBCA: HLTFilter::filter(edm::StreamID, edm::Event&, edm::EventSetup const&) const (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02836/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-05-09-2300/lib/el9_amd64_gcc12/libHLTriggerHLTcore.so)
==790020==    by 0x4A8AE6D: edm::global::EDFilterBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02836/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-05-09-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so)
==790020==    by 0x4A84D3B: edm::WorkerT<edm::global::EDFilterBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02836/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-05-09-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so)
==790020==    by 0x4A11528: std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02836/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-05-09-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so)
==790020==    by 0x4A1B997: edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02836/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-05-09-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so)
==790020==    by 0x498877D: tbb::detail::d1::function_task<edm::WaitingTaskHolder::doneWaiting(std::__exception_ptr::exception_ptr)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02836/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-05-09-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so)
==790020==    by 0x640991A: UnknownInlinedFun (task_dispatcher.h:322)
==790020==    by 0x640991A: UnknownInlinedFun (task_dispatcher.h:458)
==790020==    by 0x640991A: UnknownInlinedFun (arena.cpp:137)
==790020==    by 0x640991A: tbb::detail::r1::market::process(rml::job&) (market.cpp:599)
==790020==    by 0x640BACD: UnknownInlinedFun (private_server.cpp:271)
==790020==    by 0x640BACD: tbb::detail::r1::rml::private_worker::thread_routine(void*) (private_server.cpp:221)
==790020==    by 0x68C6801: start_thread (in /usr/lib64/libc.so.6)

plenty of those actually

many of these as well

==790020== Thread 9:
==790020== Conditional jump or move depends on uninitialised value(s)
==790020==    at 0xB9DFBC31: muonisolation::CaloExtractorByAssociator::deposits(edm::Event const&, edm::EventSetup const&, reco::Track const&) const (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02836/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-05-09-2300/lib/el9_amd64_gcc12/pluginRecoMuonMuonIsolationPlugins.so)
==790020==    by 0xB94C22AA: MuonIdProducer::fillMuonIsolation(edm::Event&, edm::EventSetup const&, reco::Muon&, reco::IsoDeposit&, reco::IsoDeposit&, reco::IsoDeposit&, reco::IsoDeposit&, reco::IsoDeposit&) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02836/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-05-09-2300/lib/el9_amd64_gcc12/pluginRecoMuonMuonIdentificationPlugins.so)
==790020==    by 0xB94C7CCC: MuonIdProducer::produce(edm::Event&, edm::EventSetup const&) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02836/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-05-09-2300/lib/el9_amd64_gcc12/pluginRecoMuonMuonIdentificationPlugins.so)
==790020==    by 0x4AA65C2: edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02836/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-05-09-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so)
==790020==    by 0x4A853EB: edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02836/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-05-09-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so)
==790020==    by 0x4A11528: std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02836/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-05-09-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so)
==790020==    by 0x4A1B997: edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02836/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-05-09-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so)
==790020==    by 0x4E74F27: tbb::detail::d1::function_task<edm::WaitingTaskList::announce()::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02836/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-05-09-2300/lib/el9_amd64_gcc12/libFWCoreConcurrency.so)
==790020==    by 0x640991A: UnknownInlinedFun (task_dispatcher.h:322)
==790020==    by 0x640991A: UnknownInlinedFun (task_dispatcher.h:458)
==790020==    by 0x640991A: UnknownInlinedFun (arena.cpp:137)
==790020==    by 0x640991A: tbb::detail::r1::market::process(rml::job&) (market.cpp:599)
==790020==    by 0x640BACD: UnknownInlinedFun (private_server.cpp:271)
==790020==    by 0x640BACD: tbb::detail::r1::rml::private_worker::thread_routine(void*) (private_server.cpp:221)
==790020==    by 0x68C6801: start_thread (in /usr/lib64/libc.so.6)

VinInn commented 1 month ago

==796505== Thread 16:
==796505== Invalid read of size 8
==796505==    at 0xA5533FB2: UnknownInlinedFun (PtEtaPhiM4D.h:142)
==796505==    by 0xA5533FB2: UnknownInlinedFun (LorentzVector.h:644)
==796505==    by 0xA5533FB2: UnknownInlinedFun (ParticleState.h:139)
==796505==    by 0xA5533FB2: UnknownInlinedFun (LeafCandidate.h:148)
==796505==    by 0xA5533FB2: HLTMuonL1TFilter::hltFilter(edm::Event&, edm::EventSetup const&, trigger::TriggerFilterObjectWithRefs&) const (HLTMuonL1TFilter.cc:139)
==796505==    by 0x91F48BCA: HLTFilter::filter(edm::StreamID, edm::Event&, edm::EventSetup const&) const (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02836/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-05-09-2300/lib/el9_amd64_gcc12/libHLTriggerHLTcore.so)

not obvious at first glance

VinInn commented 1 month ago

if (deltaR2(muon->eta(), muon->phi(), prevMuons[it2]->eta(), prevMuons[it2]->phi()) < maxDR2_)

Given that muon->eta() is already accessed above (with no reports from valgrind), the invalid read should come from prevMuons...
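
To make the line under suspicion easier to reason about, here is a minimal Python sketch of the matching step it performs. It assumes the usual definition of `deltaR2` (squared eta difference plus squared phi difference, with the phi difference wrapped into [-pi, pi]); the helper names `delta_phi`, `delta_r2` and `matches_previous`, and the use of plain `(eta, phi)` tuples, are illustrative and not taken from CMSSW. The 0.3 cut is the `MaxDeltaR` value of the module configuration quoted later in this thread.

```python
import math


def delta_phi(phi1, phi2):
    """Signed azimuthal difference wrapped into [-pi, pi]."""
    # Python's % on floats returns a value in [0, 2*pi) for a positive modulus.
    dphi = (phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi


def delta_r2(eta1, phi1, eta2, phi2):
    """Squared angular distance between two (eta, phi) directions."""
    deta = eta1 - eta2
    dphi = delta_phi(phi1, phi2)
    return deta * deta + dphi * dphi


def matches_previous(muon, prev_muons, max_dr):
    """True if any previous candidate lies within max_dr of the muon.

    muon and each entry of prev_muons are (eta, phi) tuples (hypothetical
    stand-ins for the candidate references used in the filter).
    """
    max_dr2 = max_dr * max_dr
    return any(
        delta_r2(muon[0], muon[1], prev[0], prev[1]) < max_dr2
        for prev in prev_muons
    )


if __name__ == "__main__":
    # Two candidates on either side of the phi = +-pi boundary are close.
    print(matches_previous((0.5, 3.1), [(0.5, -3.1)], 0.3))
    # One unit apart in eta is outside the 0.3 cone.
    print(matches_previous((0.0, 0.0), [(1.0, 0.0)], 0.3))
```

The sketch only reads eta and phi of each previous candidate, which is why a dangling or corrupted reference in the previous-candidate list would surface exactly at this comparison.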

VinInn commented 1 month ago

Following https://github.com/jemalloc/jemalloc/wiki/Use-Case%3A-Find-a-memory-corruption-bug, I set export MALLOC_CONF=junk:true and got a crash somewhere else! Multiple messages such as:

%MSG-e EcalRecHitError:  EcalRecHitProducer:hltEcalRecHit  12-May-2024 08:41:48 CEST Run: 380466 Event: 490512903
No intercalib const found for xtal 2779096485! something wrong with EcalIntercalibConstants in your DB?
%MSG
%MSG-e EcalLaserDbService:  EcalRecHitProducer:hltEcalRecHit  12-May-2024 08:41:48 CEST Run: 380466 Event: 490512903
 DetId is NOT in ECAL

and then segfault in

Module: HLTEcalRecHitInAllL1RegionsProducer:hltRechitInRegionsECAL (crashed)

VinInn commented 1 month ago

So I set export MALLOC_CONF=zero:true and I get no crash, just different junk in the location where valgrind reports the issue.
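
The two allocator settings tried here (`junk:true` and `zero:true`) can be compared by launching the same job once under each. The following is a minimal sketch, assuming the job is linked against a jemalloc build that honours these fill options; `run_with_malloc_conf` is a hypothetical helper, and the probe command is only a stand-in for the real `cmsRun` invocation.

```python
import os
import subprocess
import sys


def run_with_malloc_conf(cmd, malloc_conf):
    """Run cmd with MALLOC_CONF set in a copy of the current environment."""
    env = dict(os.environ)
    env["MALLOC_CONF"] = malloc_conf
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


# Stand-in command that just echoes the variable it received. In a real
# test this list would hold the cmsRun command line instead (assumption).
probe = [sys.executable, "-c", "import os; print(os.environ['MALLOC_CONF'])"]

if __name__ == "__main__":
    for setting in ("junk:true", "zero:true"):
        result = run_with_malloc_conf(probe, setting)
        print(setting, "->", result.returncode, result.stdout.strip())
```

Keeping the return code and output of each run side by side makes it easier to see whether the crash location moves with the fill pattern, which is what hints at a read of freed or uninitialised memory.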

VinInn commented 1 month ago

btw I added this

diff --git a/HLTrigger/Muon/plugins/HLTMuonL1TFilter.cc b/HLTrigger/Muon/plugins/HLTMuonL1TFilter.cc
index 3b8f3334bef..b2da2a351e5 100644
--- a/HLTrigger/Muon/plugins/HLTMuonL1TFilter.cc
+++ b/HLTrigger/Muon/plugins/HLTMuonL1TFilter.cc
@@ -136,6 +136,10 @@ bool HLTMuonL1TFilter::hltFilter(edm::Event& iEvent,
       bool matchPrevL1 = false;
       int prevSize = prevMuons.size();
       for (int it2 = 0; it2 < prevSize; it2++) {
+        if (prevMuons[it2].isNull()) std::cout << ">>> not valid ref " << it2 << std::endl;
+        auto const &  m = prevMuons[it2];
+        if (m->pt() < 0.01 || std::abs(m->eta())>7 || std::abs(m->phi())>6.3) std::cout << ">>> ??"
+           << m.index() << ' ' << m.get() << ' ' << m->pt() << ' ' << m->eta() << ' ' << m->phi() << std::endl;
         if (deltaR2(muon->eta(), muon->phi(), prevMuons[it2]->eta(), prevMuons[it2]->phi()) < maxDR2_) {
           matchPrevL1 = true;
           break;

It prints at the place where valgrind reports the issue; the content is clearly junk and is not reproducible:

[innocent@gputest-genoa-01 (gpu-c2e35-08-01) hltBug]$ grep ">>> ??"  *.log
bug.log:>>> ??4 0x7fa75a0afd60 6.9347e-310 2.98429e-315 0
bug.log:>>> ??4 0x7fa75a0afd60 6.9347e-310 2.98429e-315 0
bug.log:>>> ??4 0x7fa75a0afd60 6.9347e-310 2.98429e-315 0
bug2.log:>>> ??4 0x7fe65e713460 6.94792e-310 6.91692e-323 6.38221e+25
bug2.log:>>> ??4 0x7fe65e713460 6.94792e-310 6.91692e-323 6.38221e+25
bug2.log:>>> ??4 0x7fe65e713460 6.94792e-310 6.91692e-323 6.38221e+25
bug3.log:>>> ??4 0x7f517eba7d60 7.45854e+82 -3.91644e+217 -nan
bug3.log:>>> ??4 0x7f517eba7d60 7.45854e+82 -3.91644e+217 -nan
bug3.log:>>> ??4 0x7f517eba7d60 7.45854e+82 -3.91644e+217 -nan
valg2.log:>>> ??4 0x19fa8be10 7.11455e-322 0 1.44712e-320
valg2.log:>>> ??4 0x19fa8be10 7.11455e-322 0 1.44712e-320
valg2.log:>>> ??4 0x19fa8be10 7.11455e-322 0 1.44712e-320
zeroMem.log:>>> ??4 0x7f21d76ec660 6.90621e-310 6.91692e-323 6.38221e+25
zeroMem.log:>>> ??4 0x7f21d76ec660 6.90621e-310 6.91692e-323 6.38221e+25
zeroMem.log:>>> ??4 0x7f21d76ec660 6.90621e-310 6.91692e-323 6.38221e+25
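
One way to look at the junk values is to reinterpret the printed doubles as raw 64-bit integers. The sketch below (the helper name `double_bits` is illustrative) does this for the `pt` values of order 1e-310 from the logs above. These are subnormal doubles, and their bit patterns fall in the `0x00007f...` range where user-space addresses typically live on x86-64 Linux, which suggests (as an interpretation, not a confirmed diagnosis) that pointer-like data is being read where a double is expected. Because the logs print only six significant digits, the recovered integers are approximate.

```python
import struct


def double_bits(value):
    """Return the raw IEEE-754 bit pattern of a double as an integer."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


if __name__ == "__main__":
    # pt values copied from bug.log, bug2.log and zeroMem.log above.
    for printed in (6.9347e-310, 6.94792e-310, 6.90621e-310):
        print(f"{printed!r:>14} -> {double_bits(printed):#018x}")
```

Comparing the hex output with the `m.get()` address printed on the same log line shows how close the "pt" bit pattern is to the address of the object itself.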

mmusich commented 1 month ago

For reference prevMuons is defined as:

https://github.com/cms-sw/cmssw/blob/ab4ccb06a6fd036174d0101da57f8b26f029bce8/HLTrigger/Muon/plugins/HLTMuonL1TFilter.cc#L105-L109

where:

https://github.com/cms-sw/cmssw/blob/ab4ccb06a6fd036174d0101da57f8b26f029bce8/HLTrigger/Muon/plugins/HLTMuonL1TFilter.cc#L27-L28

and the configuration of the crashing module, hltL1fL1sCDCL1Filtered0, is:

process.hltL1sCDC = cms.EDFilter( "HLTL1TSeed",
    saveTags = cms.bool( True ),
    L1SeedsLogicalExpression = cms.string( "L1_CDC_SingleMu_3_er1p2_TOP120_DPHI2p618_3p142" ),
    L1ObjectMapInputTag = cms.InputTag( "hltGtStage2ObjectMap" ),
    L1GlobalInputTag = cms.InputTag( "hltGtStage2Digis" ),
    L1MuonInputTag = cms.InputTag( 'hltGtStage2Digis','Muon' ),
    L1MuonShowerInputTag = cms.InputTag( 'hltGtStage2Digis','MuonShower' ),
    L1EGammaInputTag = cms.InputTag( 'hltGtStage2Digis','EGamma' ),
    L1JetInputTag = cms.InputTag( 'hltGtStage2Digis','Jet' ),
    L1TauInputTag = cms.InputTag( 'hltGtStage2Digis','Tau' ),
    L1EtSumInputTag = cms.InputTag( 'hltGtStage2Digis','EtSum' ),
    L1EtSumZdcInputTag = cms.InputTag( 'hltGtStage2Digis','EtSumZDC' )
)

process.hltL1fL1sCDCL1Filtered0 = cms.EDFilter( "HLTMuonL1TFilter",
    saveTags = cms.bool( True ),
    CandTag = cms.InputTag( 'hltGtStage2Digis','Muon' ),
    PreviousCandTag = cms.InputTag( "hltL1sCDC" ),
    MaxEta = cms.double( 2.5 ),
    MinPt = cms.double( 0.0 ),
    MaxDeltaR = cms.double( 0.3 ),
    MinN = cms.int32( 1 ),
    CentralBxOnly = cms.bool( False ),
    SelectQualities = cms.vint32(  )
)

@cms-sw/l1-l2 FYI

VinInn commented 1 month ago

running UBSAN found this

src/HLTrigger/Muon/plugins/HLTMuonL1TFilter.cc:139:89: runtime error: member call on address 0x7f6d1879fb60 which does not point to an object of type 'LeafCandidate'
0x7f6d1879fb60: note: object has a possibly invalid vptr: abs(offset to top) too big
 6d 7f 00 00  40 58 ae 1b 6d 7f 00 00  c1 33 3c 40 03 00 00 00  24 01 00 00 00 00 00 00  00 00 00 00
              ^~~~~~~~~~~~~~~~~~~~~~~
              possibly invalid vptr
    #0 0x7f6db97b7c7e in HLTMuonL1TFilter::hltFilter(edm::Event&, edm::EventSetup const&, trigger::TriggerFilterObjectWithRefs&) const src/HLTrigger/Muon/plugins/HLTMuonL1TFilter.cc:139
    #1 0x7f6dd386d849 in HLTFilter::filter(edm::StreamID, edm::Event&, edm::EventSetup const&) const src/HLTrigger/HLTcore/src/HLTFilter.cc:34
    #2 0x7f6ec15a8ee6 in edm::global::EDFilterBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) src/FWCore/Framework/src/global/EDFilterBase.cc:67
    #3 0x7f6ec15768f8 in edm::WorkerT<edm::global::EDFilterBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) src/FWCore/Framework/src/WorkerT.cc:202
    #4 0x7f6ec0d4e03f in edm::workerhelper::CallImpl<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::call(edm::Worker*, edm::StreamID, edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*, edm::StreamContext const*) src/FWCore/Framework/interface/maker/Worker.h:700
    #5 0x7f6ec0d4e03f in edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}::operator()() const src/FWCore/Framework/interface/maker/Worker.h:1259
    #6 0x7f6ec0d4e03f in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) src/FWCore/Utilities/interface/ConvertException.h:21
    #7 0x7f6ec0d4ea44 in bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) src/FWCore/Framework/interface/maker/Worker.h:1258
    #8 0x7f6ec0d4ea44 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) src/FWCore/Framework/interface/maker/Worker.h:1172
    #9 0x7f6ec0d672bf in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() src/FWCore/Framework/interface/maker/Worker.h:499
    #10 0x7f6ec04eb14c in edm::WaitingTaskHolder::doneWaiting(std::__exception_ptr::exception_ptr)::{lambda()#1}::operator()() const src/FWCore/Concurrency/interface/WaitingTaskHolder.h:107
    #11 0x7f6ec04eb14c in task_ptr_or_nullptr_impl<const edm::WaitingTaskHolder::doneWaiting(std::__exception_ptr::exception_ptr)::<lambda()>&> /data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc12/external/tbb/v2021.9.0-a7089dd5ec356e9a0bc222e109b15cef/include/
oneapi/tbb/task_group.h:115
    #12 0x7f6ec04eb14c in task_ptr_or_nullptr<const edm::WaitingTaskHolder::doneWaiting(std::__exception_ptr::exception_ptr)::<lambda()>&> /data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc12/external/tbb/v2021.9.0-a7089dd5ec356e9a0bc222e109b15cef/include/oneap
i/tbb/task_group.h:125
    #13 0x7f6ec04eb14c in tbb::detail::d1::function_task<edm::WaitingTaskHolder::doneWaiting(std::__exception_ptr::exception_ptr)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) /data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc12/external/tbb/v2021.9
.0-a7089dd5ec356e9a0bc222e109b15cef/include/oneapi/tbb/task_group.h:452
    #14 0x7f6eb9bda95a in tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter>(tbb::detail::d1::task*, tbb::detail::r1::outermost_worker_waiter&) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testB
uildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-a7089dd5ec356e9a0bc222e109b15cef/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
    #15 0x7f6eb9bda95a in tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter>(tbb::detail::d1::task*, tbb::detail::r1::outermost_worker_waiter&) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir
/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-a7089dd5ec356e9a0bc222e109b15cef/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
    #16 0x7f6eb9bda95a in tbb::detail::r1::arena::process(tbb::detail::r1::thread_data&) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-a7089dd5ec356e9a0bc222e109b15cef/tbb-v2021.9.0/src/tbb/arena.cpp:137
    #17 0x7f6eb9bda95a in tbb::detail::r1::market::process(rml::job&) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-a7089dd5ec356e9a0bc222e109b15cef/tbb-v2021.9.0/src/tbb/market.cpp:599
    #18 0x7f6eb9bdcb0d in tbb::detail::r1::rml::private_worker::run() /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-a7089dd5ec356e9a0bc222e109b15cef/tbb-v2021.9.0/src/tbb/private_server.cpp:271
    #19 0x7f6eb9bdcb0d in tbb::detail::r1::rml::private_worker::thread_routine(void*) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-a7089dd5ec356e9a0bc222e109b15cef/tbb-v2021.9.0/src/tbb/private_server.cpp:221
    #20 0x7f6eb86de1c9 in start_thread (/lib64/libpthread.so.0+0x81c9)
    #21 0x7f6eb8349e72 in clone (/lib64/libc.so.6+0x39e72)
aloeliger commented 1 month ago

The only recent change I know of on the L1 side for muons is the OMTF->GMT unconstrained pT update. I think that involved an unpacker update, however. I assume that for HLT it is hltGtStage2Digis that gets unpacked?

VinInn commented 1 month ago

And this is ASAN, which aborts after finding the error:

=================================================================
==1067253==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6180043d7810 at pc 0x7effcafbad57 bp 0x7effda01d5b0 sp 0x7effda01d5a8
READ of size 8 at 0x6180043d7810 thread T12
    #0 0x7effcafbad56 in HLTMuonL1TFilter::hltFilter(edm::Event&, edm::EventSetup const&, trigger::TriggerFilterObjectWithRefs&) const (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/pluginHLTr
iggerMuonAuto.so+0x220d56)
    #1 0x7effd3c15568 in HLTFilter::filter(edm::StreamID, edm::Event&, edm::EventSetup const&) const (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/libHLTriggerHLTcore.so+0xf6568)
    #2 0x7f004f17bcf7 in edm::global::EDFilterBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_
gcc12/libFWCoreFramework.so+0x917cf7)
    #3 0x7f004f166d58 in edm::WorkerT<edm::global::EDFilterBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/libFW
CoreFramework.so+0x902d58)
    #4 0x7f004ee34457 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm:
:StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrinc
ipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cm
ssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so+0x5d0457)
    #5 0x7f004ee34b2f in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActio
nType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/
lib/el8_amd64_gcc12/libFWCoreFramework.so+0x5d0b2f)
    #6 0x7f004ee3fddd in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/libFWCoreFr
amework.so+0x5dbddd)
    #7 0x7f004ea95bf1 in tbb::detail::d1::function_task<edm::WaitingTaskHolder::doneWaiting(std::__exception_ptr::exception_ptr)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_A
SAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so+0x231bf1)
    #8 0x7f004c91195a in tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter>(tbb::detail::d1::task*, tbb::detail::r1::outermost_worker_waiter&) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBu
ildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-a7089dd5ec356e9a0bc222e109b15cef/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
    #9 0x7f004c91195a in tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter>(tbb::detail::d1::task*, tbb::detail::r1::outermost_worker_waiter&) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/
BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-a7089dd5ec356e9a0bc222e109b15cef/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
    #10 0x7f004c91195a in tbb::detail::r1::arena::process(tbb::detail::r1::thread_data&) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-a7089dd5ec356e9a0bc222e109b15cef/tbb-v2021.9.0/src/tbb/arena.cpp:137
    #11 0x7f004c91195a in tbb::detail::r1::market::process(rml::job&) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-a7089dd5ec356e9a0bc222e109b15cef/tbb-v2021.9.0/src/tbb/market.cpp:599
    #12 0x7f004c913b0d in tbb::detail::r1::rml::private_worker::run() /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-a7089dd5ec356e9a0bc222e109b15cef/tbb-v2021.9.0/src/tbb/private_server.cpp:271
    #13 0x7f004c913b0d in tbb::detail::r1::rml::private_worker::thread_routine(void*) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-a7089dd5ec356e9a0bc222e109b15cef/tbb-v2021.9.0/src/tbb/private_server.cpp:221
    #14 0x7f004ba4a1c9 in start_thread (/lib64/libpthread.so.0+0x81c9)
    #15 0x7f004b6b5e72 in clone (/lib64/libc.so.6+0x39e72)

0x6180043d7810 is located 48 bytes to the right of 864-byte region [0x6180043d7480,0x6180043d77e0)
allocated by thread T12 here:
    #0 0x7f004f4496d8 in operator new(unsigned long) ../../../../libsanitizer/asan/asan_new_delete.cpp:95
    #1 0x7f0036cd2fcf in void std::vector<l1t::Muon, std::allocator<l1t::Muon> >::_M_realloc_insert<l1t::Muon const&>(__gnu_cxx::__normal_iterator<l1t::Muon*, std::vector<l1t::Muon, std::allocator<l1t::Muon> > >, l1t::Muon const&) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-
02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/libDataFormatsL1Trigger.so+0x11ffcf)
    #2 0x7effe4d8c5bc in std::vector<l1t::Muon, std::allocator<l1t::Muon> >::insert(__gnu_cxx::__normal_iterator<l1t::Muon const*, std::vector<l1t::Muon, std::allocator<l1t::Muon> > >, l1t::Muon const&) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cm
ssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/pluginL1TriggerL1TGlobalPlugins.so+0x1245bc)
    #3 0x7effe4d8d357 in BXVector<l1t::Muon>::push_back(int, l1t::Muon) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/pluginL1TriggerL1TGlobalPlugins.so+0x125357)
    #4 0x7effd2192d47 in l1t::stage2::MuonUnpacker::unpackBx(int, std::vector<unsigned int, std::allocator<unsigned int> > const&, unsigned int, unsigned int) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el
8_amd64_gcc12/pluginEventFilterL1TRawToDigiAuto.so+0x502d47)
    #5 0x7effd2197b1a in l1t::stage2::MuonUnpacker::unpack(l1t::Block const&, l1t::UnpackerCollections*) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/pluginEventFilterL1TRawToDigiAuto.so+0x5
07b1a)
    #6 0x7effd1daf763 in l1t::L1TRawToDigi::produce(edm::Event&, edm::EventSetup const&) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/pluginEventFilterL1TRawToDigiAuto.so+0x11f763)
    #7 0x7f004f1e5602 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/e
l8_amd64_gcc12/libFWCoreFramework.so+0x981602)
    #8 0x7f004f166338 in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gc
c12/libFWCoreFramework.so+0x902338)
    #9 0x7f004ee34457 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm:
:StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrinc
ipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cm
ssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so+0x5d0457)
    #10 0x7f004ee34b2f in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActi
onType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300
/lib/el8_amd64_gcc12/libFWCoreFramework.so+0x5d0b2f)
    #11 0x7f004ee3fddd in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/libFWCoreF
ramework.so+0x5dbddd)
    #12 0x7f004ea95bf1 in tbb::detail::d1::function_task<edm::WaitingTaskHolder::doneWaiting(std::__exception_ptr::exception_ptr)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_
ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so+0x231bf1)
    #13 0x7f004c91195a in tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter>(tbb::detail::d1::task*, tbb::detail::r1::outermost_worker_waiter&) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testB
uildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-a7089dd5ec356e9a0bc222e109b15cef/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
    #14 0x7f004c91195a in tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter>(tbb::detail::d1::task*, tbb::detail::r1::outermost_worker_waiter&) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir
/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-a7089dd5ec356e9a0bc222e109b15cef/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
    #15 0x7f004c91195a in tbb::detail::r1::arena::process(tbb::detail::r1::thread_data&) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-a7089dd5ec356e9a0bc222e109b15cef/tbb-v2021.9.0/src/tbb/arena.cpp:137
    #16 0x7f004c91195a in tbb::detail::r1::market::process(rml::job&) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-a7089dd5ec356e9a0bc222e109b15cef/tbb-v2021.9.0/src/tbb/market.cpp:599
    #17 0x7f004c913b0d in tbb::detail::r1::rml::private_worker::run() /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-a7089dd5ec356e9a0bc222e109b15cef/tbb-v2021.9.0/src/tbb/private_server.cpp:271
    #18 0x7f004c913b0d in tbb::detail::r1::rml::private_worker::thread_routine(void*) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-a7089dd5ec356e9a0bc222e109b15cef/tbb-v2021.9.0/src/tbb/private_server.cpp:221
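
The "48 bytes to the right of 864-byte region" line in the report above can be cross-checked with plain address arithmetic. The following is only a minimal sketch: `HeapRegion`, `offsetPastEnd`, `kMuonBuffer` and `kFaultingAddress` are hypothetical names introduced for illustration, and the numeric values are copied from the report. It shows that the faulting read lands past the end of the `std::vector<l1t::Muon>` buffer allocated in `MuonUnpacker::unpackBx`, not inside it.

```cpp
#include <cstdint>

// Half-open heap region [begin, end), in the form AddressSanitizer prints it.
// (Hypothetical helper type, not part of CMSSW or of the sanitizer runtime.)
struct HeapRegion {
  std::uint64_t begin;
  std::uint64_t end;
  constexpr std::uint64_t size() const { return end - begin; }
};

// Signed distance of `addr` from the end of `region`.
// A non-negative result N corresponds to "N bytes to the right" in ASan wording;
// a negative result means the address lies before the end of the region.
constexpr std::int64_t offsetPastEnd(const HeapRegion& region, std::uint64_t addr) {
  return static_cast<std::int64_t>(addr) - static_cast<std::int64_t>(region.end);
}

// Values copied from the report above.
constexpr HeapRegion kMuonBuffer{0x6180043d7480ULL, 0x6180043d77e0ULL};
constexpr std::uint64_t kFaultingAddress = 0x6180043d7810ULL;

static_assert(kMuonBuffer.size() == 864, "region size matches the report");
static_assert(offsetPastEnd(kMuonBuffer, kFaultingAddress) == 48,
              "read starts 48 bytes past the end of the buffer");
```

Compiling the snippet is enough to run the check, since both conditions are `static_assert`s. The result is consistent with a reference or index that points beyond the last element of the muon collection.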
gparida commented 1 month ago

There was an HLT crash in run 381147 of fill 9666, which seems to be of the nature reported in this issue. Attaching the log: old_hlt_run381147_pid2159904.log

VinInn commented 1 month ago

The problem is very, very frequent. Just run in an ASAN release on any raw data and it will crash almost immediately...

VinInn commented 1 month ago

I activated the dump in https://cmssdt.cern.ch/lxr/source/HLTrigger/HLTfilters/plugins/HLTL1TSeed.cc#0155 (changed LogTrace to cout) and got this in an ASAN release:

HLTL1TSeed::hltFilter
  Dump TriggerFilterObjectWithRefs

  HLTL1TSeed: seed logical expression = L1_CDC_SingleMu_3_er1p2_TOP120_DPHI2p618_3p142

  L1Mu seeds:      2

    L1Mu        q = 1   pt = 4.5    eta =  1.07662  phi =  -1.33095
=================================================================
==1888945==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x618002c2b464 at pc 0x7fc61cbe084f bp 0x7ffda671cb60 sp 0x7ffda671cb58
READ of size 4 at 0x618002c2b464 thread T0
    #0 0x7fc61cbe084e in l1t::Muon::hwCharge() const /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/src/DataFormats/L1Trigger/interface/Muon.h:96
    #1 0x7fc61cbe084e in HLTL1TSeed::dumpTriggerFilterObjectWithRefs(trigger::TriggerFilterObjectWithRefs&) const src/HLTrigger/HLTfilters/plugins/HLTL1TSeed.cc:176
    #2 0x7fc61cbfd7f2 in HLTL1TSeed::hltFilter(edm::Event&, edm::EventSetup const&, trigger::TriggerFilterObjectWithRefs&) src/HLTrigger/HLTfilters/plugins/HLTL1TSeed.cc:148
    #3 0x7fc61d43e6be in HLTStreamFilter::filter(edm::Event&, edm::EventSetup const&) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/libHLTriggerHLTcore.so+0x13e6be)
    #4 0x7fc690e1bd12 in edm::stream::EDFilterAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so+0x976d12)
    #5 0x7fc690da4ab8 in edm::WorkerT<edm::stream::EDFilterAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so+0x8ffab8)
    #6 0x7fc690a75457 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so+0x5d0457)
    #7 0x7fc690a75b2f in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so+0x5d0b2f)
    #8 0x7fc690a80ddd in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so+0x5dbddd)
    #9 0x7fc68fc1268e in tbb::detail::d1::function_task<edm::WaitingTaskList::announce()::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/libFWCoreConcurrency.so+0x1268e)
    #10 0x7fc68e55b280 in tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter>(tbb::detail::d1::task*, tbb::detail::r1::external_waiter&) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-a7089dd5ec356e9a0bc222e109b15cef/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
    #11 0x7fc68e55b280 in tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter>(tbb::detail::d1::task*, tbb::detail::r1::external_waiter&) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-a7089dd5ec356e9a0bc222e109b15cef/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
    #12 0x7fc68e55b280 in tbb::detail::r1::task_dispatcher::execute_and_wait(tbb::detail::d1::task*, tbb::detail::d1::wait_context&, tbb::detail::d1::task_group_context&) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-a7089dd5ec356e9a0bc222e109b15cef/tbb-v2021.9.0/src/tbb/task_dispatcher.cpp:168
    #13 0x7fc6907fdbbb in edm::FinalWaitingTask::wait() (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so+0x358bbb)
    #14 0x7fc69079f022 in edm::EventProcessor::processRuns() (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so+0x2fa022)
    #15 0x7fc6907d072d in edm::EventProcessor::runToCompletion() (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so+0x32b72d)
    #16 0x40bb64 in tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/bin/el8_amd64_gcc12/cmsRun+0x40bb64)
    #17 0x7fc68e5479ac in tbb::detail::r1::task_arena_impl::execute(tbb::detail::d1::task_arena_base&, tbb::detail::d1::delegate_base&) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-a7089dd5ec356e9a0bc222e109b15cef/tbb-v2021.9.0/src/tbb/arena.cpp:688
    #18 0x40f71a in main::{lambda()#1}::operator()() const (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/bin/el8_amd64_gcc12/cmsRun+0x40f71a)
    #19 0x4083b4 in main (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/bin/el8_amd64_gcc12/cmsRun+0x4083b4)
    #20 0x7fc68d2f7d84 in __libc_start_main (/lib64/libc.so.6+0x3ad84)
    #21 0x4086ed in _start (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/bin/el8_amd64_gcc12/cmsRun+0x4086ed)

0x618002c2b464 is located 132 bytes to the right of 864-byte region [0x618002c2b080,0x618002c2b3e0)
allocated by thread T0 here:
    #0 0x7fc69108a6d8 in operator new(unsigned long) ../../../../libsanitizer/asan/asan_new_delete.cpp:95
    #1 0x7fc67890ffcf in void std::vector<l1t::Muon, std::allocator<l1t::Muon> >::_M_realloc_insert<l1t::Muon const&>(__gnu_cxx::__normal_iterator<l1t::Muon*, std::vector<l1t::Muon, std::allocator<l1t::Muon> > >, l1t::Muon const&) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/libDataFormatsL1Trigger.so+0x11ffcf)
    #2 0x7fc62694d5bc in std::vector<l1t::Muon, std::allocator<l1t::Muon> >::insert(__gnu_cxx::__normal_iterator<l1t::Muon const*, std::vector<l1t::Muon, std::allocator<l1t::Muon> > >, l1t::Muon const&) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/pluginL1TriggerL1TGlobalPlugins.so+0x1245bc)
    #3 0x7fc62694e357 in BXVector<l1t::Muon>::push_back(int, l1t::Muon) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/pluginL1TriggerL1TGlobalPlugins.so+0x125357)
    #4 0x7fc61b968d47 in l1t::stage2::MuonUnpacker::unpackBx(int, std::vector<unsigned int, std::allocator<unsigned int> > const&, unsigned int, unsigned int) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/pluginEventFilterL1TRawToDigiAuto.so+0x502d47)
    #5 0x7fc61b96db1a in l1t::stage2::MuonUnpacker::unpack(l1t::Block const&, l1t::UnpackerCollections*) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/pluginEventFilterL1TRawToDigiAuto.so+0x507b1a)
    #6 0x7fc61b585763 in l1t::L1TRawToDigi::produce(edm::Event&, edm::EventSetup const&) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/pluginEventFilterL1TRawToDigiAuto.so+0x11f763)
    #7 0x7fc690e26602 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so+0x981602)
    #8 0x7fc690da7338 in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so+0x902338)
    #9 0x7fc690a75457 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so+0x5d0457)
    #10 0x7fc690a75b2f in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so+0x5d0b2f)
    #11 0x7fc690a80ddd in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so+0x5dbddd)
    #12 0x7fc6906d6bf1 in tbb::detail::d1::function_task<edm::WaitingTaskHolder::doneWaiting(std::__exception_ptr::exception_ptr)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so+0x231bf1)
    #13 0x7fc68e55b280 in tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter>(tbb::detail::d1::task*, tbb::detail::r1::external_waiter&) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-a7089dd5ec356e9a0bc222e109b15cef/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
    #14 0x7fc68e55b280 in tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter>(tbb::detail::d1::task*, tbb::detail::r1::external_waiter&) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-a7089dd5ec356e9a0bc222e109b15cef/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
    #15 0x7fc68e55b280 in tbb::detail::r1::task_dispatcher::execute_and_wait(tbb::detail::d1::task*, tbb::detail::d1::wait_context&, tbb::detail::d1::task_group_context&) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-a7089dd5ec356e9a0bc222e109b15cef/tbb-v2021.9.0/src/tbb/task_dispatcher.cpp:168
    #16 0x7fc6907fdbbb in edm::FinalWaitingTask::wait() (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so+0x358bbb)
    #17 0x7fc69079f022 in edm::EventProcessor::processRuns() (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so+0x2fa022)
    #18 0x7fc6907d072d in edm::EventProcessor::runToCompletion() (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so+0x32b72d)
    #19 0x40bb64 in tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/bin/el8_amd64_gcc12/cmsRun+0x40bb64)
    #20 0x7fc68e5479ac in tbb::detail::r1::task_arena_impl::execute(tbb::detail::d1::task_arena_base&, tbb::detail::d1::delegate_base&) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-a7089dd5ec356e9a0bc222e109b15cef/tbb-v2021.9.0/src/tbb/arena.cpp:688
    #21 0x40f71a in main::{lambda()#1}::operator()() const (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/bin/el8_amd64_gcc12/cmsRun+0x40f71a)
    #22 0x4083b4 in main (/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/bin/el8_amd64_gcc12/cmsRun+0x4083b4)
    #23 0x7fc68d2f7d84 in __libc_start_main (/lib64/libc.so.6+0x3ad84)

SUMMARY: AddressSanitizer: heap-buffer-overflow /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02837/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-15-2300/src/DataFormats/L1Trigger/interface/Muon.h:96 in l1t::Muon::hwCharge() const
Shadow bytes around the buggy address:
  0x0c308057d630: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c308057d640: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c308057d650: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c308057d660: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c308057d670: 00 00 00 00 00 00 00 00 00 00 00 00 fa fa fa fa
=>0x0c308057d680: fa fa fa fa fa fa fa fa fa fa fa fa[fa]fa fa fa
  0x0c308057d690: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c308057d6a0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c308057d6b0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c308057d6c0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c308057d6d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==1888945==ABORTING
mmusich commented 1 month ago

just run in an ASAN release on any raw data and it will almost immediately crash...

I read from the stack trace that you used CMSSW_14_1_ASAN_X_2024-05-15-2300 for testing. Which menu and input data were used? Please post a recipe, for the record, so that one does not have to start from scratch. Thank you.

missirol commented 1 month ago

https://its.cern.ch/jira/browse/CMSLITDPG-1221?focusedId=6247237&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-6247237

It seems there are known mismatches between L1T firmware and emulator for CDC seeds (I don't know whether this is related to these crashes, but it might be). I don't know anything about these mismatches (when they started, what is causing them, what the plan is to fix them). @elfontan, could you please clarify?

missirol commented 1 month ago

I'm using this script:

#!/bin/bash

hltGetConfiguration run:381147 \
  --globaltag 140X_dataRun3_HLT_v3 \
  --no-prescale \
  --no-output \
  --max-events 1 \
  --paths 'HLT_CDC_L2cosmic_10_er1p0_v*' \
  --input root://eoscms.cern.ch//eos/cms/store/group/tsg/FOG/error_stream_root/run381147/run381147_ls0202_index000187_fu-c2b05-29-01_pid2159904.root \
  > hlt.py

cat <<@EOF >> hlt.py
process.options.numberOfThreads = 1
process.options.numberOfStreams = 0

process.source.skipEvents = cms.untracked.uint32( 56 )

del process.MessageLogger
process.load("FWCore.MessageLogger.MessageLogger_cfi")
@EOF

cmsRun hlt.py &> hlt.log
missirol commented 1 month ago

The patch below dodges the issue. I'm not sure the warning is accurate, or whether this should be implemented regardless of the root cause of the problem (if so, the same check would probably have to be added for the other L1T objects in that same EDFilter).

diff --git a/HLTrigger/HLTfilters/plugins/HLTL1TSeed.cc b/HLTrigger/HLTfilters/plugins/HLTL1TSeed.cc
index 699a170d60d..6fae44e83bb 100644
--- a/HLTrigger/HLTfilters/plugins/HLTL1TSeed.cc
+++ b/HLTrigger/HLTfilters/plugins/HLTL1TSeed.cc
@@ -950,6 +950,13 @@ bool HLTL1TSeed::seedsL1TriggerObjectMaps(edm::Event& iEvent, trigger::TriggerFi
                                     << "\nNo muons added to filterproduct." << endl;
     } else {
       for (std::list<int>::const_iterator itObj = listMuon.begin(); itObj != listMuon.end(); ++itObj) {
+        if (*itObj < 0 or unsigned(*itObj) >= muons->size(0)) {
+          edm::LogWarning("HLTL1TSeed")
+              << "Invalid index from the L1ObjectMap (L1uGT emulator), will be ignored (l1t::MuonBxCollection):"
+              << " index=" << *itObj << " (size of unpacked L1T objects in BX0 = " << muons->size(0) << ")";
+          continue;
+        }
+
         // Transform to index for Bx = 0 to begin of BxVector
         unsigned int index = muons->begin(0) - muons->begin() + *itObj;
missirol commented 1 month ago

Here's my rough understanding of the underlying issue. Some of this might be inaccurate; an L1T expert should comment.

The example from https://github.com/cms-sw/cmssw/issues/44940#issuecomment-2131391510 shows (patch)

Begin processing the 1st record. Run 381147, Event 351398133, LumiSection 202 on stream 0 at 25-May-2024 23:02:48.997 CEST

 bx=-2  pt=1 eta=2.28375 phi=-1.48367 hwPt=3 hwEtaAtVtx=204 hwPhiAtVtx=380 hwQual=7
 bx=-1  pt=6.5 eta=2.22937 phi=2.49793 hwPt=14 hwEtaAtVtx=204 hwPhiAtVtx=217 hwQual=12
 bx=-1  pt=4 eta=0.36975 phi=2.87971 hwPt=9 hwEtaAtVtx=32 hwPhiAtVtx=204 hwQual=12
 bx=0  pt=4 eta=0.815625 phi=-0.632841 hwPt=9 hwEtaAtVtx=74 hwPhiAtVtx=458 hwQual=12

The "object map" returns indices 1 and 0 (the first one refers to the 2nd muon in BX=-1, the second one refers to the only muon in BX=0). Then, HLTL1TSeed interprets index 1 as a 2nd muon in BX=0 (which does not exist), and that leads to the problem.

Based on the above, I don't see how HLTL1TSeed can add to the event the two muons that fired L1_CDC_SingleMu_3_er1p2_TOP120_DPHI2p618_3p142 (up to now, the second muon was probably another muon that happened to be in BX=0, or unphysical values from a wrong memory access (?)). The problem is that it looks like the Paths HLT_CDC_L2cosmic_10_er1p0_v and HLT_CDC_L2cosmic_5p5_er1p0_v use hltL1fL1sCDCL1Filtered0 to seed the HLT muon reconstruction, which seems conceptually wrong given that the correct L1T objects cannot be retrieved in this case with the current L1T software ("object map" + HLTL1TSeed).
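
The index mix-up described above can be reproduced with a small model. This is only a sketch: `BxCollection` and `select_bx0` are hypothetical stand-ins, not the CMSSW `BXVector` API, and the model assumes the collection holds exactly the four muons shown in the printout (nothing after BX=0). The check inside the loop mirrors the condition of the guard in the patch.

```python
# Simplified stand-in for a BX-indexed collection such as l1t::MuonBxCollection.
# All names here are hypothetical; this is not the CMSSW BXVector API.


class BxCollection:
    def __init__(self, first_bx, per_bx):
        # per_bx: one list of objects per bunch crossing, starting at first_bx
        self.first_bx = first_bx
        self.sizes = [len(objs) for objs in per_bx]
        self.flat = [obj for objs in per_bx for obj in objs]

    def size(self, bx):
        # Number of objects stored for this bunch crossing
        return self.sizes[bx - self.first_bx]

    def begin(self, bx):
        # Offset of the first object of `bx` in the flat storage
        return sum(self.sizes[: bx - self.first_bx])


def select_bx0(muons, object_map_indices):
    """Return (accepted flat indices, rejected object-map indices) for BX=0."""
    accepted, rejected = [], []
    for idx in object_map_indices:
        # Same condition as the guard in the patch: idx must be valid within BX=0
        if idx < 0 or idx >= muons.size(0):
            rejected.append(idx)
            continue
        accepted.append(muons.begin(0) + idx)
    return accepted, rejected


# Muon multiplicities taken from the event printout: BX=-2: 1, BX=-1: 2, BX=0: 1
muons = BxCollection(
    first_bx=-2,
    per_bx=[["mu(-2,0)"], ["mu(-1,0)", "mu(-1,1)"], ["mu(0,0)"]],
)

accepted, rejected = select_bx0(muons, [1, 0])
unguarded = muons.begin(0) + 1  # flat position that index 1 maps to without the check
print(accepted, rejected, unguarded, len(muons.flat))
```

In this model index 0 maps to flat position 3 (the only BX=0 muon) and index 1 is rejected. Without the check, index 1 would map to flat position 4, one past the four stored muons, which is at least consistent with the out-of-bounds read reported by ASan.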

VinInn commented 1 month ago

I think the patch proposed by @missirol should be implemented ASAP, at least to monitor the frequency of this misbehavior. I leave it to the Trigger coordinators to evaluate how urgently to press the L1T muon crew to solve the issue upstream.

VinInn commented 1 month ago

BTW: should a new more specific issue be opened against L1TSeed or the L1TMuon unpacker? (or the cosmic HLT?)

mmusich commented 1 month ago

BTW: should a new more specific issue be opened against L1TSeed or the L1TMuon unpacker? (or the cosmic HLT?)

this is tracked at https://its.cern.ch/jira/browse/CMSHLT-3216

missirol commented 1 month ago

The patch in https://github.com/cms-sw/cmssw/issues/44940#issuecomment-2131415449 is implemented in #45047 (14_1_X) and #45048 (14_0_X).

In the near future, maybe a better patch would ensure that HLTL1TSeed adds no objects to the Event (and emits a warning) if the L1T algo in question is using objects from different BXs (which is something that HLTL1TSeed cannot really handle).

fwyzard commented 1 month ago

In the near future, maybe a better patch would ensure that HLTL1TSeed adds no objects to the Event (and emits a warning) if the L1T algo in question is using objects from different BXs (which is something that HLTL1TSeed cannot really handle).

Can HLTL1TSeed add the in-time objects and skip the out-of-time ones ?

For the out-of-time muons, would it be useful to be able to tag them ? Or anyway the HLT reconstruction would not be able to use them ?

missirol commented 1 month ago

Can HLTL1TSeed add the in-time objects and skip the out-of-time ones ?

From what I understand, not in the current implementation, because HLTL1TSeed just uses what the "object map" provides, and it seems the "object map" contains indices, but no info on the BX of the objects behind those indices. It seems to me that deeper changes would be needed to correctly identify the in-time ones (e.g. an improvement of the "object map" format). Alternatively, I was wondering if it would make sense to restrict the objects used by the "object map" to BX=0 (with hltGtStage2ObjectMap.L1DataBxInEvent = 1, iiuc): in that case the indices in the "object map" would just be the in-time ones (but unpacked and emulated L1T decisions would disagree for any L1T algo using objects from BXs different from 0).

For the out-of-time muons, would it be useful to be able to tag them ? Or anyway the HLT reconstruction would not be able to use them ?

This, I don't really know (I would guess the HLT reconstruction would not be able to use them, but I might be wrong).
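
For reference, the BX=0 restriction mentioned above could be tried on top of the recipe's `hlt.py` with a customisation along these lines. This is an untested sketch: the module and parameter names (`hltGtStage2ObjectMap`, `L1DataBxInEvent`) are the ones quoted in this comment, the function name is hypothetical, and whether the value 1 really limits the object map to BX=0 is the "iiuc" assumption stated above.

```python
def restrict_object_map_to_bx0(process):
    """Hypothetical customisation: let the emulated uGT object map use BX=0 only.

    Assumes `process` has a module named hltGtStage2ObjectMap with an integer
    parameter L1DataBxInEvent, as quoted in the comment above.
    """
    if hasattr(process, "hltGtStage2ObjectMap"):
        # Assumption from the thread: 1 means only the central BX is considered
        process.hltGtStage2ObjectMap.L1DataBxInEvent = 1
    return process
```

It would be applied by appending `process = restrict_object_map_to_bx0(process)` to `hlt.py`. As noted above, unpacked and emulated L1T decisions would then be expected to disagree for any algo that uses objects from BXs other than 0.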

Martin-Grunewald commented 1 month ago

From what I understand, not in the current implementation, because HLTL1TSeed just uses what the "object map" provides, and it seems the "object map" contains indices, but no info on the BX of the objects behind those indices. It seems to me that deeper changes would be needed to correctly identify the in-time ones (e.g. an improvement of the "object map" format). Alternatively, I was wondering if it would make sense to restrict the objects used by the "object map" to BX=0 (with hltGtStage2ObjectMap.L1DataBxInEvent = 1, iiuc): in that case the indices in the "object map" would just be the in-time ones (but unpacked and emulated L1T decisions would disagree for any L1T algo using objects from BXs different from 0).

I agree, this would be a consistency fix, and cover most use cases. Triggers looking at BX<>0 should be rare special cases which should need specific treatment anyway.

mmusich commented 1 month ago

Alternatively, I was wondering if it would make sense to restrict the objects used by the "object map" to BX=0 (with hltGtStage2ObjectMap.L1DataBxInEvent = 1, iiuc): in that case the indices in the "object map" would just be the in-time ones (but unpacked and emulated L1T decisions would disagree for any L1T algo using objects from BXs different from 0).

this is tracked at https://its.cern.ch/jira/browse/CMSHLT-3218

fwyzard commented 1 month ago

Can we stick to GitHub for issues, instead of splitting them between GitHub and JIRA (which unlike GH has a horrible user interface) ?

mmusich commented 1 month ago

Can we stick to GitHub for issues, instead of splitting them between GitHub and JIRA (which unlike GH has a horrible user interface) ?

My understanding is that we are using GitHub for discussing s/w issues and JIRA for HLT configuration changes. So, in short: no.

fwyzard commented 1 month ago

OK, then. Feel free to enjoy the crappy user interface and the lack of feedback.

mmusich commented 1 month ago

Feel free to enjoy the crappy user interface and the lack of feedback.

To be honest I am not enjoying it at all, but whether to move everything from JIRA to GitHub (at least for the HLT-related items that directly rely on cmssw, e.g. menus, tests, etc.; broadly speaking the "HLT configurations" and "STORM tasks" components) is a decision that should be taken at coordination level (which is above my pay grade). It could be discussed elsewhere.