cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0

[UBSAN] SiPixelRawToCluster: null ptr passed #47137

Open smuzaffar opened 2 weeks ago

smuzaffar commented 2 weeks ago

We have a few runtime errors like [a] in the UBSAN IBs. The error is triggered at https://github.com/cms-sw/cmssw/blob/master/RecoLocalTracker/SiPixelClusterizer/plugins/alpaka/SiPixelRawToClusterKernel.h#L130 where I think const uint32_t* src is a nullptr. https://github.com/cms-sw/cmssw/blob/master/RecoLocalTracker/SiPixelClusterizer/plugins/alpaka/SiPixelRawToCluster.cc#L182-L243 shows that for a few items https://github.com/cms-sw/cmssw/blob/master/RecoLocalTracker/SiPixelClusterizer/plugins/alpaka/SiPixelRawToCluster.cc#L230 is not called (see the continue statements in the loop).

@AdrianoDee, this section of code was added last year via https://github.com/cms-sw/cmssw/pull/41285. Can you please look into it and provide a fix? We might also need a fix for https://github.com/cms-sw/cmssw/blob/master/RecoLocalTracker/SiPixelClusterizer/plugins/SiPixelRawToClusterCUDA.cc
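
For illustration only, here is a minimal sketch of the kind of guard that usually silences this class of UBSAN report. It assumes the flagged call is a std::memcpy-style copy, which the message "null pointer passed as argument 2, which is declared to never be null" suggests but which is not confirmed here. The helper copyWordsGuarded and its parameters are hypothetical and are not the CMSSW code. glibc declares the pointer arguments of memcpy as non-null, so passing a null source is flagged even when the length is zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not the CMSSW implementation): copy `count` 32-bit
// words from `src` into `dst`, starting at word index `offset`.
// Returns the number of words actually copied.
inline std::size_t copyWordsGuarded(std::vector<uint32_t>& dst,
                                    std::size_t offset,
                                    const uint32_t* src,
                                    std::size_t count) {
  // Skip the copy when there is nothing to copy. Calling memcpy with a null
  // source is undefined behaviour even for a zero length, and that is what
  // the sanitizer's nonnull-attribute check reports.
  if (src == nullptr || count == 0) {
    return 0;
  }
  // The caller is expected to have sized `dst` for the incoming words.
  assert(offset + count <= dst.size());
  std::memcpy(dst.data() + offset, src, count * sizeof(uint32_t));
  return count;
}
```

With this shape, a call such as copyWordsGuarded(buffer, 0, nullptr, 0) returns 0 without touching memcpy. Whether the real fix belongs in the kernel header or in the caller (for example, by not calling the initializer when no FED data was collected) is a design choice for the module authors.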

[a] https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/raw/el8_amd64_gcc12/CMSSW_15_0_UBSAN_X_2025-01-17-2300/pyRelValMatrixLogs/run/161.1_QCD_Pt_80_120_5362_HI_2024/step2_QCD_Pt_80_120_5362_HI_2024.log

src/RecoLocalTracker/SiPixelClusterizer/plugins/alpaka/SiPixelRawToClusterKernel.h:130:20: runtime error: null pointer passed as argument 2, which is declared to never be null
    #0 0x14f1b02e0b2d in alpaka_serial_sync::pixelDetails::WordFedAppender::initializeWordFed(int, unsigned int, unsigned int const*, unsigned int) src/RecoLocalTracker/SiPixelClusterizer/plugins/alpaka/SiPixelRawToClusterKernel.h:130
    #1 0x14f1b02e0b2d in alpaka_serial_sync::SiPixelRawToCluster<pixelTopology::HIonPhase1>::acquire(alpaka_serial_sync::device::Event const&, alpaka_serial_sync::device::EventSetup const&) src/RecoLocalTracker/SiPixelClusterizer/plugins/alpaka/SiPixelRawToCluster.cc:242
    #2 0x14f1b021e710 in alpaka_serial_sync::stream::SynchronizingEDProducer<>::acquire(edm::Event const&, edm::EventSetup const&, edm::WaitingTaskWithArenaHolder) src/HeterogeneousCore/AlpakaCore/interface/alpaka/stream/SynchronizingEDProducer.h:34
    #3 0x14f2bdba8553 in edm::stream::doAcquireIfNeeded(edm::stream::impl::ExternalWork*, edm::Event const&, edm::EventSetup const&, edm::WaitingTaskHolder&&) src/FWCore/Framework/src/stream/ProducingModuleHelper.cc:15
    #4 0x14f2bdb911ef in edm::stream::EDProducerAdaptorBase::doAcquire(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*, edm::WaitingTaskHolder&&) src/FWCore/Framework/src/stream/EDProducerAdaptorBase.cc:102
    #5 0x14f2bd8db2be in operator() src/FWCore/Framework/src/Worker.cc:401
    #6 0x14f2bd8db2be in wrap<edm::Worker::runAcquire(const edm::EventTransitionInfo&, const edm::ParentContext&, edm::WaitingTaskHolder)::<lambda()> > src/FWCore/Utilities/interface/ConvertException.h:21
    #7 0x14f2bd8db2be in edm::Worker::runAcquire(edm::EventTransitionInfo const&, edm::ParentContext const&, edm::WaitingTaskHolder) src/FWCore/Framework/src/Worker.cc:401
    #8 0x14f2bd8dbf3d in edm::Worker::runAcquireAfterAsyncPrefetch(std::__exception_ptr::exception_ptr, edm::EventTransitionInfo const&, edm::ParentContext const&, edm::WaitingTaskHolder) src/FWCore/Framework/src/Worker.cc:427
    #9 0x14f2bcfd98eb in edm::Worker::AcquireTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>, void>::execute() src/FWCore/Framework/interface/maker/Worker.h:576
    #10 0x14f2ba5bcfe9 in operator() src/FWCore/Concurrency/src/WaitingTaskList.cc:206
    #11 0x14f2ba5bcfe9 in task_ptr_or_nullptr_impl<const edm::WaitingTaskList::announce()::<lambda()>&> /data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc12/external/tbb/v2021.9.0-573155027234b8f945d29403a2749d52/include/oneapi/tbb/task_group.h:115
    #12 0x14f2ba5bcfe9 in task_ptr_or_nullptr<const edm::WaitingTaskList::announce()::<lambda()>&> /data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc12/external/tbb/v2021.9.0-573155027234b8f945d29403a2749d52/include/oneapi/tbb/task_group.h:125
    #13 0x14f2ba5bcfe9 in execute /data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc12/external/tbb/v2021.9.0-573155027234b8f945d29403a2749d52/include/oneapi/tbb/task_group.h:452
    #14 0x14f2ba530b3a in tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter>(tbb::detail::d1::task*, tbb::detail::r1::outermost_worker_waiter&) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-573155027234b8f945d29403a2749d52/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
    #15 0x14f2ba530b3a in tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter>(tbb::detail::d1::task*, tbb::detail::r1::outermost_worker_waiter&) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-573155027234b8f945d29403a2749d52/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
    #16 0x14f2ba530b3a in tbb::detail::r1::arena::process(tbb::detail::r1::thread_data&) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-573155027234b8f945d29403a2749d52/tbb-v2021.9.0/src/tbb/arena.cpp:137
    #17 0x14f2ba530b3a in tbb::detail::r1::market::process(rml::job&) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-573155027234b8f945d29403a2749d52/tbb-v2021.9.0/src/tbb/market.cpp:599
    #18 0x14f2ba532ced in tbb::detail::r1::rml::private_worker::run() /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-573155027234b8f945d29403a2749d52/tbb-v2021.9.0/src/tbb/private_server.cpp:271
    #19 0x14f2ba532ced in tbb::detail::r1::rml::private_worker::thread_routine(void*) /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-573155027234b8f945d29403a2749d52/tbb-v2021.9.0/src/tbb/private_server.cpp:221
    #20 0x14f2b35e81c9 in start_thread (/lib64/libpthread.so.0+0x81c9)
    #21 0x14f2b5e638d2 in __GI___clone (/lib64/libc.so.6+0x398d2)

cmsbuild commented 2 weeks ago

cms-bot internal usage

cmsbuild commented 2 weeks ago

A new Issue was created by @smuzaffar.

@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

smuzaffar commented 2 weeks ago

assign reconstruction

cmsbuild commented 2 weeks ago

New categories assigned: reconstruction

@jfernan2,@mandrenguyen you have been requested to review this Pull request/Issue and eventually sign? Thanks

jfernan2 commented 2 weeks ago

FYI @tsusa as Pixel reco contact