cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0
1.09k stars 4.33k forks source link

Fix `maxHitsInModule` for `SiPixelRecHitFromSoAAlpaka` #46686

Closed AdrianoDee closed 1 week ago

AdrianoDee commented 1 week ago

PR description:

Small fix for SiPixelRecHitFromSoAAlpaka that uses the obsolete pixelClustering::maxHitsInModule() to limit the number of hits in a module. It should be TrackerTraits::maxHitsInModule. The backport to 14_1_X will fix https://github.com/cms-sw/cmssw/issues/46656.

PR validation:

*.402 wfs for HI conditions

AdrianoDee commented 1 week ago

type bug-fix

cmsbuild commented 1 week ago

cms-bot internal usage

cmsbuild commented 1 week ago

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46686/42626

cmsbuild commented 1 week ago

A new Pull Request was created by @AdrianoDee for master.

It involves the following packages:

@cmsbuild, @jfernan2, @mandrenguyen can you please review it and eventually sign? Thanks. @GiacomoSguazzoni, @VinInn, @VourMa, @dkotlins, @felicepantaleo, @ferencek, @gpetruc, @missirol, @mmusich, @mroguljic, @mtosi, @rovere, @threus, @tsusa, @tvami this is something you requested to watch as well. @antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

mmusich commented 1 week ago

enable gpu

AdrianoDee commented 1 week ago

test parameters:

AdrianoDee commented 1 week ago

please test

mmusich commented 1 week ago

urgent

AdrianoDee commented 1 week ago

please abort

AdrianoDee commented 1 week ago

test parameters:

AdrianoDee commented 1 week ago

please test

cmsbuild commented 1 week ago

-1

Failed Tests: RelVals-GPU Size: This PR adds an extra 16KB to repository Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-46ae81/42791/summary.html COMMIT: 2c3f90d5beee66b2ba9f6efb796b89680d85b32b CMSSW: CMSSW_14_2_X_2024-11-12-2300/el8_amd64_gcc12 Additional Tests: GPU User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/46686/42791/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals-GPU

Comparison Summary

Summary:

AdrianoDee commented 1 week ago

test parameters:

AdrianoDee commented 1 week ago

Ok the failure

src/RecoVertex/PixelVertexFinding/plugins/alpaka/clusterTracksByDensity.h:205: void alpaka_cuda_async::vertexFinder::clusterTracksByDensity(const alpaka::AccGpuUniformCudaHipRt<alpaka::ApiCudaRt, std::integral_constant<unsigned long, 1UL>, unsigned int> &, reco::ZVertexLayout<128UL, false>::ViewTemplateFreeParams<128UL, false, true, true> &, reco::ZVertexTracksLayout<128UL, false>::ViewTemplateFreeParams<128UL, false, true, true> &, vertexFinder::PixelVertexWSSoALayout<128UL, false>::ViewTemplateFreeParams<128UL, false, true, true> &, int, float, float, float): block: [0,0,0], thread: [768,0,0] Assertion `static_cast<int>(foundClusters) < data.metadata().size()` failed.

needs to be investigated (I think it's related to https://github.com/cms-sw/cmssw/pull/45887) but it is for sure unrelated to this PR.

jfernan2 commented 1 week ago

+1 Signing due to the urgency, @AdrianoDee please create another PR for the new failure at your earliest convenience

cmsbuild commented 1 week ago

This pull request is fully signed and it will be integrated in one of the next master IBs (but tests are reportedly failing). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @mandrenguyen, @rappoccio, @antoniovilela (and backports should be raised in the release meeting by the corresponding L2)

mmusich commented 1 week ago

needs to be investigated (I think it's related to https://github.com/cms-sw/cmssw/pull/45887) but it is for sure unrelated to this PR.

indeed, I tested in a plain CMSSW_14_2_X_2024-11-12-2300 (without anything checked out on top) with:

runTheMatrix.py --what gpu,upgrade -l 14949.402 -t 4

and I still see

$ more 14949.402_HydjetQMinBias_5020GeV+2022HI_Patatrack_PixelOnlyAlpaka/step3_HydjetQMinBias_5020GeV+2022HI_Patatrack_PixelOnlyAlpaka.log 
RAW2DIGI:RawToDigi_pixelOnly,RECO:reconstruction_pixelTrackingOnly,VALIDATION:@pixelTrackingOnlyValidation,DQM:@pixelTrackingOnlyDQM
We have determined that this is simulation (if not, rerun cmsDriver.py with --data)
with DB:
entry file:step2.root
Step: RAW2DIGI Spec: ['RawToDigi_pixelOnly']
Step: RECO Spec: ['reconstruction_pixelTrackingOnly']
Step: VALIDATION Spec: ['@pixelTrackingOnlyValidation']
@pixelTrackingOnlyValidation in preparing validation
Step: DQM Spec: ['@pixelTrackingOnlyDQM']
customising the process with customiseAlpakaServiceMemoryFilling from HeterogeneousCore/AlpakaServices/customiseAlpakaServiceMemoryFilling
customising the process with setCrossingFrameOn from SimGeneral/MixingModule/fullMixCustomize_cff
Starting  cmsRun  step3_RAW2DIGI_RECO_VALIDATION_DQM.py
%MSG-i ThreadStreamSetup:  (NoModuleName) 13-Nov-2024 14:45:57 CET pre-events
setting # threads 4
setting # streams 4
%MSG
%MSG-i AlpakaService:  (NoModuleName) 13-Nov-2024 14:45:59 CET pre-events
AlpakaServiceSerialSync succesfully initialised.
Found 1 device:
  - Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
%MSG
%MSG-i CUDAService:  (NoModuleName) 13-Nov-2024 14:45:59 CET pre-events
CUDA runtime version 12.4, driver version 12.4, NVIDIA driver version 550.127.05
CUDA device 0: Tesla T4 (sm_75)
%MSG
%MSG-i AlpakaService:  (NoModuleName) 13-Nov-2024 14:45:59 CET pre-events
AlpakaServiceCudaAsync succesfully initialised.
Found 1 device:
  - Tesla T4
%MSG
13-Nov-2024 14:46:06 CET  Initiating request to open file file:step2.root
13-Nov-2024 14:46:11 CET  Successfully opened file file:step2.root
%MSG-w NonConsumedConditionalModules:  AfterModConstruction  13-Nov-2024 14:46:14 CET pre-events
The following modules were part of some ConditionalTask, but were not
consumed by any other module in any of the Paths to which the ConditionalTask
was associated. Perhaps they should be either removed from the
job, or moved to a Task to make it explicit they are unscheduled.

 siPixelDigis@cpu
 siPixelRecHitsPreSplitting@cpu
%MSG
Begin processing the 1st record. Run 1, Event 1, LumiSection 1 on stream 2 at 13-Nov-2024 14:46:16.182 CET
Begin processing the 2nd record. Run 1, Event 6, LumiSection 1 on stream 1 at 13-Nov-2024 14:46:16.211 CET
Begin processing the 3rd record. Run 1, Event 7, LumiSection 1 on stream 3 at 13-Nov-2024 14:46:16.229 CET
Begin processing the 4th record. Run 1, Event 5, LumiSection 1 on stream 0 at 13-Nov-2024 14:46:16.229 CET
Begin processing the 5th record. Run 1, Event 9, LumiSection 1 on stream 2 at 13-Nov-2024 14:46:18.913 CET
Begin processing the 6th record. Run 1, Event 10, LumiSection 1 on stream 1 at 13-Nov-2024 14:46:18.955 CET
Begin processing the 7th record. Run 1, Event 2, LumiSection 1 on stream 3 at 13-Nov-2024 14:46:18.995 CET
Begin processing the 8th record. Run 1, Event 8, LumiSection 1 on stream 2 at 13-Nov-2024 14:46:19.258 CET
Begin processing the 9th record. Run 1, Event 4, LumiSection 1 on stream 0 at 13-Nov-2024 14:46:19.258 CET
Begin processing the 10th record. Run 1, Event 3, LumiSection 1 on stream 1 at 13-Nov-2024 14:46:20.056 CET
src/RecoVertex/PixelVertexFinding/plugins/alpaka/clusterTracksByDensity.h:205: void alpaka_cuda_async::vertexFinder::clusterTracksByDensity(const alpaka::AccGpuUniformCudaHipRt<alpaka::ApiCudaRt, std::in
tegral_constant<unsigned long, 1UL>, unsigned int> &, reco::ZVertexLayout<128UL, false>::ViewTemplateFreeParams<128UL, false, true, true> &, reco::ZVertexTracksLayout<128UL, false>::ViewTemplateFreeParam
s<128UL, false, true, true> &, vertexFinder::PixelVertexWSSoALayout<128UL, false>::ViewTemplateFreeParams<128UL, false, true, true> &, int, float, float, float): block: [0,0,0], thread: [640,0,0] Asserti
on `static_cast<int>(foundClusters) < data.metadata().size()` failed.
makortel commented 1 week ago

Should 14949.402 be added to the set of workflows being tested in IBs?

mmusich commented 1 week ago

Should 14949.402 be added to the set of workflows being tested in IBs?

sounds like it should.

mandrenguyen commented 1 week ago

ignore tests-rejected with ib-failure See discussion above. Failures are unrelated to this PR.

mandrenguyen commented 1 week ago

+1

mmusich commented 1 week ago

sounds like it should.

I opened https://github.com/cms-sw/cmssw/issues/46693 to keep track.