cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0
1.08k stars 4.29k forks source link

RECO timing increase in CMSSW_13_1_0_pre3 for `11834.21` #41441

Open clacaputo opened 1 year ago

clacaputo commented 1 year ago

Profiling 11834.21 in CMSSW_13_1_0_pre3, we measured an overall increase of ~1 secs in the reconstruction timing with respect to CMSSW_13_1_0_pre2. The measure has been performed on vocms011 averaging on 5 different measurements:

Investigating the modules' time differences doesn't provide any insight, but it seems a general time increase

The same excluding the first 1 events
  delta/mean delta/orJob     original                   new       module name
  ---------- ------------     --------                  ----       ------------
   +1.985441      +0.01%         0.00 ms/ev ->         0.96 ms/ev ctppsPixelClusters
   +1.915825      +0.00%         0.00 ms/ev ->         0.20 ms/ev ctppsPixelRecHits
   +1.820124      +0.00%         0.01 ms/ev ->         0.12 ms/ev ctppsPixelDigis
   +1.280103      +0.00%         0.01 ms/ev ->         0.02 ms/ev ctppsPixelLocalTracks
   +0.362437      +0.01%         1.97 ms/ev ->         2.85 ms/ev muonsFromCosmics1Leg
   -0.311169      -0.00%         0.68 ms/ev ->         0.50 ms/ev uncleanedOnlyElectronSeeds
   +0.294822      +0.00%         0.71 ms/ev ->         0.96 ms/ev cscSegments
   -0.292857      -0.00%         1.54 ms/ev ->         1.15 ms/ev uncleanedOnlyConversionTrackCandidates
   +0.286512      +0.00%         0.89 ms/ev ->         1.19 ms/ev muonSeededTracksInOut
   +0.285375      +0.08%        29.85 ms/ev ->        39.79 ms/ev cosmicMuons1Leg
   +0.278939      +0.01%         2.62 ms/ev ->         3.47 ms/ev muonSeededTrackCandidatesInOut
   +0.251150      +0.00%         1.04 ms/ev ->         1.34 ms/ev cosmicsVetoTrackCandidates
   +0.246327      +0.00%         1.93 ms/ev ->         2.47 ms/ev cosmicsVeto
   +0.239735      +0.02%        10.24 ms/ev ->        13.03 ms/ev CSCHaloData
   +0.221523      +0.02%         8.91 ms/ev ->        11.13 ms/ev tevMuons
   +0.219048      +0.00%         2.35 ms/ev ->         2.93 ms/ev dt4DCosmicSegments
   +0.218917      +0.00%         0.71 ms/ev ->         0.88 ms/ev muonSeededSeedsInOut
   +0.211306      +0.04%        22.55 ms/ev ->        27.88 ms/ev displacedMuons1stStep
   +0.209846      +0.01%         3.82 ms/ev ->         4.72 ms/ev muonMETValueMapProducer
   +0.207919      +0.01%         2.71 ms/ev ->         3.34 ms/ev dt4DSegments
   +0.203688      +0.00%         0.48 ms/ev ->         0.59 ms/ev vertexMerger
   +0.191276      +0.92%       517.27 ms/ev ->       626.67 ms/ev tobTecStepTrackCandidates
   +0.190319      +0.04%        24.15 ms/ev ->        29.23 ms/ev cosmicMuons
   +0.179414      +0.01%         8.56 ms/ev ->        10.25 ms/ev refittedStandAloneMuons
   +0.174131      +1.94%      1206.68 ms/ev ->      1436.84 ms/ev pixelLessStepTrackCandidates
   +0.172461      +0.01%         5.86 ms/ev ->         6.97 ms/ev generalConversionStepConversionTrackMerger
   +0.172011      +0.01%         6.34 ms/ev ->         7.53 ms/ev globalMuons
   +0.171655      +0.01%         7.39 ms/ev ->         8.78 ms/ev uncleanedOnlyAllConversions
   +0.165928      +0.04%        29.42 ms/ev ->        34.74 ms/ev generalV0Candidates
   +0.163325      +0.01%         7.18 ms/ev ->         8.46 ms/ev standAloneMuons
   +0.157431      +0.03%        17.43 ms/ev ->        20.41 ms/ev particleFlowTmp
   +0.156068      +0.36%       254.61 ms/ev ->       297.71 ms/ev particleFlowDisplacedVertex
   +0.153515      +0.07%        46.55 ms/ev ->        54.29 ms/ev allConversions
   +0.141469      +0.21%       162.93 ms/ev ->       187.73 ms/ev photonConvTrajSeedFromSingleLeg
   +0.136083      +0.10%        84.41 ms/ev ->        96.73 ms/ev tobTecStepSeedsPair
   +0.132370      +0.02%        15.61 ms/ev ->        17.82 ms/ev displacedStandAloneMuons
   +0.130908      +0.15%       129.24 ms/ev ->       147.34 ms/ev particleFlowDisplacedVertexCandidate
   +0.130806      +0.03%        25.80 ms/ev ->        29.41 ms/ev mixedTripletStepTrackCandidates
   +0.130525      +0.01%         9.67 ms/ev ->        11.02 ms/ev electronCkfTrackCandidates
   +0.128512      +0.08%        70.96 ms/ev ->        80.71 ms/ev pixelPairStepTrackCandidates
   +0.127348      +0.34%       299.86 ms/ev ->       340.64 ms/ev AODSIMoutput
   +0.126673      +0.03%        24.88 ms/ev ->        28.24 ms/ev multi5x5SuperClustersWithPreshower
   +0.124052      +0.01%         7.31 ms/ev ->         8.28 ms/ev mixedTripletStepSeedsB
   +0.122517      +0.19%       174.67 ms/ev ->       197.47 ms/ev detachedTripletStepTracks
   +0.120742      +0.28%       260.84 ms/ev ->       294.36 ms/ev detachedTripletStepTrackCandidatesMkFit
   +0.120080      +0.08%        73.21 ms/ev ->        82.56 ms/ev tobTecStepSeedsTripl
   +0.119950      +0.03%        27.17 ms/ev ->        30.64 ms/ev trackVertexArbitrator
   +0.118082      +0.09%        86.73 ms/ev ->        97.62 ms/ev detachedTripletStepSeeds
   +0.117170      +0.02%        19.93 ms/ev ->        22.41 ms/ev inclusiveVertexFinder
   +0.113532      +0.12%       119.65 ms/ev ->       134.06 ms/ev tobTecStepHitTripletsTripl
   +0.112618      +0.02%        23.31 ms/ev ->        26.09 ms/ev JetPlusTrackZSPCorJetAntiKt4
   +0.109289      +0.24%       251.57 ms/ev ->       280.65 ms/ev pixelLessStepSeeds
   +0.107693      +0.01%         6.71 ms/ev ->         7.48 ms/ev pixelPairStepSeedsA
   +0.106936      +0.02%        26.03 ms/ev ->        28.97 ms/ev inclusiveCandidateVertexFinder
   +0.106765      +0.02%        25.99 ms/ev ->        28.92 ms/ev inclusiveCandidateVertexFinderCvsL
   +0.106404      +0.08%        89.80 ms/ev ->        99.90 ms/ev duplicateTrackCandidates
   +0.105956      +0.04%        47.35 ms/ev ->        52.64 ms/ev pixelPairElectronSeeds
   +0.105681      +0.01%         6.93 ms/ev ->         7.70 ms/ev pfTrackElec
   +0.103625      +0.00%         5.31 ms/ev ->         5.89 ms/ev detachedQuadStepSeeds
   +0.101464      +0.08%        91.56 ms/ev ->       101.34 ms/ev earlyGeneralTracks
   +0.099801      +0.01%        16.65 ms/ev ->        18.40 ms/ev pixelPairStepTracks
   +0.098461      +0.47%       538.65 ms/ev ->       594.44 ms/ev pixelLessStepHitTriplets
   +0.098033      +0.01%         7.81 ms/ev ->         8.62 ms/ev lowPtGsfEleCkfTrackCandidates
   +0.097290      +0.11%       124.99 ms/ev ->       137.77 ms/ev detachedTripletStepHitTriplets
   +0.095362      +0.47%       559.21 ms/ev ->       615.21 ms/ev lowPtQuadStepTrackCandidates
   +0.092917      +0.01%        16.44 ms/ev ->        18.04 ms/ev lowPtQuadStepSeeds
   +0.092766      +0.13%       154.79 ms/ev ->       169.85 ms/ev detachedQuadStepHitQuadruplets
   +0.090239      +0.01%        18.48 ms/ev ->        20.23 ms/ev hpsPFTauBasicDiscriminators
   +0.089340      +0.13%       161.23 ms/ev ->       176.31 ms/ev convTrackCandidates
   +0.089327      +0.02%        30.42 ms/ev ->        33.26 ms/ev candidateVertexArbitrator
   +0.089316      +0.01%        18.37 ms/ev ->        20.08 ms/ev hpsPFTauBasicDiscriminatorsdR03
   +0.087842      +0.01%         7.58 ms/ev ->         8.27 ms/ev lowPtGsfElePfGsfTracks
   +0.087838      +0.01%        16.48 ms/ev ->        17.99 ms/ev detachedTripletStep
   +0.087651      +0.00%         4.85 ms/ev ->         5.30 ms/ev tobTecStepHitDoubletsPair
   +0.087399      +0.03%        38.94 ms/ev ->        42.50 ms/ev candidateVertexArbitratorCvsL
   +0.086255      +0.03%        40.44 ms/ev ->        44.09 ms/ev tobTecStepTracks
   +0.085945      +0.08%       104.05 ms/ev ->       113.39 ms/ev particleFlowBlock
   +0.085715      +0.02%        29.01 ms/ev ->        31.61 ms/ev combinatoricRecoTaus
   +0.085231      +0.01%         9.92 ms/ev ->        10.81 ms/ev highPtTripletStepSeeds
   +0.083370      +0.31%       420.18 ms/ev ->       456.73 ms/ev lowPtTripletStepTrackCandidates
   +0.082580      +0.01%         9.13 ms/ev ->         9.91 ms/ev detachedQuadStepTrackCandidatesMkFit
   +0.080954      +0.03%        47.83 ms/ev ->        51.86 ms/ev ak4JetTracksAssociatorAtVertexJPT
   +0.078335      +0.01%        17.67 ms/ev ->        19.11 ms/ev mixedTripletStepHitTripletsB
   +0.077665      +0.01%        21.56 ms/ev ->        23.30 ms/ev pixelLessStepHitDoublets
   +0.077360      +0.01%        12.56 ms/ev ->        13.57 ms/ev initialStepSeedsPreSplitting
   +0.076899      +0.01%        13.45 ms/ev ->        14.53 ms/ev lowPtTripletStepSeeds
   +0.076146      +0.01%         9.91 ms/ev ->        10.70 ms/ev offlinePrimaryVerticesWithBS
   +0.076097      +0.01%        10.55 ms/ev ->        11.39 ms/ev ak4PFJetsRecoTauChargedHadrons
   +0.074927      +0.00%         6.98 ms/ev ->         7.52 ms/ev photonIDValueMaps
   +0.074451      +0.01%        21.96 ms/ev ->        23.66 ms/ev detachedQuadStepTracks
   +0.074306      +0.01%        11.52 ms/ev ->        12.41 ms/ev pfPileUpJME
   +0.073873      +0.01%         9.93 ms/ev ->        10.69 ms/ev offlinePrimaryVertices
   +0.073190      +0.01%        11.69 ms/ev ->        12.58 ms/ev pfPileUpIso
   +0.072777      +0.01%        13.06 ms/ev ->        14.05 ms/ev convStepTracks
   +0.072221      +0.01%        13.59 ms/ev ->        14.60 ms/ev detachedQuadStepHitDoublets
   +0.072016      +0.00%         5.84 ms/ev ->         6.28 ms/ev tobTecStep
   +0.071378      +0.04%        65.52 ms/ev ->        70.37 ms/ev highPtTripletStepTrackCandidatesMkFit
   +0.070545      +0.03%        44.68 ms/ev ->        47.94 ms/ev electronGsfTracks
   +0.069919      +0.01%        12.18 ms/ev ->        13.06 ms/ev mergedDuplicateTracks
   +0.069757      +0.01%        21.15 ms/ev ->        22.68 ms/ev mixedTripletStepTracks
   +0.067833      +0.01%         9.69 ms/ev ->        10.37 ms/ev detachedTripletStepHitDoublets
   +0.065324      +0.01%        13.31 ms/ev ->        14.21 ms/ev initialStepSeeds
   +0.064083      +0.02%        31.15 ms/ev ->        33.21 ms/ev gedPhotonsTmp
   +0.063348      +0.02%        45.06 ms/ev ->        48.01 ms/ev trackerDrivenElectronSeeds
   +0.063170      +0.01%        10.56 ms/ev ->        11.25 ms/ev mixedTripletStepHitTripletsA
   +0.062975      +0.00%         5.88 ms/ev ->         6.27 ms/ev dedxHarmonic2
   +0.062583      +0.02%        45.40 ms/ev ->        48.33 ms/ev stripPairElectronSeeds
   +0.060781      +0.01%        28.25 ms/ev ->        30.02 ms/ev ecalDrivenElectronSeeds
   +0.060088      +0.01%         9.65 ms/ev ->        10.24 ms/ev pfTrack
   +0.058864      +0.00%         5.80 ms/ev ->         6.16 ms/ev preDuplicateMergingGeneralTracks
   +0.057466      +0.05%       108.59 ms/ev ->       115.02 ms/ev pixelLessStepTracks
   +0.057319      +0.01%        29.54 ms/ev ->        31.29 ms/ev pfImpactParameterTagInfos
   +0.056630      +0.03%        64.37 ms/ev ->        68.12 ms/ev highPtTripletStepTracks
   +0.056441      +0.02%        45.38 ms/ev ->        48.01 ms/ev lowPtGsfElectronsPreRegression
   +0.055279      +0.05%       114.60 ms/ev ->       121.11 ms/ev jetCoreRegionalStepTrackCandidates
   +0.054866      +0.00%         9.39 ms/ev ->         9.92 ms/ev lowPtGsfElePfTracks
   +0.054720      +0.06%       129.45 ms/ev ->       136.74 ms/ev lowPtQuadStepTracks
   +0.052853      +0.18%       392.10 ms/ev ->       413.38 ms/ev lowPtGsfElectronSeeds
   +0.052746      +0.00%         9.83 ms/ev ->        10.37 ms/ev uncleanedOnlyPfTrack
   +0.051821      +0.08%       172.94 ms/ev ->       182.13 ms/ev initialStepTracksPreSplitting
   +0.051430      +0.08%       174.13 ms/ev ->       183.32 ms/ev initialStepTracks
   +0.051286      +0.01%        22.76 ms/ev ->        23.95 ms/ev gedGsfElectronsTmp
   -0.050247      -0.00%         6.49 ms/ev ->         6.17 ms/ev ak8PFJetsPuppiSoftDrop
  ---------- ------------     --------                  ----       ------------
Job total:  11.8925 s/ev ==> 13.0141 s/ev

Investigating igprof output shows a 1% increase in cms::CkfTrackCandidateMakerBase::produceBase as you can see here.

NOTA: For 136.889 (T0-like condition on Data), the time increase is not so high 8.09688 s/ev ==> 8.34514 s/ev link and it is clearly related to pfParticleNetFromMiniAODAK4CHS{Central,Forward}*

clacaputo commented 1 year ago

assign reconstruction

cmsbuild commented 1 year ago

New categories assigned: reconstruction

@mandrenguyen,@clacaputo you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild commented 1 year ago

A new Issue was created by @clacaputo Claudio Caputo.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

clacaputo commented 1 year ago

From a first investigation on the reconstruction-related PRs integrated into pre3, there is no clear responsible

slava77 commented 1 year ago

From a first investigation on the reconstruction-related PRs integrated into pre3, there is no clear responsible

the summary for the data wf seems to clearly point to pfParticleNetFromMiniAODAK4CHS{Central,Forward}* It may be clarified in the issue description; now it sounds like the origins are similar.

slava77 commented 1 year ago

apparently your workflow redoes gensim from scratch. I see some difference in the number of sim tracks but at least the mean is within 0.2% (red is pre2; plot is from step1.root on eos)

it looks like minbias has BeamSpotObjects_Realistic25ns_13p6TeVCollisions_EOY2022_v1_mc while the pre3 GT has BeamSpotObjects_Realistic25ns_13p6TeVCollisions_EOY2022_v2_mc. The changes are relatively minor (v1 has a wider y), ~but eventually something to get in sync.~ (this is intentional/unavoidable and is OK, apparently) This change happened from pre2->pre3

https://github.com/cms-sw/cmssw/releases/CMSSW_13_1_0_pre3 has changes in geant

Can you run another variant of pre3 using step1 inputs from pre2 as well as using 131X_mcRun3_2022_realistic_v1? to decouple beam spot and geant effects.

slava77 commented 1 year ago

This change happened from pre2->pre3

the change in beam spot in the GT since pre2 is actually more significant than the v1 and v2 I mentioned above for minBias (the minbias in pre3 is consistent with the pre3 GT).

Can you run another variant of pre3 using step1 inputs from pre2 as well as using 131X_mcRun3_2022_realistic_v1? to decouple beam spot and geant effects.

and older minBias as well

clacaputo commented 1 year ago

Can you run another variant of pre3 using step1 inputs from pre2 as well as using 131X_mcRun3_2022_realistic_v1? to decouple beam spot and geant effects.

Hi @slava77 , thanks for the suggestion. Talking with @mandrenguyen we arrived at the same conclusion. I'll run it

mmusich commented 1 year ago

the change in beam spot in the GT since pre2 is actually more significant than the v1 and v2 I mentioned above for minBias (the minbias in pre3 is consistent with the pre3 GT).

for the record, here's the change.

image

mainly moving the z coordinate of the ellipsoid center towards 0 and related uncertainty. Descriptions at https://github.com/cms-sw/cmssw/pull/41123#issue-1634394648

clacaputo commented 1 year ago

Can you run another variant of pre3 using step1 inputs from pre2 as well as using 131X_mcRun3_2022_realistic_v1? to decouple beam spot and geant effects.

I did the check running pre3 using step1 inputs from pre2 as well as using131X_mcRun3_2022_realistic_v1`

Timing:

Modules' time differences pre2 VS pre3_with_pre2BS

The same excluding the first 1 events
  delta/mean delta/orJob     original                   new       module name
  ---------- ------------     --------                  ----       ------------
   -0.051121      -0.06%       152.05 ms/ev ->       144.47 ms/ev trackExtrapolator
   +0.050967      +0.23%       517.27 ms/ev ->       544.32 ms/ev tobTecStepTrackCandidates
  ---------- ------------     --------                  ----       ------------

It is clear that the reconstruction time increase in pre3 is coming from the new BeamSpot conditions, introduced by #41123

francescobrivio commented 1 year ago

Hi @clacaputo

It is clear that the reconstruction time increase in pre3 is coming from the new BeamSpot conditions, introduced by https://github.com/cms-sw/cmssw/pull/41123

Marco pointed me to this issue: indeed the new BeamSpot was introduced because it reflects better the Z position of the Beamspot in the 2022 data (see this presentation for a few more details). So the new measure is correct. Interesting that 1 cm in the BeamSpot Z position has such a large effect on reco-timing!

Just to be sure we (me & Marco) took a look at the relvals introduced in the same PR to mach the new BeamSpot ('CMSSW_13_1_0_pre1-130X_mcRun3_2022_realistic_withNewBSFromEOY2022Data_v2_RV186-v') and we concluded that they were correctly generated:

mandrenguyen commented 1 year ago

Given that the increase shows most prominently in the TobTec step, which I guess sort of makes sense for a shift in the z coordinate of the beamspot, we probably ought to check whether the fraction of tracks being reco'd and selected with highPurity in that iteration is changing.

mmusich commented 1 year ago

we probably ought to check whether the fraction of tracks being reco'd and selected with highPurity in that iteration is changing.

that's precisely what I wanted to check on the RelVal samples, while on paper everything seems to be correct, I am a bit puzzled by the results in the release validation. I checked:

The same thing is observed in the samples with PU:

performance:

I would tend to conclude that the samples in the RelVals were generated with recycled GEN-SIM (which in the case at hand doesn't make sense :( ).

slava77 commented 1 year ago

I did the check running pre3 using step1 inputs from pre2 as well as using131X_mcRun3_2022_realistic_v1`

did you use the pre2 minbias GEN-SIM as well?

clacaputo commented 1 year ago

did you use the pre2 minbias GEN-SIM as well?

~Shouldn't be done automatically using pre2 step2 as input of pre3 step3?~ I'm running pre3 step3 again

slava77 commented 1 year ago

~Shouldn't be done automatically using pre2 step2 as input of pre3 step3?~

it's probably good enough in this case (only BS changed in GT and hopefully digi formats didn't change)

clacaputo commented 1 year ago

~Shouldn't be done automatically using pre2 step2 as input of pre3 step3?~

it's probably good enough in this case (only BS changed in GT and hopefully digi formats didn't change)

pre3 step3 tested with:

Time measured: 12.4 s/ev