cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0
1.08k stars 4.3k forks source link

[Run3 PromptReco] 390 MB in cut/expression parser #46493

Open makortel opened 3 days ago

makortel commented 3 days ago

From https://github.com/cms-sw/cmssw/issues/46040#issuecomment-2420368633 the cut/expression parser uses 390 MB (constant cost, i.e. gets amortized across threads). The top 10 modules with the largest contributions are

Addressing already the top 3 would save about 286 MB.

cmsbuild commented 3 days ago

cms-bot internal usage

cmsbuild commented 3 days ago

A new Issue was created by @makortel.

@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

makortel commented 3 days ago

assign reconstruction, xpog, dqm

cmsbuild commented 3 days ago

New categories assigned: reconstruction,xpog,dqm

@antoniovagnerini,@ftorrresd,@hqucms,@jfernan2,@mandrenguyen,@nothingface0,@rvenditti,@syuvivida,@tjavaid you have been requested to review this Pull request/Issue and eventually sign? Thanks

slava77 commented 3 days ago

are you rediscovering that the first piece of code calling to the boost::spirit brings most of the cost?

makortel commented 3 days ago
  • 103 MB via ObjectSelectorBase<SingleElementCollectionSelector<edm::View<reco::Muon>, ...> (link)

I believe this corresponds to MuonSelector https://github.com/cms-sw/cmssw/blob/4635bff7d6bee3aaee115ae16ec4c540ca76d7b0/CommonTools/RecoAlgos/plugins/MuonSelector.cc#L32-L35

The configuration defines 25 of them, of which 14 are not constructed in the C++ (i.e. these modules are not part of any Task/Sequence/Path/EndPath that is associated with the Schedule)

ALCARECOSiPixelCalSingleMuonTightGoodMuons
ALCARECOSiPixelCalSingleMuonTightRelCombIsoMuons
ALCARECOTkAlDiMuonGoodMuons
ALCARECOTkAlDiMuonRelCombIsoMuons
ALCARECOTkAlJpsiMuMuGoodMuons (not constructed)
ALCARECOTkAlMuonIsolatedGoodMuons
ALCARECOTkAlMuonIsolatedRelCombIsoMuons
ALCARECOTkAlUpsilonMuMuGoodMuons (not constructed)
ALCARECOTkAlUpsilonMuMuPAGoodMuons (not constructed)
ALCARECOTkAlUpsilonMuMuRelCombIsoMuons (not constructed)
ALCARECOTkAlZMuMuGoodMuons
ALCARECOTkAlZMuMuPAGoodMuons (not constructed)
ALCARECOTkAlZMuMuRelCombIsoMuons
ZmmgLeadingMuons (not constructed)
ZmmgTrailingMuons (not constructed)
gemDQMStaMuons
gemDQMTightGlbMuons
muonSelection (not constructed)
muonSelectorForEMu (not constructed)
muonSelectorForZMM (not constructed)
muonSelectorForZMMPA (not constructed)
muonsPt10
selectedMuons (not constructed)
selectedMuonsIso (not constructed)
tightMuonsForPbPbZMuSkim (not constructed)

Looking next the actual cut strings in the 11 modules that are constructed

ALCARECOSiPixelCalSingleMuonTightGoodMuons isGlobalMuon &isTrackerMuon &numberOfMatches > 1 &globalTrack.hitPattern.numberOfValidMuonHits > 0 &abs(eta) < 2.5 &globalTrack.normalizedChi2 < 20.
ALCARECOSiPixelCalSingleMuonTightRelCombIsoMuons (isolationR03().sumPt + isolationR03().emEt + isolationR03().hadEt)/pt < 0.15
ALCARECOTkAlDiMuonGoodMuons isGlobalMuon &isTrackerMuon &numberOfMatches > 1 &globalTrack.hitPattern.numberOfValidMuonHits > 0 &abs(eta) < 2.5 &globalTrack.normalizedChi2 < 20.
ALCARECOTkAlDiMuonRelCombIsoMuons (isolationR03().sumPt + isolationR03().emEt + isolationR03().hadEt)/pt < 0.15
ALCARECOTkAlMuonIsolatedGoodMuons isGlobalMuon &isTrackerMuon &numberOfMatches > 1 &globalTrack.hitPattern.numberOfValidMuonHits > 0 &abs(eta) < 2.5 &globalTrack.normalizedChi2 < 20.
ALCARECOTkAlMuonIsolatedRelCombIsoMuons (isolationR03().sumPt + isolationR03().emEt + isolationR03().hadEt)/pt < 0.15
ALCARECOTkAlZMuMuGoodMuons isGlobalMuon &isTrackerMuon &numberOfMatches > 1 &globalTrack.hitPattern.numberOfValidMuonHits > 0 &abs(eta) < 2.5 &globalTrack.normalizedChi2 < 20.
ALCARECOTkAlZMuMuRelCombIsoMuons (isolationR03().sumPt + isolationR03().emEt + isolationR03().hadEt)/pt < 0.15
gemDQMStaMuons isStandAloneMuon&& outerTrack.isNonnull
gemDQMTightGlbMuons isGlobalMuon&& globalTrack.isNonnull&& passed(\'CutBasedIdTight\')
muonsPt10 isGlobalMuon &isTrackerMuon &numberOfMatches > 1 &globalTrack.hitPattern.numberOfValidMuonHits > 0 &abs(eta) < 2.5 &pt > 10

There is a lot of repetition: one group of 4 modules and another group of 4 modules have exactly the same cut string.

Would it be feasible to replace these with modules that express the structure of the selection in C++ code instead?

makortel commented 3 days ago

are you rediscovering that the first piece of code calling to the boost::spirit brings most of the cost?

Possibly. I had a vague recollection of such "constant cost" of this infrastructure, but it's source wasn't evident from the profile (maybe I just didn't dig deeply enough into Cling's call stack).

I'd guess the StringObjectFunction<l1t::EGamma, false> in BXVectorSimpleFlatTableProducer<l1t::EGamma> would be a good candidate for that "constant cost" as it takes 90 MB, while StringCutObjectSelector<l1t::EGamma, false> in the same module takes 448 B.

There is nevertheless potential for ~200 MB reduction by addressing the "next largest" consumers.

makortel commented 3 days ago

assign alca

cmsbuild commented 3 days ago

New categories assigned: alca

@atpathak,@consuegs,@perrotta you have been requested to review this Pull request/Issue and eventually sign? Thanks

makortel commented 3 days ago

@cms-sw/alca-l2 By the way, of the 8 ALCARECO* MuonSelectors listed above,

This duplication has both memory and runtime cost.

makortel commented 3 days ago
  • 93 MB via ObjectSelectorBase<SingleElementCollectionSelector<std::vector<pat::GenericParticle, ...> (link)

I believe this one corresponds to PATGenericParticleSelector https://github.com/cms-sw/cmssw/blob/55fd97727507dd74a4020c9264f5f66118409151/PhysicsTools/PatAlgos/plugins/PATObjectSelector.cc#L249-L250

The configuration contains 2 such modules, of 1 is not constructed because it is not in any Task/Sequence/Path/EndPath associated to the Schedule

looseIsoMuonsForPbPbZMuSkim (not constructed)
looseIsoMuonsForZMuSkim

The cut string is same in both (userIsolation(\'pat::TrackIso\')/pt)<0.4 that looks straightforward to be potentially replaced with a C++-coded selection structure.

Although if this is impacted with the "first call to boost::spirit has a constant cost", it might not be that important (but looks simple nevertheless).

makortel commented 3 days ago

assign DPGAnalysis/Skims

makortel commented 3 days ago

type performance-improvements

cmsbuild commented 3 days ago

New categories assigned: pdmv

@AdrianoDee,@kskovpen,@miquork,@sunilUIET you have been requested to review this Pull request/Issue and eventually sign? Thanks

makortel commented 3 days ago
  • 90 MB via BXVectorSimpleFlatTableProducer<l1t::EGamma> (link)

This is SimpleTriggerL1EGFlatTableProducer https://github.com/cms-sw/cmssw/blob/55fd97727507dd74a4020c9264f5f66118409151/PhysicsTools/NanoAOD/plugins/SimpleL1TFlatTableProducerPlugins.cc#L4

that uses

    cut = cms.string('pt>=10'),
    ...
            expr = cms.string('eta'),
            ...
            expr = cms.string('hwIso()'),
            ...
            expr = cms.string('phi'),
            ...
            expr = cms.string('pt'),

that look like much simpler mapping from strings to member functions than full expression parser could be feasible.

makortel commented 3 days ago
  • 26 MB via SimpleFlatTableProducerBase<reco::BeamSpot, reco::BeamSpot> (link)

I believe this is SimpleBeamspotFlatTableProducer https://github.com/cms-sw/cmssw/blob/55fd97727507dd74a4020c9264f5f66118409151/PhysicsTools/NanoAOD/plugins/SimpleFlatTableProducerPlugins.cc#L49

The configuration has 2 modules of this type, of which only 1 is actually used in the job

simpleBeamspotFlatTableProducer (not constructed)
beamSpotTable

The beamSpotTable accesses

            expr = cms.string('sigmaZ()'),
            ...
            expr = cms.string('sigmaZ0Error()'),
            ...
            expr = cms.string('type()'),
            ...
            expr = cms.string('position().z()'),
            ...
            expr = cms.string('z0Error()'),

that also looks like a much simpler mapping from strings to member functions than the full expression parser could be feasible.