Open felicepantaleo opened 1 year ago
A new Issue was created by @felicepantaleo Felice Pantaleo.
@antoniovilela, @makortel, @Dr15Jones, @rappoccio, @smuzaffar, @sextonkennedy can you please review it and eventually sign/assign? Thanks.
cms-bot commands are listed here
assign core,upgrade
New categories assigned: core,upgrade
@Dr15Jones,@AdrianoDee,@makortel,@smuzaffar,@srimanob you have been requested to review this Pull request/Issue and eventually sign? Thanks
So I used my recent changes to the Tracer service to gather statistics on your job. What I found was the following breakdown:
- 51s parsing python
- 8.4s setting up source
- 19.7s constructing the 2200 modules
- 0.05s in begin job
- 0.05s in begin stream
- 143s in CondDBESSource to find the conditions for the first Run
- 0.0003s source reading begin run
- 24s in global Begin Run
  - 1.8s in dqmCSCClient
  - 2.1s in XMLIdealGeometryESSource
  - 0.9s in hcalParameters
  - 1.6s in EcalBarrelGeometryEP
  - 0.9s in hgcalEEParametersInitialize
  - 1.1s in hgcalHESiParametersInitialize
  - 1.2s in gemGeometry
  - 2.7s in TrackerAlignmentRcd
  - 1.0s in ecalMonitorTask
  - 4.2s in VolumeBasedMagneticFieldESProducer
  - 1.0s in trackerGeometry
- 0.2s in CondDBESSource to find conditions for the first lumi
- 50s in stream begin Run
  - 48.3s in mtdNumberingGeometry
- 0.002s source reading begin lumi
- 0.02s in global begin lumi
- 0.2s in stream begin lumi
- 0.004s source reading first event
- 317s from start to the start of processing the first event
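(For reference, a minimal sketch of how such per-transition timing can be gathered; enabling the framework Tracer service as below is my assumption about the setup used here, and the options of the newer tracer output may differ.)

```python
# Hedged sketch (not from the comment above): appended to an existing cmsRun
# configuration, this enables the framework Tracer service, which prints
# time-stamped begin/end messages for each transition (construction, begin run,
# begin lumi, events, ...) that can be turned into a breakdown like the one above.
import FWCore.ParameterSet.Config as cms

process.add_(cms.Service("Tracer"))
```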
So the obvious speed ups would be to avoid re-parsing the python configuration each time, either by running from a configuration already expanded with .dumpPython, or one where you pickle the results and always re-read from the pickle file.
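As an illustration of the first option (a sketch only, using the configuration file name from this workflow; not something that was actually run here):

```python
# Hypothetical one-off dump: execute the cmsDriver-generated configuration once,
# write the fully expanded configuration out with dumpPython(), and afterwards
# run cmsRun on the flat dump so the expensive python customizations are not
# re-executed on every job start.
from step3_RAW2DIGI_RECO_RECOSIM_PAT_VALIDATION_DQM import process

with open('step3_dumped.py', 'w') as f:
    f.write(process.dumpPython())
```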
Thanks very much @Dr15Jones for the analysis. FYI @cms-sw/mtd-dpg-l2 @fabiocos on mtdNumberingGeometry
I investigated the first 3 steps of the workflow with VTune (as @felicepantaleo told me it felt like all the steps have long startup time). On step 1 (GEN-SIM) the job divides clearly into three phases
The first phase is the overall job startup (shown in the top thread), taking about 102 seconds wall clock time, and 85 seconds of CPU time. The CPU time divides into

- EventProcessor constructor (i.e. module construction and all that)
- MuonOffsetESProducer
- HcalSimParametersESModule
- VolumeBasedMagneticFieldESProducerFromDB
- XMLIdealGeometryESSource
- HGCalParametersESModule
- EcalSimParametersESModule
Here is a flame graph of what happens in MuonOffsetESProducer
The question is then, does it really need to be this expensive? @cms-sw/geometry-l2
Second phase (middle thread in the timeline) is the Geant4 initialization in the OscarMTMasterThread, taking about 50 seconds. I don't know if there is anything that could be easily done there, but here is the flame graph of that anyway @cms-sw/simulation-l2
Third phase (mostly in the bottom thread in the timeline) is the actual event processing (most time spent in Geant4, so shows in a separate non-TBB thread), taking about 45 seconds for 10 events.
assign geometry
New categories assigned: geometry
@Dr15Jones,@civanch,@bsunanda,@makortel,@mdhildreth you have been requested to review this Pull request/Issue and eventually sign? Thanks
The step 2 (DIGI-HLT) took in total 240 seconds wall clock time and 220 seconds of CPU time.
The configuration defines 901 EDModules, 208 ESModules, 57 (End)Paths, 116 Sequences, and 157 Tasks.
The startup phase (72 seconds CPU time) divides into
- 5 seconds of reading configuration
- 46 seconds in the EventProcessor constructor
  - 27 seconds in the StreamSchedule constructor, i.e. loading libraries, creating modules, registering data products (of which a lot of time is spent in cling), and all that
    - 12 seconds spent in Maker::makeModule, of which 8 seconds in creating edm::stream modules. This 8 seconds increases with the number of streams, but is run serially!
      - 3 seconds in the L1TPFCaloProducer constructor. Most of the time seems to be spent in reading data from a ROOT file via l1tpf::corrector::init_()
      - 2 seconds in the DTTrigPhase2Prod constructor. Most of the time seems to be spent in GlobalCoordsObtainer::generate_luts()
      - 2 seconds in the l1tpf::PFClusterProducerFromHGC3DClusters constructor. Most of the time seems to be spent in creating the cut parser, and creating the expression parser in the l1tpf::HGC3DClusterEgID constructor
      - 0.3 seconds in the MixingModule constructor, rest is below that
- 1 second constructing the source

@cms-sw/l1-l2 Is there anything that could be done to speed up the constructors of the aforementioned 3 L1T modules?
Next is beginRun transition (including most, if not all, of EventSetup), about 100 seconds of CPU time, divided into
- MTDGeometricTimingDetESModule::produce() (which @Dr15Jones also noted above)
- magneticfield::VolumeBasedMagneticFieldESProducerFromDB::produce()
- XMLIdealGeometryESSource::produceGeom()
- HGCalParametersESModule::produce()
- TrackerDigiGeometryESModule::produce()
- CaloGeometryEP<EcalBarrelGeometry, DDCompactView>::produceAligned()
- DTGeometryESModule::produce()
- CSCGeometryESModule::produce()
- GEMGeometryESModule::produce()
- HcalParametersESModule::produce()
- HGCalGeometryESProducer::produce()
- HcalTPGCoderULUT::produce()
- RPCGeometryESModule::produce()
- MuonGeometryConstantsESModule::produce()
- trklet::TrackletEventProcessor::init()
Then comes the event data processing, 63 seconds for 10 events
- MixingModule
- HGCDigiProducer::finalizeEvent()
- HcalDigiProducer::finalizeEvent()
- EcalDigiProducer::finalizeEvent()
assign l1
New categories assigned: l1
@epalencia,@aloeliger you have been requested to review this Pull request/Issue and eventually sign? Thanks
The step 3 (RECO+PAT+VALIDATION+DQM) took 290 seconds wall clock time with 260 seconds of CPU time.
The configuration defines 5431(!) EDModules, 483 ESModules, 72 (End)Paths, 1291 Sequences, and 758 Tasks.
The startup phase took about 170 seconds wall clock time (150 seconds CPU time), divided into (in CPU time)
- miniAOD_customizeAllMC() customization function
  - toModify()
  - _add_jetsPuppi()
  - applyDeepBtagging()
  - setupBTagging()
  - addToProcessAndTask()
  - _add_metPuppi()
  - runMetCorAndUncForMiniAODProduction()
  - Statements of the form process.foo = cms.EDModule(...), where the process.foo already exists. In this case the framework goes on to replace the module in all Tasks/Sequences/the Schedule (see the sketch after this list) via
    - _replaceInTasks()
    - _replaceInSequences()
    - _replaceInSchedule()
    - _delattrFromSetattr()
- EventProcessor constructor
  - Schedule constructor
  - edm::Factory::makeModule(), of which 5 seconds in creating edm::stream EDModules. This 5 seconds increases with the number of streams, but is run serially!
- MuonIdProducer constructor, in reading a ROOT file via MuonCaloCompatibility::configure()
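To illustrate the pattern (the module label and producer type below are made up, just to show the shape of the problem):

```python
# Sketch of the expensive pattern: assigning to a process attribute that already
# exists makes the configuration code walk every Task/Sequence/Path to swap the
# old module object for the new one (_replaceInTasks/_replaceInSequences/
# _replaceInSchedule), which adds up in a configuration with ~1300 Sequences.
import FWCore.ParameterSet.Config as cms

process = cms.Process("DEMO")
process.fooJets = cms.EDProducer("FooJetProducer")   # first assignment: cheap
process.p = cms.Path(process.fooJets)

# assigning again to the same label triggers the replace-everywhere machinery
process.fooJets = cms.EDProducer("FooJetProducer", minPt = cms.double(30.))
```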
@cms-sw/xpog-l2 I found that miniAOD_customizeAllMC() being expensive was reported already in https://github.com/cms-sw/cmssw/issues/20220, maybe it would finally be time to speed it up?
The beginRun transition took about 93 seconds of wall clock time (87 seconds CPU time), divided into (in CPU time)
- MTDGeometricTimingDetESModule::produce() (also noted earlier)
- XMLIdealGeometryESSource::produceGeom()
- magneticfield::VolumeBasedMagneticFieldESProducerFromDB::produce()
- HGCalParametersESModule::produce()
- CaloGeometryEP<EcalBarrelGeometry, DDCompactView>::produceAligned()
- GEMGeometryESModule::produce()
- TrackerDigiGeometryESModule::produce()
- HcalParametersESModule::produce()
- HGCalGeometryESProducer::produce()
- DTGeometryESModule::produce()
- edm::one EDModules
  - cscdqm::Dispatcher::book()
  - EcalDQMonitorTask::bookHistograms()
- edm::stream EDModules
- edm::global EDModules

Event data processing and shutdown took about 23 seconds in wall clock time (22.5 seconds in CPU time), divided into (in CPU time)

- SiPixelGainCalibrationOffline in SiPixelGainCalibrationOfflineRcd
- GBRForestD in GBRDWrapperRcd
- GBRForest in GBRWrapperRcd
- edm::one modules (looks like DQM)
- edm::global and edm::stream modules (i.e. reconstruction etc, nothing seems to really smell here)
- DelayedReader (i.e. reading input data)

assign xpog
New categories assigned: xpog
@vlimant,@simonepigazzini you have been requested to review this Pull request/Issue and eventually sign? Thanks
assign db
For the cost of "143s in CondDBESSource to find the conditions for the first Run" in https://github.com/cms-sw/cmssw/issues/43062#issuecomment-1771705646
New categories assigned: db
@francescobrivio,@saumyaphor4252,@perrotta,@consuegs you have been requested to review this Pull request/Issue and eventually sign? Thanks
+core
I think our part (in the analysis) is done
re MuonOffsetESProducer and mtdNumberingGeometry - are these not just the unlucky module that builds the entire geometry? [which might be much faster if it were in the db]
As a test I did
> time python3 -c 'import step3_RAW2DIGI_RECO_RECOSIM_PAT_VALIDATION_DQM'
45.100u 1.567s 0:49.83 93.6% 0+0k 24+0io 0pf+0w
Then in python I did
> python3 -i step3_RAW2DIGI_RECO_RECOSIM_PAT_VALIDATION_DQM.py
>>> import pickle
>>> pickle.dump(process, open('step3.pkl', 'wb'))
and then
[cdj@cmslpc-el8-heavy01 24896.0_CloseByPGun_CE_E_Front_120um+2026D98]$ time python3 -c 'import pickle; process = pickle.load(open("step3.pkl", "rb"));'
1.833u 0.153s 0:02.16 91.6% 0+0k 0+0io 0pf+0w
so using a pickle file is 25x faster
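For completeness, a sketch of how such a pickle can then serve as the job configuration (the wrapper file name is made up; cmsRun only needs a python file that ends up defining process):

```python
# step3_from_pickle.py (hypothetical wrapper): instead of re-executing the full
# cmsDriver-generated configuration, load the previously pickled process object.
# cmsRun picks up whatever `process` this file defines.
import pickle

with open('step3.pkl', 'rb') as f:
    process = pickle.load(f)
```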
@davidlange6
re MuonOffsetESProducer and mtdNumberingGeometry - are these not just the unlucky module that builds the entire geometry?
it also seems that Phase 2 is still using the old DD instead of DD4Hep.
I believe that is true - (so startup will get slower with DD4Hep I guess)
@Dr15Jones asked for the flame graph for MTDGeometricTimingDetESModule::produce()
looking at some of the others
putting a

theSiPixelGainCalibrationOffline.reserve(detUnitDimensions.first * (detUnitDimensions.second + detUnitDimensions.first / 80));

probably helps a lot (I don't really understand the resize(...) followed by a memcpy going on, so there might be another factor to gain in this code).
@davidlange6
re MuonOffsetESProducer and mtdNumberingGeometry - are these not just the unlucky module that builds the entire geometry?
it also seems that Phase 2 is still using the old DD instead of DD4Hep.
Do we really gain something in performance with DD4hep? It was also a question I got at CHEP, but I have never seen a clear result showing a gain.
By the way, we are trying to move. The last validation still showed a few issues, and we plan to validate again with the coming 13_3_0_pre4.
@srimanob
Do we really gain something in performance with dd4hep?
My take is we'd rather spend time optimizing code using DD4Hep than spend the time on the obsolete DD.
here cmssw/CalibTracker/SiPixelESProducers/plugins/SiPixelFakeGainOfflineESSource.cc
This should not even be executed in phase2
here cmssw/CalibTracker/SiPixelESProducers/plugins/SiPixelFakeGainOfflineESSource.cc
This should not even be executed in phase2
I confirm that in this workflow the SiPixelGainCalibrationOffline comes from the CondDB via cond::persistency::PayloadProxy<SiPixelGainCalibrationOffline>::loadPayload() etc.
I confirm that in this workflow the SiPixelGainCalibrationOffline comes from the CondDB
I thought we cleaned that here https://github.com/cms-sw/cmssw/pull/42794
I confirm that in this workflow the SiPixelGainCalibrationOffline comes from the CondDB
I thought we cleaned that here #42794
The reported behavior was in CMSSW_13_2_0_pre3 (the job used the 131X_mcRun4_realistic_v6 Global Tag), i.e. before that cleanup PR. So one message to @felicepantaleo and @waredjeb would be to move to a more recent CMSSW.
The step 2 (DIGI-HLT) took in total 240 seconds wall clock time and 220 seconds of CPU time.
The configuration defines 901 EDModules, 208 ESModules, 57 (End)Paths, 116 Sequences, and 157 Tasks.
The startup phase (72 seconds CPU time) divides into
- 5 seconds of reading configuration
- 46 seconds in the EventProcessor constructor
  - 27 seconds in the StreamSchedule constructor, i.e. loading libraries, creating modules, registering data products (of which a lot of time is spent in cling), and all that
    - 12 seconds spent in Maker::makeModule, of which 8 seconds in creating edm::stream modules. This 8 seconds increases with the number of streams, but is run serially!
      - 3 seconds in the L1TPFCaloProducer constructor. Most of the time seems to be spent in reading data from a ROOT file via l1tpf::corrector::init_()
      - 2 seconds in the DTTrigPhase2Prod constructor. Most of the time seems to be spent in GlobalCoordsObtainer::generate_luts()
      - 2 seconds in the l1tpf::PFClusterProducerFromHGC3DClusters constructor. Most of the time seems to be spent in creating the cut parser, and creating the expression parser in the l1tpf::HGC3DClusterEgID constructor
      - 0.3 seconds in the MixingModule constructor, rest is below that
- 1 second constructing the source
@cms-sw/l1-l2 Is there anything that could be done to speed up the constructors of the aforementioned 3 L1T modules?
@gpetruc
Yours is the name I recall in connection with the correlator team and Phase2L1ParticleFlow (that is correct, right? My apologies if not). From my look at the L1TPFCaloProducer code, 3 correctors get produced from a string, an int, and a debug bool: https://github.com/cms-sw/cmssw/blob/245daa4d7ad8e1412084c2d66381edfd8835d7ad/L1Trigger/Phase2L1ParticleFlow/plugins/L1TPFCaloProducer.cc#L73-L77
That calls this constructor: https://github.com/cms-sw/cmssw/blob/245daa4d7ad8e1412084c2d66381edfd8835d7ad/L1Trigger/Phase2L1ParticleFlow/src/corrector.cc#L34-L38, which calls init_ with two strings and two bools here: https://github.com/cms-sw/cmssw/blob/245daa4d7ad8e1412084c2d66381edfd8835d7ad/L1Trigger/Phase2L1ParticleFlow/src/corrector.cc#L52-L83
This is mostly file checks, which eventually call another init_ here: https://github.com/cms-sw/cmssw/blob/245daa4d7ad8e1412084c2d66381edfd8835d7ad/L1Trigger/Phase2L1ParticleFlow/src/corrector.cc#L85-L144 and, if the emulation flag is set, another initEmulation_ here: https://github.com/cms-sw/cmssw/blob/245daa4d7ad8e1412084c2d66381edfd8835d7ad/L1Trigger/Phase2L1ParticleFlow/src/corrector.cc#L146-L175
Notably, each of the init functions loops through the TFile keys, which would be my first instinct to check for the slowdown in this constructor.
Would it be possible for you, or one of the people responsible for Phase2L1ParticleFlow, to take a look at this, see where the slowdown occurs, and whether it could be reworked?
@NTrevisani
You are listed as having most recently added this line: https://github.com/cms-sw/cmssw/blob/245daa4d7ad8e1412084c2d66381edfd8835d7ad/L1Trigger/DTTriggerPhase2/plugins/DTTrigPhase2Prod.cc#L197 to DTTrigPhase2Prod.
That line in turn runs a for loop doing a long list of mathematical operations (including the atan calculation) here: https://github.com/cms-sw/cmssw/blob/245daa4d7ad8e1412084c2d66381edfd8835d7ad/L1Trigger/DTTriggerPhase2/src/GlobalCoordsObtainer.cc#L43-L195
which seems a primary candidate for the slowdown in this function. Could you, or someone responsible for this code, look into the performance of this and see if it could be reworked? If not, would it maybe be possible to cache or look up the results of these math functions/loops? At a quick pass these seem to be calculated against some constants.
@rekovic & @cerminar
You are listed as authors on PFClusterProducerFromHGC3DClusters and HGC3DClusterEgID. For this one I cannot find a likely candidate for the slowdown, other than perhaps the instantiation of the TMVA and the creation of another corrector (see the first section).
Could you, or someone responsible for this code, look into the performance of this and see if it could be reworked?
Hi,
I can have a look, but not with terribly high priority. After all, we're discussing shaving seconds off initialization; once one runs a workflow with more events it becomes irrelevant. The two PF modules mentioned have to load a bunch of calibration histograms (in several bins of eta and EM fraction), and BDTs for cluster identification, so they do need some time. Also, both features will eventually have to be replaced with bitwise-accurate emulators anyway (on a timescale of some months, at most a year).
Giovanni
it is on our to-do list to review how PAT is constructed and simplify it together with dropping miniAOD_customizeAllMC
As an additional test, I dumped the data associated to the global tag used by the job into an HDF5 file by doing
conddb2hdf5.py --compressor zlib --output 133X_mcRun4_realistic_v1.h5conf 133X_mcRun4_realistic_v1
As the tool only handles full global tags and not individual tags, I changed the cmsRun configuration to read the global tag from the HDF5 file but had the individual tags still come from CondDBESSource. The change to the configuration was the addition of the following lines
process.GlobalTag.globaltag = ""
process.H5Tag = cms.ESSource("CondHDF5ESSource",
    filename = cms.untracked.string("133X_mcRun4_realistic_v1.h5conf"),
    globalTag = cms.string("133X_mcRun4_realistic_v1"),
    excludeRecords = cms.vstring("SiPixelGenErrorDBObjectRcd",
                                 "SiPixelLorentzAngleRcd",
                                 "SiPixelLorentzAngleSimRcd",
                                 "SiPixelTemplateDBObjectRcd",
                                 "TrackerAlignmentErrorExtendedRcd",
                                 "TrackerAlignmentRcd",
                                 "TrackerSurfaceDeformationRcd"))
Running the original job at FNAL, it took 140s for the job to retrieve all the IOVs during the begin Run phase. With this change, it now takes 1.5s.
So for local tests one can first dump the global tag and then run.
Since which version is H5Tag supported?
The CondCore/CondHDF5ESSource package was first introduced in CMSSW_13_3_0_pre2. See https://github.com/cms-sw/cmssw/pull/42431.
@Dr15Jones - what is the role of excludeRecords here? Just an ability to skip records known to not be needed? Is that responsible for much of the (very interesting..) speedup?
what is the role of excludeRecords here? Just an ability to skip records known to not be needed? Is that responsible for much of the (very interesting..) speedup?
For the configuration given, the CondDBESSource overrides the GlobalTag for those records and gives explicit tags to be used (see below). The CondHDF5ESSource (and its accompanying scripts) only handles using a GlobalTag (it could be extended to handle explicit tags as well), so I still needed to get these explicit tags (and their associated Records) from CondDBESSource. In order to avoid having HDF5 and the DB fight each other over which one would deliver a Record (one could use an ESAlias to explicitly set which one wins), I decided to just have the CondHDF5ESSource not deliver those Records at all.
toGet = cms.VPSet(
cms.PSet(
connect = cms.string('frontier://FrontierProd/CMS_CONDITIONS'),
record = cms.string('SiPixelGenErrorDBObjectRcd'),
snapshotTime = cms.string('2021-04-17 20:00:00'),
tag = cms.string('SiPixelGenErrorDBObject_phase2_IT_v7.0.2_25x100_v2_mc')
),
cms.PSet(
connect = cms.string('frontier://FrontierProd/CMS_CONDITIONS'),
record = cms.string('SiPixelLorentzAngleRcd'),
snapshotTime = cms.string('2021-03-16 20:00:00.000'),
tag = cms.string('SiPixelLorentzAngle_phase2_T25_v0_mc')
),
cms.PSet(
connect = cms.string('frontier://FrontierProd/CMS_CONDITIONS'),
label = cms.untracked.string('forWidth'),
record = cms.string('SiPixelLorentzAngleRcd'),
snapshotTime = cms.string('2020-02-23 14:00:00.000'),
tag = cms.string('SiPixelLorentzAngle_phase2_T19_mc_forWidthEmpty')
),
cms.PSet(
connect = cms.string('frontier://FrontierProd/CMS_CONDITIONS'),
label = cms.untracked.string('fromAlignment'),
record = cms.string('SiPixelLorentzAngleRcd'),
snapshotTime = cms.string('2020-02-23 14:00:00.000'),
tag = cms.string('SiPixelLorentzAngle_phase2_T19_mc_forWidthEmpty')
),
cms.PSet(
connect = cms.string('frontier://FrontierProd/CMS_CONDITIONS'),
record = cms.string('SiPixelLorentzAngleSimRcd'),
snapshotTime = cms.string('2021-03-16 20:00:00.000'),
tag = cms.string('SiPixelSimLorentzAngle_phase2_T25_v0_mc')
),
cms.PSet(
connect = cms.string('frontier://FrontierProd/CMS_CONDITIONS'),
record = cms.string('SiPixelTemplateDBObjectRcd'),
snapshotTime = cms.string('2021-04-17 20:00:00'),
tag = cms.string('SiPixelTemplateDBObject_phase2_IT_v7.0.2_25x100_v2_mc')
),
cms.PSet(
connect = cms.string('frontier://FrontierProd/CMS_CONDITIONS'),
record = cms.string('TrackerAlignmentErrorExtendedRcd'),
snapshotTime = cms.string('2023-03-16 15:30:00'),
tag = cms.string('TrackerAlignmentErrorsExtended_Upgrade2026_T25_design_v0')
),
cms.PSet(
connect = cms.string('frontier://FrontierProd/CMS_CONDITIONS'),
record = cms.string('TrackerAlignmentRcd'),
snapshotTime = cms.string('2023-03-16 15:30:00'),
tag = cms.string('TrackerAlignment_Upgrade2026_T25_design_v0')
),
cms.PSet(
connect = cms.string('frontier://FrontierProd/CMS_CONDITIONS'),
record = cms.string('TrackerSurfaceDeformationRcd'),
snapshotTime = cms.string('2023-03-16 15:30:00'),
tag = cms.string('TrackerSurfaceDeformations_Upgrade2026_Zero')
)
)
for the MTD geometry construction part, please have a look at #43124. The DD4hep version is definitely less problematic as far as MTD is concerned, but it is not the production default yet. Profiling 24807.911 with IgProf:
5.7 20.92 MTDGeometricTimingDetESModule::produce(IdealGeometryRecord const&) [122]
5.7 20.92 DDCmsMTDConstruction::construct(cms::DDCompactView const&) [123]
so using a pickle file is 25x faster
@Dr15Jones I wasn't able to understand... is the pickled file faster than the original configuration built by cmsDriver.py, or faster than the fully expanded one from edmConfigDump?
I wasn't able to understand... is the pickled file faster than the original configuration built by cmsDriver.py, or faster than the fully expanded one from edmConfigDump?
Faster than the original one from cmsDriver.py.
+geometry
it seems that the main problem in geometry connected with MTD is fixed.
If I use CMSSW_13_3_X_2023-11-08-1100, use a pickled configuration file, and use the HDF5 conditions code, then for an 8-thread job (which can do some initialization in parallel) it takes 68 seconds to go from starting the job to the first event. This job processed 100 events.
[The image is from the new tracer log viewer application]
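A sketch of how the two ingredients combine (file names as used earlier in this thread; the excludeRecords needed to avoid clashes with explicit Global Tag overrides are omitted here, see the earlier comment):

```python
# Hedged sketch: appended to the cmsDriver-generated configuration and executed
# once with python3, this switches the conditions to the local HDF5 dump and
# writes the fully built process to a pickle, which a small wrapper (as in the
# earlier comment) can hand to cmsRun.
import pickle
import FWCore.ParameterSet.Config as cms

process.GlobalTag.globaltag = ""
process.H5Tag = cms.ESSource("CondHDF5ESSource",
    filename = cms.untracked.string("133X_mcRun4_realistic_v1.h5conf"),
    globalTag = cms.string("133X_mcRun4_realistic_v1"))

with open('step3.pkl', 'wb') as f:
    pickle.dump(process, f)
```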
Developing reconstruction algorithms for Phase-2 is becoming more and more difficult and frustrating as CMSSW startup times have increased in the last few years. This is what happens when we want to reconstruct 10 events of a single charged pion shot in front of HGCAL with a single thread. The workflow is
24896.0_CloseByPGun_CE_E_Front_120um+2026D98 using CMSSW_13_2_0_pre3.
It takes 4.5 minutes to launch the job and 30 seconds to reconstruct 10 events with a single thread. Reducing the startup time would drastically reduce the number of coffees we have every day while increasing morale. @waredjeb @makortel FYI