Closed tvami closed 1 year ago
A new Issue was created by @tvami Tamas Vami.
@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.
cms-bot commands are listed here
assign xpog
New categories assigned: xpog
@mariadalfonso,@gouskos,@swertz,@vlimant you have been requested to review this Pull request/Issue and eventually sign? Thanks
I also checked CMSSW_12_4_8 and CMSSW_12_5_X_2022-08-25-1100, the alignment records are still consumed
out of curiosity I tried to run step2 of wf. 136.8523 by using an overridden tracker alignment record with a tag that doesn't have conditions data on the run which is used for the relval, prepared via:
$ conddb_import -c sqlite_file:myAlignments.db -f frontier://FrontierProd/CMS_CONDITIONS -i TrackerAlignment_v29_offline -t Alignments -b 345747
$ conddb_import -c sqlite_file:myAlignments.db -f frontier://FrontierProd/CMS_CONDITIONS -i TrackerAlignmentExtendedErrors_v16_offline_IOVs -t AlignmentErrors -b 345747
and then patching the configuration with:
# Other statements
from Configuration.AlCa.GlobalTag import GlobalTag
process.GlobalTag = GlobalTag(process.GlobalTag, 'auto:run2_data', '')
process.GlobalTag.toGet = cms.VPSet(
    cms.PSet(record = cms.string("TrackerAlignmentRcd"),
             tag = cms.string("Alignments"),
             connect = cms.string("sqlite_file:myAlignments.db")
    ),
    cms.PSet(record = cms.string("TrackerAlignmentErrorExtendedRcd"),
             tag = cms.string("AlignmentErrors"),
             connect = cms.string("sqlite_file:myAlignments.db")
    )
)
in order to see what module would start the crash and I see:
----- Begin Fatal Exception 26-Aug-2022 10:36:07 CEST-----------------------
An exception of category 'NoRecord' occurred while
[0] Processing Event run: 319450 lumi: 31 event: 42789123 stream: 0
[1] Running path 'dqmoffline_step'
[2] Prefetching for module NanoAODDQM/'nanoDQM'
[3] Prefetching for module SimpleCandidateFlatTableProducer/'fatJetTable'
[4] Prefetching for module PATJetRefSelector/'finalJetsAK8'
[5] Prefetching for module PATJetUserDataEmbedder/'updatedJetsAK8WithUserData'
[6] Prefetching for module PATJetUpdater/'updatedJetsAK8'
[7] Prefetching for module PATJetSelector/'selectedUpdatedPatJetsAK8WithDeepInfo'
[8] Prefetching for module PATJetUpdater/'updatedPatJetsTransientCorrectedAK8WithDeepInfo'
[9] Prefetching for module BoostedJetONNXJetTagsProducer/'pfParticleNetMassRegressionJetTagsAK8WithDeepInfo'
[10] Prefetching for module DeepBoostedJetTagInfoProducer/'pfParticleNetTagInfosAK8WithDeepInfo'
[11] Prefetching for EventSetup module TransientTrackBuilderESProducer/''
[12] Prefetching for EventSetup module GlobalTrackingGeometryESProducer/''
[13] Calling method for EventSetup module TrackerDigiGeometryESModule/'trackerGeometryDB'
[14] While getting dependent Record from Record TrackerDigiGeometryRecord
Exception Message:
No "TrackerAlignmentRcd" record found in the EventSetup.
The Record is delivered by an ESSource or ESProducer but there is no valid IOV for the synchronization value.
Please check
a) if the synchronization value is reasonable and report to the hypernews if it is not.
b) else check that all ESSources have been properly configured.
----- End Fatal Exception -------------------------------------------------
so it seems that the alignment are used because of the GlobalTrackingGeometry which is in turn used for running some BTV taggers.
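For readers unfamiliar with IOV lookups, the NoRecord failure above can be illustrated with a minimal sketch. It assumes that importing with `-b 345747` leaves 345747 as the first "since" of the local tag, and that each IOV stays valid until the next "since". The helper `find_iov` and the list `imported_sinces` are hypothetical names used only for illustration; this is not how the EventSetup is implemented.

```python
from bisect import bisect_right


def find_iov(sinces, run):
    """Return the 'since' of the IOV covering `run`, or None if none does.

    `sinces` is the sorted list of IOV start runs of a tag. Each IOV is
    assumed to stay valid until the next 'since' (open-ended for the last).
    """
    idx = bisect_right(sinces, run) - 1
    return sinces[idx] if idx >= 0 else None


# Hypothetical IOV list for the locally imported tag (assumption: the
# import with `-b 345747` makes 345747 the first 'since').
imported_sinces = [345747]

# Run of the failing event in the exception above: no IOV covers it.
print(find_iov(imported_sinces, 319450))

# An illustrative later run, covered by the single imported IOV.
print(find_iov(imported_sinces, 350000))
```

Under these assumptions the lookup for run 319450 finds nothing, which matches the "no valid IOV for the synchronization value" message.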
Another case of alignment dependency seems in the PPS
With this commit in 10_6, @nsmith was able to bypass it, i.e. by commenting out nanoSequenceOnlyData = cms.Sequence(protonTables + lhcInfoTable): https://github.com/cms-sw/cmssw/pull/39040/commits/cd90ed159e5829e0404841f478ea59615f88840f
Another case of alignment dependency seems in the PPS
judging from the table above, it doesn't seem that any PPS condition is used (at least for wf 136.8523)
The geometry is called here. https://github.com/cms-sw/cmssw/pull/32616/files#diff-67a90255f126a36efd13f3396aa5aef7edb0ab7bd4744ece6c300e4f0ea7113bR158
The geometry is called here.
geometry is not alignment (which is the topic of the issue...). Quoting
From this the JEC + JER + Geometry + RunInfo are expected, however the Muon + tracker + global alignment records are not.
so it seems that the alignment are used because of the GlobalTrackingGeometry which is in turn used for running some BTV taggers. 136.8523
ok, this is Run2 and we explicitly re-run the tagger since wasn't in UL-mini. But for Run3 nano the re-running should be off and dependency from BoostedJetONNXJetTagsProducer/'pfParticleNetMassRegressionJetTagsAK8WithDeepInfo' should be gone https://github.com/cms-sw/cmssw/blob/b54d06d5447f6bb55d1da6c42d3a413f526150a0/PhysicsTools/NanoAOD/python/jetsAK8_cff.py#L261
@mariadalfonso
ok, this is Run2 and we explicitly re-run the tagger since wasn't in UL-mini.
Right but you are asking us to provide you GTs for Run-2 data as well
But for Run3 nano the re-running should be off and dependency from
Ok that sounds reassuring, can you please post the cmsDriver command to reproduce the Run-3 wf? Is there a runTheMatrix wf for that too? I'll check that as well. But even if that's true, it's just half the solution to the problem.
@mariadalfonso
ok, this is Run2 and we explicitly re-run the tagger since wasn't in UL-mini.
Right but you are asking us to provide you GTs for Run-2 data as well
Tracking and pf-candidate is already done in mini, so I do not expect re-running the particlenet for the b-tagging should really use the alignment; I asked the expert to see if something can be cleaned up @hqucms
Tracking and pf-candidate is already done in mini, so I do not expect re-running the particlenet for the b-tagging should really use the alignment;
a consumes statement to TransientTrackRecord is declared here, it should be consumed here:
Isn't that https://github.com/cms-sw/cmssw/blob/master/RecoBTag/FeatureTools/plugins/DeepDoubleXTagInfoProducer.cc#L118 and then https://github.com/cms-sw/cmssw/blob/master/RecoBTag/FeatureTools/plugins/DeepDoubleXTagInfoProducer.cc#L224 ?
Or this DeepDoubleXTagInfo is not relevant for Nano?
it's not in the stack trace I posted above...
Right, in that stack trace I also didn't find anything, I found this other thing instead, and now I'm asking if that's relevant or not.
Right, in that stack trace I also didn't find anything,
I think I pointed already to the exact point in which there is the consumes statement :)
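Whether a given producer appears in the exception context posted earlier can also be checked mechanically. The following is a minimal sketch: the context lines are copied from that exception, while the regular expression and the helper `parse_chain` are illustrative assumptions about the line layout, not a framework tool.

```python
import re

# Context lines [2]-[13] copied from the exception posted earlier in this thread.
TRACE = """\
[2] Prefetching for module NanoAODDQM/'nanoDQM'
[3] Prefetching for module SimpleCandidateFlatTableProducer/'fatJetTable'
[4] Prefetching for module PATJetRefSelector/'finalJetsAK8'
[5] Prefetching for module PATJetUserDataEmbedder/'updatedJetsAK8WithUserData'
[6] Prefetching for module PATJetUpdater/'updatedJetsAK8'
[7] Prefetching for module PATJetSelector/'selectedUpdatedPatJetsAK8WithDeepInfo'
[8] Prefetching for module PATJetUpdater/'updatedPatJetsTransientCorrectedAK8WithDeepInfo'
[9] Prefetching for module BoostedJetONNXJetTagsProducer/'pfParticleNetMassRegressionJetTagsAK8WithDeepInfo'
[10] Prefetching for module DeepBoostedJetTagInfoProducer/'pfParticleNetTagInfosAK8WithDeepInfo'
[11] Prefetching for EventSetup module TransientTrackBuilderESProducer/''
[12] Prefetching for EventSetup module GlobalTrackingGeometryESProducer/''
[13] Calling method for EventSetup module TrackerDigiGeometryESModule/'trackerGeometryDB'
"""

# Assumed layout: "[n] <action> [EventSetup ]module <Type>/'<label>'".
PATTERN = re.compile(
    r"\[(\d+)\] (?:Prefetching for|Calling method for) "
    r"(EventSetup )?module (\w+)/'(\w*)'"
)


def parse_chain(text):
    """Return a list of (kind, module type, label), kind being 'ED' or 'ES'."""
    chain = []
    for line in text.splitlines():
        match = PATTERN.search(line)
        if match:
            kind = "ES" if match.group(2) else "ED"
            chain.append((kind, match.group(3), match.group(4)))
    return chain


chain = parse_chain(TRACE)
module_types = [module_type for _, module_type, _ in chain]

# The producer asked about is not in this particular chain ...
print("DeepDoubleXTagInfoProducer" in module_types)
# ... while the ParticleNet tag-info producer is.
print("DeepBoostedJetTagInfoProducer" in module_types)
```

The last event-data module before the EventSetup part of the chain is the one whose `consumes` pulls in the tracking geometry, which is why the discussion focuses on DeepBoostedJetTagInfoProducer.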
this pointed to use of TransientTrackBuilder. This is used in IPTools, the primary tool for many btagging algos.
Poking around a bit, it seems like the TrackingGeometry is used mainly for the track inner/outer state Surface building.
The TSOS (state on surface) built from TrackExtra may not be meaningful though if the surface derived from the DetId and new geometry is different from the free state. 🤔
But for Run3 nano the re-running should be off and dependency from
Ok that sounds reassuring, can you please post the cmsDriver command to reproduce the Run-3 wf?
So I ran in CMSSW_12_5_X_2022-08-25-1100 the following:
cmsDriver.py RECO --conditions 124X_dataRun3_Prompt_v4 --datatier NANOAOD --era Run3 --eventcontent NANOEDMAOD --filein /store/data/Run2022C/DoubleMuon/MINIAOD/PromptReco-v1/000/355/862/00000/5e7b3b04-fb7b-432a-9ca5-f249163a8ced.root --fileout file:nano.root -n 10 --python_filename ReReco_Nano_0_cfg.py --scenario pp --step NANO --data --customise_commands='process.GlobalTag.DumpStat=cms.untracked.bool(True)'
Indeed for Run-3 in master, we don't have the alignment dependency, so this issue is about Run-2 only (which is anyway the era that triggered the discussion).
for completeness: can you list the Record Label used here ?
Record | Label | Tag |
---|---|---|
GBRDWrapperRcd | electron_eb_ECALTRK | GEDelectron_track_EBCorrection_80X_EGM_v4 |
GBRDWrapperRcd | electron_eb_ECALTRK_lowpt | GEDelectron_track_lowpt_EBCorrection_80X_EGM_v4 |
GBRDWrapperRcd | electron_eb_ECALTRK_lowpt_var | GEDelectron_track_lowpt_EBUncertainty_80X_EGM_v4 |
GBRDWrapperRcd | electron_eb_ECALTRK_var | GEDelectron_track_EBUncertainty_80X_EGM_v4 |
GBRDWrapperRcd | electron_ee_ECALTRK | GEDelectron_track_EECorrection_80X_EGM_v4 |
GBRDWrapperRcd | electron_ee_ECALTRK_lowpt | GEDelectron_track_lowpt_EECorrection_80X_EGM_v4 |
GBRDWrapperRcd | electron_ee_ECALTRK_lowpt_var | GEDelectron_track_lowpt_EEUncertainty_80X_EGM_v4 |
GBRDWrapperRcd | electron_ee_ECALTRK_var | GEDelectron_track_EEUncertainty_80X_EGM_v4 |
JetCorrectionsRecord | AK4PFPuppi | JetCorrectorParametersCollection_Summer16_23Sep2016AllV4_DATA_AK4PFPuppi |
JetCorrectionsRecord | AK8PFPuppi | JetCorrectorParametersCollection_Summer16_23Sep2016AllV4_DATA_AK8PFPuppi |
L1GtTriggerMaskAlgoTrigRcd | | L1GtTriggerMaskAlgoTrig_CRAFT09v2_hlt |
L1GtTriggerMenuRcd | | L1GtTriggerMenu_CRAFT09_hlt |
L1TUtmTriggerMenuRcd | | L1TUtmTriggerMenu_Stage2v0_hlt |
LHCInfoRcd | | LHCInfoEndFill_Run3_v0 |
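A list like the one above can be screened for alignment records with a few lines of Python. This is a minimal sketch: it assumes the rows are available as pipe-separated text in the `Record | Label | Tag |` layout, uses only a few rows copied from the table as sample input, and the helpers `parse_rows` and `alignment_records` are hypothetical names.

```python
from collections import defaultdict

# A few rows copied from the table above, used as sample input.
ROWS = """\
GBRDWrapperRcd | electron_eb_ECALTRK | GEDelectron_track_EBCorrection_80X_EGM_v4 |
GBRDWrapperRcd | electron_ee_ECALTRK | GEDelectron_track_EECorrection_80X_EGM_v4 |
JetCorrectionsRecord | AK4PFPuppi | JetCorrectorParametersCollection_Summer16_23Sep2016AllV4_DATA_AK4PFPuppi |
JetCorrectionsRecord | AK8PFPuppi | JetCorrectorParametersCollection_Summer16_23Sep2016AllV4_DATA_AK8PFPuppi |
"""


def parse_rows(text):
    """Group (label, tag) pairs by record name."""
    by_record = defaultdict(list)
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Skip lines that do not carry at least record, label and tag cells.
        if len(cells) < 3 or not cells[0]:
            continue
        record, label, tag = cells[:3]
        by_record[record].append((label, tag))
    return dict(by_record)


def alignment_records(by_record):
    """Return the record names that look alignment-related (name match only)."""
    return sorted(record for record in by_record if "Alignment" in record)


records = parse_rows(ROWS)
print(sorted(records))
print(alignment_records(records))
```

The name match on "Alignment" is a crude heuristic chosen for this illustration; it would flag records such as TrackerAlignmentRcd but says nothing about which module requested them.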
@tvami I am a bit confused looking at the Rcd report in recent nano workflows
runTheMatrix.py --what nano --site "" --command '--number 5 --customise_commands "process.GlobalTag.DumpStat=cms.untracked.bool(True)"' -l 2500.601,2500.332 --job-report
it seems to pull in a very large number of records. Have I done something wrong?
hi @vlimant sorry, I was on vacation until now. It does give a huge list all the time, but then when the record is accessed it also shows which IOV is accessed. It's true that it's not very user friendly to read the output.
ok, thanks for confirming. having filtered the record with no payload I somehow still have a larger set than in your latest comment. Is there a simple way to trace what module loaded a specific record ?
latest I got is under https://cernbox.cern.ch/s/JqTmbbfQSjdtyDc
interesting, so I was looking at 136.8523 which is for data... so it seems the MC has more dependencies.
Is there a simple way to trace what module loaded a specific record ?
No, not that I know of
Is there a simple way to trace what module loaded a specific record ?
If you run a single threaded job and add in
process.add_(cms.Service("Tracer", dumpEventSetupInfo = cms.untracked.bool(True)))
you will see a module do a prefetch request and then see which EventSetup modules are triggered.
please close
we can reopen if this ever becomes a problem again; at that point we will need a proper tool to investigate the dependency
The following table was produced to see what records are consumed in which step: https://twiki.cern.ch/twiki/bin/view/CMS/AlCaDBHLT2019#Table_of_conditions_for_2018_MC
As you can see the following records are consumed in the Nano step
From this the JEC + JER + Geometry + RunInfo are expected, however the Muon + tracker + global alignment records are not.
The results can be reproduced by
(inspired by 136.8523)