cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0

FastSim in Run-3, driver to follow up #37351

Closed srimanob closed 1 year ago

srimanob commented 2 years ago

This follows up on the first attempt to set up a FastSim Run-3 workflow in https://github.com/cms-sw/cmssw/pull/37347.

There are three issues that we should follow up on:

  1. Sequence for Validation, DQM and Harvesting (*).
  2. Sequence for Nano (**).
  3. Whether to include GEM (i.e. use the Run3 modifier as in FullSim, with no need to exclude GEM as in Run-2).

(*) cmsDriver: cmsDriver.py TTbar_14TeV_TuneCP5_cfi --relval 100000,1000 -s GEN,SIM,RECOBEFMIX,DIGI:pdigi_valid,L1,DIGI2RAW,L1Reco,RECO,PAT,VALIDATION:@standardValidation+@miniAODValidation,DQM:@standardDQMFS+@miniAODDQM -n 10 --conditions auto:phase1_2021_realistic --beamspot Run3RoundOptics25ns13TeVLowSigmaZ --datatier GEN-SIM-DIGI-RECO,MINIAODSIM,DQMIO --eventcontent FEVTDEBUGHLT,MINIAODSIM,DQM --fast --era Run3_FastSim --io TTbarFS_14_UP21.io --python TTbarFS_14_UP21.py --no_exec --fileout file:step1.root --nThreads 8

The issue:

GEN,SIM,RECOBEFMIX,DIGI:pdigi_valid,L1,DIGI2RAW,L1Reco,RECO,PAT,VALIDATION:@standardValidation+@miniAODValidation,DQM:@standardDQMFS+@miniAODDQM
We have determined that this is simulation (if not, rerun cmsDriver.py with --data)
Step: GEN Spec: 
Loading generator fragment from Configuration.Generator.TTbar_14TeV_TuneCP5_cfi
Step: SIM Spec: 
Step: RECOBEFMIX Spec: 
Step: DIGI Spec: ['pdigi_valid']
Step: L1 Spec: 
Step: DIGI2RAW Spec: 
Step: L1Reco Spec: 
Step: RECO Spec: 
Step: PAT Spec: 
Step: VALIDATION Spec: ['@standardValidation', '@miniAODValidation']
@standardValidation+@miniAODValidation in preparing validation
Step: DQM Spec: ['@standardDQMFS', '@miniAODDQM']
Traceback (most recent call last):
  File "/cvmfs/cms.cern.ch/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_4_0_pre1/bin/slc7_amd64_gcc10/cmsDriver.py", line 56, in <module>
    run()
  File "/cvmfs/cms.cern.ch/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_4_0_pre1/bin/slc7_amd64_gcc10/cmsDriver.py", line 28, in run
    configBuilder.prepare()
  File "/cvmfs/cms.cern.ch/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_4_0_pre1/python/Configuration/Applications/ConfigBuilder.py", line 2162, in prepare
    self.addStandardSequences()
  File "/cvmfs/cms.cern.ch/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_4_0_pre1/python/Configuration/Applications/ConfigBuilder.py", line 790, in addStandardSequences
    getattr(self,"prepare_"+stepName)(sequence = '+'.join(stepSpec))
  File "/cvmfs/cms.cern.ch/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_4_0_pre1/python/Configuration/Applications/ConfigBuilder.py", line 1993, in prepare_DQM
    setattr(self.process,pathName, cms.EndPath( getattr(self.process,sequence ) ) )
AttributeError: 'Process' object has no attribute 'DQMOfflineMiniAOD'
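
The last frame of the traceback is a plain getattr on the process object, so any sequence name that was never loaded into the configuration fails in the same way. A minimal sketch of that failure, and of a pre-check that lists the missing names up front, could look like this. ToyProcess, loadedSequence and missing_sequences are hypothetical names used for illustration only; this is not ConfigBuilder code.

```python
class ToyProcess:
    """Toy stand-in for the process object (assumption: not the real cms.Process)."""


def missing_sequences(process, names):
    """Return the requested sequence names that are not attributes of the process."""
    return [name for name in names if not hasattr(process, name)]


process = ToyProcess()
# Hypothetical sequence that the configuration did load.
process.loadedSequence = object()

# Name taken from the traceback above, plus the hypothetical loaded one.
requested = ["loadedSequence", "DQMOfflineMiniAOD"]

# Same access pattern as in the traceback: getattr on a missing name raises.
try:
    getattr(process, "DQMOfflineMiniAOD")
    raised = False
except AttributeError:
    raised = True

missing = missing_sequences(process, requested)
print(raised, missing)
```

A check of this kind only reports which names are absent; it does not say whether the sequence should exist in a FastSim configuration.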

(**) cmsDriver: cmsDriver.py TTbar_14TeV_TuneCP5_cfi --relval 100000,1000 -s GEN,SIM,RECOBEFMIX,DIGI:pdigi_valid,L1,DIGI2RAW,L1Reco,RECO,PAT,NANO,VALIDATION:@standardValidation,DQM:@standardDQMFS -n 10 --conditions auto:phase1_2021_realistic --beamspot Run3RoundOptics25ns13TeVLowSigmaZ --datatier GEN-SIM-DIGI-RECO,MINIAODSIM,NANOAODSIM,DQMIO --eventcontent FEVTDEBUGHLT,MINIAODSIM,NANOEDMAODSIM,DQM --fast --era Run3_FastSim --io TTbarFS_14_UP21.io --python TTbarFS_14_UP21.py --no_exec --fileout file:step1.root --nThreads 8

The issue:

----- Begin Fatal Exception 25-Mar-2022 20:27:38 CET-----------------------
An exception of category 'ProductNotFound' occurred while
   [0] Processing  stream begin Run run: 1 stream: 2
   [1] Calling method for module GenParticles2HepMCConverter/'genParticles2HepMCHiggsVtx'
Exception Message:
Principal::getByToken: Found zero products matching all criteria
Looking for type: GenRunInfoProduct
Looking for module label: generator
Looking for productInstanceName: 

----- End Fatal Exception -------------------------------------------------
cmsbuild commented 2 years ago

A new Issue was created by @srimanob Phat Srimanobhas.

@Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

makortel commented 2 years ago

assign fastsim, dqm, xpog

cmsbuild commented 2 years ago

New categories assigned: fastsim,dqm,xpog

@jfernan2,@sbein,@ahmad3213,@ssekmen,@mariadalfonso,@mdhildreth,@rvenditti,@lveldere,@gouskos,@emanueleusai,@fgolf,@pbo0,@civanch,@pmandrik you have been requested to review this Pull request/Issue and eventually sign? Thanks

srimanob commented 2 years ago

@cms-sw/xpog-l2 Do you have a comment on Nano step?

Currently, the Nano step works if it runs separately, i.e. taking the Mini output of FastSim as input. It does not work if it runs in the same job as the FastSim+Mini step.

=== one step ===

cmsDriver.py TTbar_14TeV_TuneCP5_cfi -s GEN,SIM,RECOBEFMIX,DIGI:pdigi_valid,L1,DIGI2RAW,L1Reco,RECO,PAT,NANO -n 50 --conditions auto:phase1_2022_realistic --beamspot Run3RoundOptics25ns13TeVLowSigmaZ --datatier MINIAODSIM,NANOAODSIM --eventcontent MINIAODSIM,NANOEDMAODSIM --geometry DB:Extended --era Run3_FastSim --fast --python TTbar_14TeV_TuneCP5_2021_GenSim_Run3FSNano.py --no_exec --fileout file:step1_RUn3FSNano.root

Error:

----- Begin Fatal Exception 13-Jul-2022 19:38:41 CEST-----------------------
An exception of category 'ProductNotFound' occurred while
   [0] Processing  stream begin Run run: 1 stream: 0
   [1] Calling method for module GenParticles2HepMCConverter/'genParticles2HepMCHiggsVtx'
Exception Message:
Principal::getByToken: Found zero products matching all criteria
Looking for type: GenRunInfoProduct
Looking for module label: generator
Looking for productInstanceName: 

----- End Fatal Exception -------------------------------------------------

=== two steps ===

This way, it works.

cmsDriver.py TTbar_14TeV_TuneCP5_cfi -s GEN,SIM,RECOBEFMIX,DIGI:pdigi_valid,L1,DIGI2RAW,L1Reco,RECO,PAT -n 50 --conditions auto:phase1_2022_realistic --beamspot Run3RoundOptics25ns13TeVLowSigmaZ --datatier MINIAODSIM --eventcontent MINIAODSIM --geometry DB:Extended --era Run3_FastSim --fast --python TTbar_14TeV_TuneCP5_2021_GenSim_Run3FS.py --no_exec --fileout file:step1_RUn3FS.root

cmsDriver.py step2 -s NANO -n 50 --conditions auto:phase1_2022_realistic --datatier NANOAODSIM --eventcontent NANOEDMAODSIM --geometry DB:Extended --era Run3_FastSim --fast --python NanoOnly.py --no_exec --filein file:step1_RUn3FS.root --fileout file:step2_Nano.root

sbein commented 2 years ago

Hi @srimanob, when running === one step ===, it looks like the crash stems from changes made to the RivetInterface code, where there is an ambiguity in the data type associated with the label genEventInfo, which is requested in the config file dump as:

process.genParticles2HepMCHiggsVtx = cms.EDProducer("GenParticles2HepMCConverter",
    genEventInfo = cms.InputTag("generator"),
    genParticles = cms.InputTag("mergedGenParticles"),
    signalParticlePdgIds = cms.vint32(25)
)

In this producer GenParticles2HepMCConverter.cc, there now seem to be two requests for "genEventInfo": https://github.com/cms-sw/cmssw/blob/CMSSW_12_5_X/GeneratorInterface/RivetInterface/plugins/GenParticles2HepMCConverter.cc#L60-L61

genEventInfoToken_ = consumes<GenEventInfoProduct>(pset.getParameter<edm::InputTag>("genEventInfo"));
genRunInfoToken_ = consumes<GenRunInfoProduct, edm::InRun>(pset.getParameter<edm::InputTag>("genEventInfo"));

This was not the case before C11_0_X. The product process.generator in the same config dump appears to be of type GenEventInfoProduct, not GenRunInfoProduct, so it makes sense that the job crashes when it cannot find the right name/type combination. The piece I don't understand yet is why it works when you split it into === two steps ===.

makortel commented 2 years ago

The problem is likely that GenParticles2HepMCConverter reads the GenRunInfoProduct in beginRun() https://github.com/cms-sw/cmssw/blob/ac7a17a415327c259e1e8d0e67659ce27bf8d197/GeneratorInterface/RivetInterface/plugins/GenParticles2HepMCConverter.cc#L68-L70 whereas a quick git grep "produces<GenRunInfoProduct" shows that all producers of GenRunInfoProduct produce it in endRun().

When the producer and the consumer are in the same job, the consumer in beginRun() sees that the product does not exist (because it will only be produced later, in endRun()). When they are in different jobs, the product produced in the first job's endRun() is already in the file and is thus accessible in the beginRun() of the second job.
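
The ordering argument above can be reproduced with a toy model. This is only a sketch under simplifying assumptions: RunStore, ToyGenerator, ToyConverter and run_job are hypothetical names, not CMSSW classes, and the stored value is a placeholder. It assumes every module's begin-run hook is called before any end-run hook, and that run products written by one job are available at the start of the next.

```python
class ProductNotFound(Exception):
    """Raised when a requested run product is not in the store."""


class RunStore:
    """Toy run-level product store, optionally pre-filled from an input file."""

    def __init__(self, from_file=None):
        self.products = dict(from_file or {})

    def put(self, label, value):
        self.products[label] = value

    def get(self, label):
        if label not in self.products:
            raise ProductNotFound(label)
        return self.products[label]


class ToyGenerator:
    """Produces its run product only at end of run."""

    def begin_run(self, run):
        pass

    def end_run(self, run):
        run.put("generator", {"placeholder": True})


class ToyConverter:
    """Reads the run product at begin of run."""

    def begin_run(self, run):
        self.run_info = run.get("generator")

    def end_run(self, run):
        pass


def run_job(modules, input_products=None):
    """Call all begin-run hooks, then all end-run hooks; return the run products."""
    run = RunStore(input_products)
    for module in modules:
        module.begin_run(run)
    for module in modules:
        module.end_run(run)
    return run.products


# One job: the converter asks for the product before the generator has put it.
try:
    run_job([ToyGenerator(), ToyConverter()])
    one_step_ok = True
except ProductNotFound:
    one_step_ok = False

# Two jobs: the second job starts with the products written by the first.
step1_products = run_job([ToyGenerator()])
converter = ToyConverter()
run_job([converter], input_products=step1_products)
two_step_ok = converter.run_info == {"placeholder": True}

print(one_step_ok, two_step_ok)
```

In this model the one-job case fails regardless of module order, because the read happens in the begin-run pass and the write in the end-run pass.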

sbein commented 2 years ago

Thanks @makortel. Would you see it as non-invasive to put the guts of one of the endRunProduces [1] into a beginRun function? Perhaps even the GenParticles2HepMCConverter.cc? Any other obvious solution that would allow combining the steps?

[1]

void BeamHaloProducer::endRunProduce(Run& run, const EventSetup& es) {
  // just create an empty product
  // to keep the EventContent definitions happy
  // later on we might put the info into the run info that this is a PGun
  unique_ptr<GenRunInfoProduct> genRunInfo(new GenRunInfoProduct());
  run.put(std::move(genRunInfo));
}

Alternatively, for the validation, this Rivet cross section stuff may not be needed - maybe we can just check to see if the GenRunInfoProduct exists and only use it if it does?

makortel commented 2 years ago

Based on the little I understand of the generators and GenParticles2HepMCConverter, they unfortunately look like they fundamentally cannot run in the same job (generators have the information available only at endRun, and GenParticles2HepMCConverter uses the information during its produce()).

Maybe @cms-sw/generators-l2 can provide further insight.

srimanob commented 2 years ago

Ah, I think I got it. In FullSim, the generator runs in a separate job, so there is no problem running Nano+validation together with RECO. Thanks @sbein @makortel

If we would like to run the Nano validation sequence, the way to make this possible may be to run GEN first and then FastSim through Nano in a second job.

srimanob commented 2 years ago

OK, after splitting the job and making a small modification to the DQM step, it now works. Maybe we can discuss it today at the SIM meeting.

cmsDriver.py TTbar_14TeV_TuneCP5_cfi -s GEN -n 50 --conditions auto:phase1_2022_realistic --beamspot Run3RoundOptics25ns13TeVLowSigmaZ --datatier GEN --eventcontent FEVTDEBUG --geometry DB:Extended --era Run3_FastSim --fast --python TTbar_14TeV_TuneCP5_2021_Gen.py --no_exec --fileout file:step1_GEN.root

cmsDriver.py step2 -s SIM,RECOBEFMIX,DIGI:pdigi_valid,L1,DIGI2RAW,L1Reco,RECO,PAT,NANO,VALIDATION:@standardValidation,DQM:@standardDQMFS+@miniAODDQM+@nanoAODDQM -n 50 --conditions auto:phase1_2022_realistic --beamspot Run3RoundOptics25ns13TeVLowSigmaZ --datatier MINIAODSIM,NANOAODSIM,DQMIO --eventcontent MINIAODSIM,NANOEDMAODSIM,DQM --geometry DB:Extended --era Run3_FastSim --fast --python TTbar_14TeV_TuneCP5_2021_Run3FSNano.py --no_exec --fileout file:step2.root --filein file:step1_GEN.root

cmsDriver.py step3 -s HARVESTING:validationHarvesting+@miniAODDQM+@nanoAODDQM --conditions auto:phase1_2022_realistic --mc --geometry DB:Extended --scenario pp --filetype DQM --era Run3_FastSim --fast --filein file:step2_inDQM.root --python HARVESTNano_Run3FS_2021.py -n -1 --no_exec --fileout file:step3.root
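
The three commands above chain through their file names: step2 reads step1_GEN.root, and step3 reads step2_inDQM.root even though step2 was given --fileout file:step2.root. A small sketch of that naming follows, assuming that the first event content goes to the --fileout name and each further one to a file with _in plus the event content name inserted before .root. This convention is inferred from the step3 --filein above, and output_files is a hypothetical helper, not part of cmsDriver.

```python
from typing import List


def output_files(fileout: str, event_contents: List[str]) -> List[str]:
    """Return one output file name per event content.

    Assumption: the first event content is written to the --fileout name,
    and every further one to '<stem>_in<EVENTCONTENT>.root'.
    """
    name = fileout[len("file:"):] if fileout.startswith("file:") else fileout
    outputs = [name]
    for content in event_contents[1:]:
        outputs.append(name.replace(".root", "_in" + content + ".root"))
    return outputs


# Values taken from the step2 and step3 commands in this comment.
step2_outputs = output_files(
    "file:step2.root", ["MINIAODSIM", "NANOEDMAODSIM", "DQM"]
)
step3_input = "step2_inDQM.root"

print(step2_outputs)
print(step3_input in step2_outputs)
```

Under this assumption the harvesting step picks up the DQM output of step2, while the Mini and Nano outputs go to step2.root and step2_inNANOEDMAODSIM.root.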

srimanob commented 2 years ago

Note that I presented a summary of this discussion and the results of Nano+NanoDQM at the SIM meeting today, https://indico.cern.ch/event/1182398/ (Workflow talk)

srimanob commented 1 year ago

@sbein Should this issue be closed? Or does something remain to follow up on for Run-3 FastSim?

sbein commented 1 year ago

@srimanob It can be closed, thanks for following up.

vlimant commented 1 year ago

please close