Open katilp opened 1 year ago
Start with example datasets:
/ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8/RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2/NANOAODSIM
/BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8/RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1/MINIAODSIM
Expected changes in the scripts:
Called from interface.py:
create-das-json-store: does not need to query configs and parent, we get them all from McM; in practice: prep_id, which can be used for the McM query
create-mcm-store: will not need to proceed through input/output datasets, now we can work entirely with prep_ids; in practice: prep_ids of the datasets in the provenance chain
get-conf-files: will then get the config file list only from McM; some changes in the code may be needed
create-records: will have similar updates as were done in the collision record scripts
lhe_generators.py is called separately (see e.g. 2015 readme):
/cvmfs/cms.cern.ch/phys_generator/gridpacks/slc6_amd64_gcc481/13TeV/madgraph/V5_2.2.2/SingleBp/Bpb_M900GeV_W9GeV_Zb_LH_tarball.tar.xz
(in an example LHE config file) but the /cvmfs area has them separately:
-bash-4.2$ ls /cvmfs/cms.cern.ch/phys_generator/gridpacks/
13p6TeV gridpacks pre2017 slc6_amd64_gcc472 slc6_amd64_gcc530 slc7_amd64_gcc630 UL
14TeV lhe_merger RunII slc6_amd64_gcc481 slc6_amd64_gcc630 slc7_amd64_gcc700 untar
2017 mg_amg_patch RunIII slc6_amd64_gcc491 slc6_amd64_gcc700 slc7_amd64_gcc820
2018 PdmV RunUL slc6_amd64_gcc493 slc7_amd64_gcc10 slc7_amd64_gcc900
-bash-4.2$ ls /cvmfs/cms.cern.ch/phys_generator/gridpacks/pre2017
13TeV 14TeV
[x] Remove the parent loop in das_json_store.py (step 2) and get the information for the "top" dataset only
[x] Remove the parent query from mcm_store.py (step 3) and loop over the steps in the production chain
[x] Restructure the output so that in a new directory chain, the top dataset has subdirs for each step with respective dict and scripts subdirs
$ tree inputs/mcm-store/
inputs/mcm-store/
├── chain
│ ├── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM
│ │ ├── EXO-RunIISummer20UL16DIGIPremix-00521
│ │ │ ├── dict
│ │ │ │ └── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.json
│ │ │ └── scripts
│ │ │ └── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.sh
│ │ ├── EXO-RunIISummer20UL16GEN-00123
│ │ │ ├── dict
│ │ │ │ └── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.json
│ │ │ └── scripts
│ │ │ └── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.sh
│ │ ├── EXO-RunIISummer20UL16HLT-00521
│ │ │ ├── dict
│ │ │ │ └── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.json
│ │ │ └── scripts
│ │ │ └── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.sh
│ │ ├── EXO-RunIISummer20UL16MiniAODv2-00213
│ │ │ ├── dict
│ │ │ │ └── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.json
│ │ │ └── scripts
│ │ │ └── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.sh
│ │ ├── EXO-RunIISummer20UL16NanoAODv9-00205
│ │ │ ├── dict
│ │ │ │ └── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.json
│ │ │ └── scripts
│ │ │ └── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.sh
│ │ ├── EXO-RunIISummer20UL16RECO-00521
│ │ │ ├── dict
│ │ │ │ └── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.json
│ │ │ └── scripts
│ │ │ └── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.sh
│ │ └── EXO-RunIISummer20UL16SIM-00521
│ │ ├── dict
│ │ │ └── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.json
│ │ └── scripts
│ │ └── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.sh
│ └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM
│ ├── SMP-RunIISummer20UL16DIGIPremix-00053
│ │ ├── dict
│ │ │ └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.json
│ │ └── scripts
│ │ └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.sh
│ ├── SMP-RunIISummer20UL16HLT-00056
│ │ ├── dict
│ │ │ └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.json
│ │ └── scripts
│ │ └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.sh
│ ├── SMP-RunIISummer20UL16MiniAODv2-00038
│ │ ├── dict
│ │ │ └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.json
│ │ └── scripts
│ │ └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.sh
│ ├── SMP-RunIISummer20UL16NanoAODv9-00038
│ │ ├── dict
│ │ │ └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.json
│ │ └── scripts
│ │ └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.sh
│ ├── SMP-RunIISummer20UL16RECO-00056
│ │ ├── dict
│ │ │ └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.json
│ │ └── scripts
│ │ └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.sh
│ ├── SMP-RunIISummer20UL16SIM-00056
│ │ ├── dict
│ │ │ └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.json
│ │ └── scripts
│ │ └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.sh
│ └── SMP-RunIISummer20UL16wmLHEGEN-00237
│ ├── dict
│ │ └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.json
│ └── scripts
│ └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.sh
├── dict
│ ├── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.json
│ └── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.json
└── scripts
├── @ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8@RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2@NANOAODSIM.sh
└── @BBH_HToJPsiG_JPsiToMuMu_TuneCP5_13TeV-madgraph-pythia8@RunIISummer20UL16MiniAODv2-106X_mcRun2_asymptotic_v17-v1@MINIAODSIM.sh
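The layout above can be sketched as a small path helper. This is only an illustration read off the tree listing: the function names `dataset_to_filename` and `chain_step_paths` are hypothetical and are not taken from mcm_store.py.

```python
from pathlib import Path


def dataset_to_filename(dataset):
    """Turn a dataset name into a file-system-safe name, as in the tree above ('/' becomes '@')."""
    return dataset.replace("/", "@")


def chain_step_paths(store, dataset, prep_id):
    """Return the (dict, script) paths for one step of the production chain.

    Hypothetical helper: the layout is
    <store>/chain/<dataset>/<prep_id>/{dict,scripts}/<dataset>.{json,sh}
    """
    name = dataset_to_filename(dataset)
    step_dir = Path(store) / "chain" / name / prep_id
    return step_dir / "dict" / (name + ".json"), step_dir / "scripts" / (name + ".sh")


if __name__ == "__main__":
    dict_path, script_path = chain_step_paths(
        "inputs/mcm-store",
        "/ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8/RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2/NANOAODSIM",
        "EXO-RunIISummer20UL16GEN-00123",
    )
    print(dict_path)
    print(script_path)
```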
[x] Query only McM in conf_store.py (step 4) and use the chain/dataset/step subdir as an input to the functions
[x] Update the provenance query in dataset_records.py (step 5) similarly, taking into account the changes above
[x] Clean away some remaining DAS back-up queries that would not work, as DAS does not have a dataset for every provenance step
[x] Update the LHE part
For the record, the DIGIPremix step has a 22 MB config file containing the list of files in the pile-up Premix datasets. For the two test datasets that I use, the files differ only in naming and event counts:
$ ls -l inputs/config-store/
total 44823
-rw-r--r--. 1 kati zh 5917 Oct 19 14:29 086c69c1b826c78c43be2aa70d80e01e.configFile
-rw-r--r--. 1 kati zh 8671 Oct 19 14:29 160526781ab6242177672ffc68eb5568.configFile
-rw-r--r--. 1 kati zh 4319 Oct 19 14:29 481ced9502ea985a73dc7bca8c9ea7a9.configFile
-rw-r--r--. 1 kati zh 4349 Oct 19 14:29 528bf7046404f48fa330df88a6a92123.configFile
-rw-r--r--. 1 kati zh 22907660 Oct 19 14:29 528bf7046404f48fa330df88a6a9594b.configFile
-rw-r--r--. 1 kati zh 4521 Oct 19 14:29 528bf7046404f48fa330df88a6a99098.configFile
-rw-r--r--. 1 kati zh 4850 Oct 19 14:29 528bf7046404f48fa330df88a6a9a53b.configFile
-rw-r--r--. 1 kati zh 9324 Oct 19 14:29 70368b76504c9adbeb8bd6f29a1b6dee.configFile
-rw-r--r--. 1 kati zh 11520 Oct 19 14:29 80266517fa91333a47ed2d1cc3eeddf0.configFile
-rw-r--r--. 1 kati zh 12957 Oct 19 14:29 c8dc83abb237e289eae3cfefea871409.configFile
-rw-r--r--. 1 kati zh 4349 Oct 19 14:29 edf4aef02c2af29980365f11a8f78f77.configFile
-rw-r--r--. 1 kati zh 22907660 Oct 19 14:29 edf4aef02c2af29980365f11a8faa478.configFile
-rw-r--r--. 1 kati zh 4521 Oct 19 14:29 edf4aef02c2af29980365f11a8fade0c.configFile
-rw-r--r--. 1 kati zh 4850 Oct 19 14:29 edf4aef02c2af29980365f11a8fbd0b0.configFile
with
-bash-4.2$ diff inputs/config-store/528bf7046404f48fa330df88a6a9594b.configFile inputs/config-store/edf4aef02c2af29980365f11a8faa478.configFile
5c5
< # with command line options: --python_filename TOP-RunIISummer20UL16DIGIPremix-00281_1_cfg.py --eventcontent PREMIXRAW --customise Configuration/DataProcessing/Utils.addMonitoring --datatier GEN-SIM-DIGI --fileout file:TOP-RunIISummer20UL16DIGIPremix-00281.root --pileup_input dbs:/Neutrino_E-10_gun/RunIISummer20ULPrePremix-UL16_106X_mcRun2_asymptotic_v13-v1/PREMIX --conditions 106X_mcRun2_asymptotic_v13 --step DIGI,DATAMIX,L1,DIGI2RAW --procModifiers premix_stage2 --nThreads 4 --geometry DB:Extended --filein file:TOP-RunIISummer20UL16SIM-00281.root --datamix PreMix --era Run2_2016 --runUnscheduled --no_exec --mc -n 5807
---
> # with command line options: --python_filename TOP-RunIISummer20UL16DIGIPremix-00291_1_cfg.py --eventcontent PREMIXRAW --customise Configuration/DataProcessing/Utils.addMonitoring --datatier GEN-SIM-DIGI --fileout file:TOP-RunIISummer20UL16DIGIPremix-00291.root --pileup_input dbs:/Neutrino_E-10_gun/RunIISummer20ULPrePremix-UL16_106X_mcRun2_asymptotic_v13-v1/PREMIX --conditions 106X_mcRun2_asymptotic_v13 --step DIGI,DATAMIX,L1,DIGI2RAW --procModifiers premix_stage2 --nThreads 4 --geometry DB:Extended --filein file:TOP-RunIISummer20UL16SIM-00291.root --datamix PreMix --era Run2_2016 --runUnscheduled --no_exec --mc -n 5081
29c29
< input = cms.untracked.int32(5807)
---
> input = cms.untracked.int32(5081)
35c35
< fileNames = cms.untracked.vstring('file:TOP-RunIISummer20UL16SIM-00281.root'),
---
> fileNames = cms.untracked.vstring('file:TOP-RunIISummer20UL16SIM-00291.root'),
64c64
< annotation = cms.untracked.string('--python_filename nevts:5807'),
---
> annotation = cms.untracked.string('--python_filename nevts:5081'),
76c76
< fileName = cms.untracked.string('file:TOP-RunIISummer20UL16DIGIPremix-00281.root'),
---
> fileName = cms.untracked.string('file:TOP-RunIISummer20UL16DIGIPremix-00291.root'),
This is a 22 MB file; if stored for each of 40k MC datasets, it would take about 880 GB of disk space, so we should handle it differently...
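One possible direction, given only as a sketch and not as what the scripts do: hash each config file after masking the parts that the diff above shows to vary (the prep_id and the event count), so that otherwise identical DIGIPremix configs can be detected and stored once. The regular expressions and the name `normalized_digest` are assumptions made for this illustration.

```python
import hashlib
import re

# Assumed prep_id shape, e.g. TOP-RunIISummer20UL16DIGIPremix-00281
PREPID_RE = re.compile(r"\b[A-Z0-9]+-RunII[A-Za-z0-9]+-\d{5}(?!\d)")
# Event-count spots seen in the diff: "-n 5807", "nevts:5807", "input = cms.untracked.int32(5807)"
NEVENTS_RE = re.compile(r"(-n |nevts:|input = cms\.untracked\.int32\()\d+")


def normalized_digest(config_text):
    """Hash a config file after masking the prep_id and the event count."""
    text = PREPID_RE.sub("PREPID", config_text)
    text = NEVENTS_RE.sub(r"\g<1>N", text)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

Two files with the same digest would differ only in the masked fields, so the large file list would need to be kept once per digest rather than once per dataset.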
To do:
mcm_store/chain done only for nano
get_all_generator_text
https://cms-pdmv.cern.ch/mcm/ to https://cms-pdmv-prod.web.cern.ch/mcm
lhe_generators.py with the rest, i.e. integrate to interface.py
utils.py RUNNUMBER_CACHE = { } etc
pdmv_submission_date
Updates to LHE generator search
runcmsgrid.sh script
Check which inputs are passed to the job in runcmsgrid.sh
cd $LHEWORKDIR/process
[...]
./run.sh $submitting_event $run_random_seed
process/run.sh:
DIR='./madevent'
[... or else...]
[...]
${DIR}/bin/gridrun $num_events $seed $gran
etc... gets complex
Check what happened in gridpack_generation.log and check with GEN conveners.
cat powheg.input
../pwhg_main &> log_${process}_${seed}.txt; test $? -eq 0 || fail_exit "pwhg_main error: exit code not 0"
Input: powheg.input
[x] jhugen - case: /cvmfs/cms.cern.ch/phys_generator/gridpacks/UL/13TeV/jhugen/ Run command:
cd JHUGenerator/
./JHUGen $(cat ../JHUGen.input) VegasNc2=${nevt} Seed=${rnum} DataFile=undecayed &&
./JHUGen $(cat ../JHUGen_decay.input) Seed=${rnum} ReadLHE=undecayed.lhe Seed=${rnum} DataFile=Out
Inputs: /JHUGen.input, JHUGen_decay.input
[x] jhugen - case: /cvmfs/cms.cern.ch/phys_generator/gridpacks/UL/13TeV/slc7_amd64_gcc820/JHUGen Run command:
cd JHUGenerator/
./JHUGen $(cat ../JHUGen.input) VegasNc2=${nevt} Seed=${rnum} DataFile=undecayed &&
Input: JHUGen.input
cat r_GEN.in | sed -e s/EVENTSNUM/${nevt}/ > r_tempo.in
cat r_tempo.in | sed -e s/RANDOMSEED/${rnum}/ > r.in
rm r_tempo.in
./phantom_1_3_p1_slc6_amd64_gcc630/phantom.exe >& log_GEN.txt
Input: r_GEN.in
../Bin/mcfm readInput.DAT |& tee log
Input: ./readInput.DAT
Reminder:
tar -tvf <gridpack name>.tgz : lists the contents of the archive
tar -xf <gridpack name>.tgz : extracts all files
tar -xf <gridpack name>.tgz <file name> : extracts one file only
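The same operations can be done from Python with the standard `tarfile` module, which avoids unpacking a whole gridpack just to read one input card. This is a minimal sketch; the helper names `list_gridpack` and `read_member` are hypothetical and not part of lhe_generators.py.

```python
import tarfile


def list_gridpack(path):
    """Return the member names of a gridpack archive (like `tar -tf`)."""
    # "r:*" lets tarfile detect the compression (gz, xz, ...) by itself
    with tarfile.open(path, "r:*") as tar:
        return tar.getnames()


def read_member(path, member):
    """Return the text of a single file in the archive without extracting the rest."""
    with tarfile.open(path, "r:*") as tar:
        handle = tar.extractfile(member)  # raises KeyError if the member is missing
        if handle is None:
            raise ValueError(member + " is not a regular file")
        return handle.read().decode("utf-8", errors="replace")
```

Member names are returned exactly as stored in the archive, so a file may appear as `./powheg.input` rather than `powheg.input`.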
Note:
./ *.dat or *.input (powheg, jhugen)
The current script gets the provenance information as follows
As the processing scheme has changed from UL processing onwards (there are no input datasets before AODSIM, as they were transient), this won't work anymore.
The query flow should be changed to go directly to the chain:
For an example dataset /ADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8/RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2/NANOAODSIM:
On the web GUI:
Query by the output file name:
https://cms-pdmv.cern.ch/mcm/requests?produce=%2FADDmonoPhoton_MD-1_d-3_TuneCP5_13TeV-pythia8%2FRunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2%2FNANOAODSIM&page=0&shown=140737488355327
https://cms-pdmv.cern.ch/mcm/chained_requests?contains=EXO-RunIISummer20UL16NanoAODv9-00205&page=0
then, for each request of the query, get the dicts in the respective pages.
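The two web GUI URLs above follow a simple pattern, which can be sketched as below. The helper names are hypothetical; only the URL shapes are taken from the examples, and the `shown` parameter of the first URL is left out.

```python
from urllib.parse import quote

MCM_BASE = "https://cms-pdmv.cern.ch/mcm"


def requests_url(dataset):
    """URL of the McM requests page for a given output dataset name."""
    # safe="" makes quote() encode "/" as %2F, as in the example URL
    return f"{MCM_BASE}/requests?produce={quote(dataset, safe='')}&page=0"


def chained_requests_url(prep_id):
    """URL of the McM chained requests page containing a given prep_id."""
    return f"{MCM_BASE}/chained_requests?contains={quote(prep_id, safe='')}&page=0"


if __name__ == "__main__":
    print(chained_requests_url("EXO-RunIISummer20UL16NanoAODv9-00205"))
```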
On the command line
Using the prep_id from DAS, get the chain from the dictionary:
curl -s -k https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get/EXO-RunIISummer20UL16NanoAODv9-00205
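A Python equivalent of the curl call above might look like the sketch below. It assumes that the public endpoint wraps the request dict in a `results` key and that the chain ids are listed under `member_of_chain`; both field names are assumptions to be checked against the actual response. Note that, unlike `curl -k`, `urlopen` verifies the server certificate.

```python
import json
from urllib.request import urlopen

MCM_PUBLIC_GET = "https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get/"


def fetch_request(prep_id):
    """Fetch the public McM dict for one prep_id (needs network access)."""
    with urlopen(MCM_PUBLIC_GET + prep_id) as response:
        return json.load(response)


def chain_ids(payload):
    """Return the chained request ids from a fetched payload.

    Assumes the payload looks like {"results": {"member_of_chain": [...]}}.
    """
    results = payload.get("results") or {}
    return list(results.get("member_of_chain", []))


if __name__ == "__main__":
    print(chain_ids(fetch_request("EXO-RunIISummer20UL16NanoAODv9-00205")))
```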
get the id of the chained request
auth-get-sso-cookie -u https://cms-pdmv.cern.ch/mcm -o cookies.txt
(see also docs)
then, for each step in the chain, get the full dict or what is needed, e.g.
An example with the LHE step: