cernopendata / data-curation

Data ingestion and curation tools
GNU General Public License v2.0

CMS - provenance for mcdb cases #237

Open katilp opened 4 months ago

katilp commented 4 months ago

Observed when running code/lhe_generators.py with the updates

xz: File too large

The script writes some LOG.txt files under a path built from the dataset name (instead of the usual recid) in the lhe_generators/2016-sim/gridpacks/ directory:

$ ls lhe_generators/2016-sim/gridpacks/ | tail -15
75600
75601
BcToBuKPi_BuJPsiK_TuneCP5_13TeV-bcvegpy2-pythia8-evtgen
BcToJPsiMuMu_inclusive_TuneCP5_13TeV-bcvegpy2-pythia8-evtgen
BcToJpsPi_TuneCP5_13TeV-bcvegpy2-pythia8-evtgen
BcToPsi2SPi_PJPP_TuneCP5_13TeV-bcvegpy2-pythia8-evtgen
BcToPsi2SPi_PMM_TuneCP5_13TeV-bcvegpy2-pythia8-evtgen
SPS_D0ToKPi_JPsiPt-100To150_TuneCP5_13TeV-helaconia-pythia8-evtgen
SPS_ToY1SZ_Y1SToMuMu_ZToMuMu_TuneCP5_13TeV-helaconia-pythia8
ST_t-channel_eDecays_anomwtbLVRT_RT4_TuneCP5_13TeV-comphep-pythia8
ST_t-channel_tauDecays_anomwtbLVLT_LT_TuneCP5_13TeV-comphep-pythia8
X0ToUpsilonJPsi_M-12p6_TuneCP5_v2_13TeV-JHUGen-pythia8
X0ToUpsilonJPsi_M-12p7_TuneCP5_v2_13TeV-JHUGen-pythia8
X0ToUpsilonJPsi_M-12p9_TuneCP5_v2_13TeV-JHUGen-pythia8
X0ToUpsilonJPsi_M-13p4_TuneCP5_v2_13TeV-JHUGen-pythia8

with this type of output

$ head lhe_generators/2016-sim/gridpacks/BcToBuKPi_BuJPsiK_TuneCP5_13TeV-bcvegpy2-pythia8-evtgen/RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v1/NANOAODSIM/LOG.txt
2024-06-02 00:14:03 | ERROR | Error xz: (stdout): Write error: File too large
xz: (stdout): Write error: File too large
xz: (stdout): Write error: File too large
xz: (stdout): Write error: File too large
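
To list all affected datasets, the subdirectories of gridpacks/ whose names are not recids can be collected with a short script. This is a minimal sketch, assuming that recid directories always have purely numeric names (as in the listing above); `find_non_recid_dirs` is a hypothetical helper and not part of code/lhe_generators.py.

```python
from pathlib import Path


def find_non_recid_dirs(gridpacks_dir):
    """Return sorted names of subdirectories whose names are not purely numeric.

    Assumption: recid directories are named with digits only (e.g. 75600),
    so anything else is treated as a dataset-name directory.
    """
    root = Path(gridpacks_dir)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and not entry.name.isdigit()
    )
```

Calling `find_non_recid_dirs("lhe_generators/2016-sim/gridpacks")` from the repository root would then return the dataset-name directories to inspect for LOG.txt files.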

LPAIR generator

"/GGToMuMu_Pt-25_Inel-El_13TeV-lpair/RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v1/NANOAODSIM": 36304, "mcdb_id": 19544

lhe_generators/2016-sim/mcdb/19544_header.txt contains only a generic header, repeated twice:

$ head lhe_generators/2016-sim/mcdb/19544_header.txt
<header>
This file was created from the output of the LPAIR generator
</header>
<header>
This file was created from the output of the LPAIR generator
</header>

Init block with no information on the generator

fpmc

"/GGToGG_bSM_A1A_1e-13_A2A_1e-13_Pt-50_13TeV_fpmc/RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v1/NANOAODSIM": 36290, "mcdb_id": 19101
"/GGToGG_bSM_A1A_1e-14_A2A_1e-14_Pt-50_13TeV_fpmc/RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v1/NANOAODSIM": 36292, "mcdb_id": 19102
"/GGToGG_bSM_A1A_5e-13_A2A_0_Pt-50_13TeV_fpmc/RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v1/NANOAODSIM": 36294, "mcdb_id": 19103
"/GGToGG_SM_Pt-50_13TeV_fpmc/RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v1/NANOAODSIM": 36296, "mcdb_id": 19104

e.g.

$ cat  lhe_generators/2016-sim/mcdb/19103_header.txt
<init>
     2212     2212   0.65000000E+04   0.65000000E+04    -1    -1    -1    -1     4     1
   0.49029854E-01   0.23246991-306   0.49029854E-04     -1
</init>

lpair

"/GGToMuMu_Pt-25_Inel-Inel_13TeV-lpair/RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v1/NANOAODSIM": 36306, "mcdb_id": 19545

with multiple init blocks
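
The number of `<header>` and `<init>` blocks in a `<mcdb_id>_header.txt` file can be checked with a few lines of Python. This is a rough sketch that only counts opening tags with a regular expression; `count_blocks` and `summarize_header` are hypothetical names, and nothing here reflects how code/lhe_generators.py parses these files.

```python
import re


def count_blocks(text, tag):
    """Count opening tags such as <init> or <header> in the given text."""
    pattern = r"<%s\b[^>]*>" % re.escape(tag)
    return len(re.findall(pattern, text))


def summarize_header(text):
    """Return how many <header> and <init> blocks appear in the text."""
    return {tag: count_blocks(text, tag) for tag in ("header", "init")}
```

A file with more than one `<init>` block, or with `<header>` blocks but no `<init>` block, would stand out in the resulting counts.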

Full event file

The file is named <mcdb_id> instead of <mcdb_id>_header.txt and apparently contains the full event content:

$ ls -lhS lhe_generators/2016-sim/mcdb
total 49G
-rw-r--r--. 1 kati zh  47G Jun  3 19:05 19658
-rw-r--r--. 1 kati zh 1.1G Jun  2 16:05 19405
-rw-r--r--. 1 kati zh 980M Jun  2 16:04 19412

These are for

33273: /BcToPsi2SPi_PMM_TuneCP5_13TeV-bcvegpy2-pythia8-evtgen/RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v2/NANOAODSIM
19405: /ST_t-channel_tauDecays_anomwtbLVLT_LT_TuneCP5_13TeV-comphep-pythia8/RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v1/NANOAODSIM
19412: /ST_t-channel_eDecays_anomwtbLVRT_RT4_TuneCP5_13TeV-comphep-pythia8/RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v1/NANOAODSIM

The first one is also used by several other datasets.
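
Files of this kind can be found by looking for entries in the mcdb directory that do not follow the `<mcdb_id>_header.txt` naming. This is a minimal sketch under that naming assumption; `find_unexpected_mcdb_files` is a hypothetical helper, not an existing function in the repository.

```python
from pathlib import Path


def find_unexpected_mcdb_files(mcdb_dir, suffix="_header.txt"):
    """Return (name, size_in_bytes) for files not ending in the suffix.

    The result is sorted with the largest file first, similar to `ls -S`.
    """
    root = Path(mcdb_dir)
    found = [
        (entry.name, entry.stat().st_size)
        for entry in root.iterdir()
        if entry.is_file() and not entry.name.endswith(suffix)
    ]
    return sorted(found, key=lambda item: item[1], reverse=True)
```

Running it on `lhe_generators/2016-sim/mcdb` should report the full event files together with their sizes, which makes it easy to see how much disk space they take.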