Sorry for the long message, but please read, it's important.
We tried to run our computing workflow on the GRID. The workflow consists of running cmsRun as a heppy preprocessor. The preprocessor performs:
necessary common operations: re-running the tau ID, correcting jets, evaluating the MET filters, etc.
obsolete channel-specific operations: making di-objects, running MVAMET, computing SVFit, skimming.
The problems with this workflow are the following:
running heppy+cmsRun takes more than 2 GB of RAM, so the jobs get killed on the grid.
the preprocessor has to run for every channel
the preprocessor configuration is getting really complex
So I've implemented a new computing workflow:
create augmented miniAODs (MINIAOD_CL, for CERN/Lyon) on the grid. The datasets are published to DBS and turned into heppy components as usual. For now I store them at T3_FR_IPNL (Lyon), but we could also put some at CERN if @steggema agrees.
run heppy without the preprocessor. Heppy can run anywhere and reads the MINIAOD_CL files over xrootd.
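As an illustration of how heppy can read the MINIAOD_CL files from anywhere, here is a minimal sketch that turns DBS logical file names (LFNs) into xrootd URLs via the global CMS redirector. The file path below is a made-up example, not a real MINIAOD_CL location.

```python
# Sketch: prefix DBS logical file names (LFNs) with an xrootd redirector so
# that heppy (or any ROOT-based tool) can open them from any site.
# The redirector is the global CMS AAA redirector; the LFN is hypothetical.

def lfn_to_xrootd(lfn, redirector="root://cms-xrd-global.cern.ch/"):
    """Build an xrootd URL from an LFN (conventional double slash before /store)."""
    return redirector.rstrip("/") + "//" + lfn.lstrip("/")

files = [
    "/store/user/someone/MINIAOD_CL/DYJets/file_1.root",  # hypothetical path
]
urls = [lfn_to_xrootd(f) for f in files]
print(urls[0])  # root://cms-xrd-global.cern.ch//store/user/someone/MINIAOD_CL/DYJets/file_1.root
```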
The advantages of this workflow are:
high computing success rate
MINIAOD_CL files contain only official objects and are not skimmed. They can be used in all channels and by any group, not only CERN/Lyon. This can save people the hassle of implementing all the recipes themselves.
typically, heppy has to run much more often than MINIAOD_CL creation, so it makes sense to perform these operations separately
I confirm perfect synchronisation in the mutau channel. For the tautau and etau channels, we need to slightly adapt the heppy configuration file; please contact me when you want to do so.
To make job submission easier, I provided two scripts, both executable (see their help with -h):
crabSubmit.py : create crab tasks for a selection of samples.
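To give an idea of what "a selection of samples" means here, below is a hypothetical sketch of the kind of filtering a script like crabSubmit.py might do before creating one CRAB task per selected sample. The sample names and the selection interface are assumptions for illustration, not the script's actual API.

```python
import re

# Hypothetical sample list; names are illustrative only.
samples = [
    "DYJetsToLL_M-50",
    "WJetsToLNu",
    "ZprimeToTauTau_M-1000",
]

def select(samples, pattern):
    """Return the samples whose name matches the given regular expression."""
    return [s for s in samples if re.search(pattern, s)]

# One CRAB task would then be created per selected sample.
print(select(samples, "DYJets"))  # ['DYJetsToLL_M-50']
```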
Finally, I added a shebang to compare.py so that it can really be executed from anywhere.
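For anyone unfamiliar with the change: a shebang plus the executable bit is what lets a script be run directly from any directory. A minimal demonstration (the file name and interpreter line here are illustrative; the original likely uses python):

```shell
# Create a script whose first line is a shebang, make it executable,
# then run it directly by path, without typing "python" in front.
cat > /tmp/compare_demo.py <<'EOF'
#!/usr/bin/env python3
print("runs from anywhere")
EOF
chmod +x /tmp/compare_demo.py
/tmp/compare_demo.py
```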
I have started the production of MINIAOD_CL.
the sync dataset BB1000 is ready
DYJets inclusive is running
When this is done and validated, I'll run a prod focusing on the samples needed for an inclusive analysis, and then we'll make the rest.
@lucastorterotot : I will contact you now to help you recover from your corrupted local area
@steggema : we should talk soon, I guess, any time is fine :-)
@GaelTouquet