Closed: ericvaandering closed this issue 5 years ago.
ewv: Notes from a short discussion at FNAL with Chris Jones, Sal Rappaccio, Sudhir Malik:
Chris, Sal, Sudhir, and I talked about a few topics. These are my notes. Please correct any mistakes.
1) Being able to supply command line arguments to the cmssw_cfg.py config file is extremely useful. CRAB3 does not currently support this. It has to be added.
2) Almost no one uses FWLite in any of its incarnations, but enough people are running PyRoot-based analyses on batch systems that supporting this kind of workflow in CRAB3 would likely be useful.
3) If we enable #2, then PyFWLite is just another flavor of that and can be done easily.
4) VarParsing as a way to configure cmssw_cfg.py or PyFWLite/PyRoot jobs should die. The python argparse module can do everything VarParsing does.
4a) argparse can be added to python 2.6, so we should do this for the current CMSSW.
4b) argparse is part of python 2.7; CMSSW 5 should move to python 2.7.
5) In the event that we do #2, the only way to pass command line args from CRAB to PyRoot/PyFWLite will be with a PSet-based configuration (with a few standard parameters). The PSet will be responsible for translating command line parameters to PSet settings, presumably through argparse.
6) Configuration in CRAB3 is through python. All commands like submit/status will return python data structures. It will be interesting to know how useful some sort of multicrab-like interface for configuring and interacting with CRAB will be.
7) We should survey users on #2, #5, and #6 at some point. Sudhir and/or Sal should interact with Ian Fisk to see how that survey might fit into existing user surveys and the timing w.r.t. the soon-to-start power user tests of CRAB3.
Eric
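Items 1, 4 and 5 can be illustrated with a minimal sketch of argparse doing the job VarParsing does today. The option names (--inputFiles, --maxEvents, --outputFile) and the to_pset_settings helper are hypothetical, and a plain dict stands in for a real PSet. How cmsRun or CRAB would actually hand the extra arguments to the configuration is not covered here, so the parser simply takes a list of strings.

```python
import argparse


def parse_job_options(argv):
    """Parse job options with argparse, as a replacement for VarParsing.

    The option names below are hypothetical examples.
    """
    parser = argparse.ArgumentParser(description="Example job options")
    parser.add_argument("--inputFiles", nargs="*", default=[],
                        help="input ROOT files")
    parser.add_argument("--maxEvents", type=int, default=-1,
                        help="number of events to process (-1 means all)")
    parser.add_argument("--outputFile", default="output.root",
                        help="name of the output file")
    # parse_known_args() returns unrecognized arguments separately instead
    # of exiting, so arguments meant for someone else are tolerated.
    options, _unknown = parser.parse_known_args(argv)
    return options


def to_pset_settings(options):
    """Translate parsed options into settings (a dict standing in for a PSet)."""
    return {
        "fileNames": list(options.inputFiles),
        "maxEvents": options.maxEvents,
        "outputFile": options.outputFile,
    }


if __name__ == "__main__":
    import sys
    print(to_pset_settings(parse_job_options(sys.argv[1:])))
```

For example, parse_job_options(["--maxEvents", "10", "--inputFiles", "a.root"]) yields maxEvents == 10 and inputFiles == ["a.root"], with the default output file name.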
meloam: I want this ticket (I need it for my analysis). The CMSSW step can first be refactored to pull out the common code (which is about 99% of it); FWLite execution would then depend on using UFC to suck down the appropriate tarball (which would be properly generated by whichever client and contain the dependencies). Finally, the payload would need to adhere to a certain API (i.e. generate an FJR-esque file that reports the status, which can be parsed by WMAgent).
ewv: Are you saying you want to accept this ticket? I can pass you the information I have on the VarParsing side of the configuration.
meloam: Send it my way.
Adding, for the record, a mail thread with Xin Shi containing his example of how to make FWLite work.
Dear all,
Sorry, I should have described my procedure more clearly. In fact:
cd UserCode/llvv_fwk/test/phojet/
There you will find the crabConfig.py.
cp $CMSSW_BASE/bin/$SCRAM_ARCH/runPhoJetAnalysis .
This step copies the binary "runPhoJetAnalysis" to the working directory.
If you follow the procedure I described, the only thing you need to change is the T2 storage. Everything else should work out of the box.
Thanks! Xin
On Thu, Dec 18, 2014 at 8:44 AM, Stefano Belforte stefano.belforte@cern.ch wrote:
yeah.. somehow the "scriptExe" itself is missing in this :-)
Xin, please note that neither Marco nor I have ever run FWLite, and we
really have no clue how to even start it!
stefano
On 12/18/2014 02:37 PM, Marco Mascheroni wrote:
Thanks,
May I also ask you to share the crabConfig.py and the pset you are
using, please? I would like to successfully replicate what you are
doing without thinking, and once it works, start looking at it and
discuss with Eric/Stefano et al. how to include it in CRAB3.
cheers, and thanks again.
Marco
On Thu, Dec 18, 2014 at 2:29 PM, Xin Shi <Xin.Shi@cern.ch> wrote:
Dear all,
Thanks for your suggestions. Here is a recipe I'm using:
cmsrel CMSSW_7_2_2
cd CMSSW_7_2_2/src/
cmsenv
git clone git@github.com:veelken/SVfit_standalone TauAnalysis/SVfitStandalone
git clone git@github.com:quertenmont/2l2v_fwk.git UserCode/llvv_fwk
cd UserCode/llvv_fwk
git checkout remotes/origin/72Xfwk
cd ../..
scram b -j8
cd UserCode/llvv_fwk/test/phojet/
cp $CMSSW_BASE/bin/$SCRAM_ARCH/runPhoJetAnalysis .
Then edit the crabConfig.py to your needs.
crab sub
Please let me know if you have further questions.
Thanks.
Xin
Xin Shi
Postdoctoral Research Associate
Purdue University
Xin.Shi@cern.ch
+1-617-744-9468
Skype/GTalk: shixin111
On Thu, Dec 18, 2014 at 3:30 AM, Stefano Belforte <stefano.belforte@cern.ch> wrote:
I see that Marco already wrote a more precise description of the environment issue with cmsRun/scriptExe and basically says the same things I tried to say here.
Stefano
On 12/18/2014 09:22 AM, Stefano Belforte wrote:
Hi Eric,
No, my worry was much simpler. See below.
On 12/17/2014 09:04 PM, Eric Vaandering wrote:
I’m not 100% sure I understand what Stefano is saying, so let me give my interpretation: scriptExe does not try to set up a particular CMSSW release, so it’s left to users? Certainly I can imagine scripts that people might want to run with CRAB outside of the CMSSW environment, and others that you want to run inside. For instance:
outside: something like the Higgs fit or something else very late in the game
inside: FWLite (obviously), but even bare ROOT macros probably want to use the CMSSW version of ROOT.
So maybe we want to imagine a hierarchy of plugins a bit like:
scriptExe
scriptExeWithScram
FWLite
PyRoot
Doing as much as possible inside the WMCore WMStep framework may have payoffs down the road for defining other kinds of workflows in production that we don’t currently have.
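The plugin hierarchy suggested above could look roughly like the following sketch. The class names mirror the list in the mail, but the needs_scram flag, the command() method, and running PyROOT scripts through "python" are assumptions for illustration; these are not existing WMCore or CRAB classes.

```python
# Hypothetical plugin classes; not part of the WMCore or CRAB API.

class ScriptExe(object):
    """Run an arbitrary user executable; no CMSSW release is set up."""
    needs_scram = False

    def command(self, executable, args):
        """Return the command line to launch for this payload."""
        return [executable] + list(args)


class ScriptExeWithScram(ScriptExe):
    """Like ScriptExe, but run inside a CMSSW release set up with scram."""
    needs_scram = True


class FWLite(ScriptExeWithScram):
    """Compiled FWLite executable, run inside the release."""


class PyRoot(ScriptExeWithScram):
    """PyROOT (or PyFWLite) script, run with the release's python."""

    def command(self, executable, args):
        return ["python", executable] + list(args)


def describe(plugin):
    """Human-readable summary of where a plugin would run."""
    where = "inside" if plugin.needs_scram else "outside"
    return "%s runs %s a CMSSW release" % (type(plugin).__name__, where)
```

Keeping the release setup decision in one base class means FWLite and PyRoot differ only in how they build the command line, which is the kind of reuse the WMStep argument is after.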
My concern is very basic:
The current wrapper uses two different pieces of code to set up the environment before launching cmsRun (and there it is, e.g., passing on the proxy) and before launching the user's script (and there it is not passing $X509_USER_PROXY).
I say that rather than fixing the second piece of code, we should use the same configuration.
Similarly, I pointed out to Marco that there are already two places in the same source file where the same lines are present, which can also lead to undesired divergences, etc.
Simply put, the code must be cleaner, and it should be clear (to developers and users) which environment is set up (and how) before a certain action is attempted.
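A minimal sketch of the "same configuration for both" idea: one function builds the payload environment, and the same launcher is used whether the payload is cmsRun or a user script, so a variable such as X509_USER_PROXY cannot be dropped for only one of them. The function names, the release_env overlay, and failing when the proxy variable is missing are assumptions; this is not the current wrapper code.

```python
import subprocess


def build_payload_env(base_env, release_env):
    """Build the payload environment in exactly one place.

    base_env:    environment of the wrapper (e.g. os.environ)
    release_env: hypothetical overlay of release-specific variables
    """
    env = dict(base_env)       # start from the wrapper's environment
    env.update(release_env)    # then apply the release settings
    if "X509_USER_PROXY" not in env:
        raise RuntimeError("X509_USER_PROXY is not set; "
                           "the payload would not find the grid proxy")
    return env


def run_payload(cmd, base_env, release_env):
    """Launch cmsRun or a user script with the same environment logic."""
    env = build_payload_env(base_env, release_env)
    return subprocess.call(cmd, env=env)
```

With this structure, fixing the environment once (e.g. adding the proxy) fixes it for every kind of payload, which addresses the divergence between the two code paths.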
Then of course one can imagine the further steps you indicate, but I have some prejudice that anything users may want to do should be done inside some CMSSW release, if nothing else to have a definite environment. Depending on the locally installed version of python or gcc on the WN of the day may be dangerous. So let's leave scriptExeWithoutScram for when there is a demonstrated use case.
Bottom line, tell Xin:
you should write a script that:
a. takes these arguments...
b. can find this environment ....
c. can find these files in the cwd .... on directory $...
d. if needed, can do scram unsetenv and then find the grid client tools ....
e. needs to end with exit codes ...
f. needs to leave these files in the cwd with this content ....
then it will all be a piece of cake. But currently only a. is clear (maybe).
Stefano
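As a minimal sketch of the kind of script such a contract would describe, the skeleton below covers items a., c., e. and f. The specific choices (input file names as positional arguments, the exit code values, the output file name summary.txt) are placeholders invented for illustration, since the actual contract is exactly what the mail says is still unclear.

```python
import os
import sys

# Hypothetical exit codes; real values would be fixed by the contract (item e).
EXIT_OK = 0
EXIT_BAD_ARGS = 2
EXIT_MISSING_INPUT = 3


def main(argv, workdir="."):
    """Skeleton user script following a contract like items a. to f."""
    # a. arguments: here, assumed to be input file names
    if not argv:
        sys.stderr.write("usage: script INPUT [INPUT ...]\n")
        return EXIT_BAD_ARGS
    # c. inputs are expected in the working directory
    missing = [name for name in argv
               if not os.path.exists(os.path.join(workdir, name))]
    if missing:
        sys.stderr.write("missing inputs: %s\n" % ", ".join(missing))
        return EXIT_MISSING_INPUT
    # ... the actual analysis would go here ...
    # f. leave the outputs in the working directory
    with open(os.path.join(workdir, "summary.txt"), "w") as out:
        out.write("processed %d files\n" % len(argv))
    # e. success is reported through the exit code
    return EXIT_OK


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```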
Dear all,
Here is an updated recipe, which includes uploading the user's grid certificate info and can also be run at the T2_CERN site. Note that the git repo has changed to my area:
cmsrel CMSSW_7_2_2
cd CMSSW_7_2_2/src/
cmsenv
git clone git@github.com:veelken/SVfit_standalone TauAnalysis/SVfitStandalone
git clone git@github.com:xshi/2l2v_fwk.git UserCode/llvv_fwk
cd UserCode/llvv_fwk
git checkout remotes/origin/mini-aod
cd ../..
scram b -j8
cd UserCode/llvv_fwk/test/phojet/
cp $CMSSW_BASE/bin/$SCRAM_ARCH/runPhoJetAnalysis .
voms-proxy-init -voms cms -valid 168:0 --out x509_proxy
Then edit the crabConfig.py to your needs.
crab sub
Please let me know if you have further questions.
Happy New Year! Xin
Just had a discussion with Marco about this after looking at some code. It looks like writing a WMStep, Executor and Template would be fairly difficult to get right, and I'm not sure that the complexity would be worth the effort. Unifying what we currently have for CMSSW and scriptExe (i.e. calling cmsRun manually) would greatly reduce the code complexity.
Looking at @xshi's setup, we already pass through the proxy in the current code in git, and the bits that are really missing for FWLite support are removing the requirement for the job report file and passing the input files and mask in a sensible way (or just requiring file-based splitting). I would also think that the python libraries should be included in the sandbox.
Looking to other requests people had, the benefit of making a WMStep for FWLite processes is it adds the ability to trivially run multiple analysis steps in one job.
To be fair, writing an executor does not seem too hard; I'm more confused by the complexity of CMSSWStepHelper. What would the FWLite requirements be from your side?
I should also note that it looks to me like we're currently creating 3 release areas for one execution of the CMS run script, 2 of which seem to be created by WMStep…
Hi Matthias-
I think there aren't a lot of requirements, per se. There's always just the general idea of wanting to make "easy things easy and hard things possible". For FWLite (or even arbitrary scripts), the key would be to have a contract with the grid side: "we give you arguments this way" and "you return an FJR-esque thing back to us". If that contract existed, it would then be easier for someone to do the next logical thing and provide "standard" boilerplate for analyses to use.
For instance, in PyFWLite, you do the following:
```python
from DataFormats.FWLite import Events

files = ['file1.root', 'file2.root']
events = Events(files)
for event in events:
    # analysis stuff here
    pass
```
With a well-defined contract, Events and the C++ equivalent could be extended readily to accept an arbitrary input sequence instead of just a list of file names. A similar argument could be made on the backside for scripts to report back information about their outputs. A series of defined helper functions would go a long way to making it viable to have arbitrary executables be first-class citizens.
> I should also note that it looks to me like we're currently creating 3 release areas for one execution of the CMS run script, 2 of which seem to be created by WMStep…
The bonus of fixing existing code instead of starting from scratch is that you'd also benefit the other users of the existing code. There have been gigahours of WMStep running time at this point, and a lot of the subtle kinks have been worked out over the course of however many years (4? 5?) the code has been in production. If there are 3 release areas for a single CMSSW job, that would be a new wrinkle that could be fixed, benefiting everyone in the process.
FWIW, cmsswStepHelper is just a series of helper functions around a ConfigurationSection object. The idiom with ConfigurationSections is that the actual configuration itself is bone-simple. You can't even set arbitrary objects as values in there, much less have each sub-bit of the configuration tree have custom methods. cmsswStepHelper is just a placeholder for how you would manipulate a ConfigurationSection object. Since FWLite scripts are a different ball of wax, you don't have to worry about reimplementing it. For instance:
Doesn't make any sense for FWLite.
Hi Andrew,
Yes, a well-defined contract definitely seems to be missing here. I'm not sure what can be done on the input side; my intuition would be to provide some text/JSON files to exchange the information rather than shoving everything into the argument list. As far as the FJR-ish output goes, I think that would be a lot shorter than what cmsRun currently produces. Maybe a skeleton would have to be provided by the step, with some bits to be filled in with user-supplied data.
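The idea of a skeleton provided by the step and filled in with user-supplied data might look like this minimal sketch. The field names and the JSON format are assumptions; this is not the real cmsRun framework job report, which is XML and carries far more information.

```python
import copy
import json


def report_skeleton():
    """Hypothetical minimal FJR-like report that the step would provide."""
    return {"exitCode": None, "inputFiles": [], "outputFiles": [], "errors": []}


def fill_report(skeleton, **user_fields):
    """Fill in the skeleton; reject fields the step does not know about."""
    unknown = set(user_fields) - set(skeleton)
    if unknown:
        raise KeyError("not in the report skeleton: " + ", ".join(sorted(unknown)))
    report = copy.deepcopy(skeleton)   # never modify the step's skeleton
    report.update(user_fields)
    return report


def write_report(report, path):
    """Write the report as JSON so the wrapper can parse it after the job."""
    with open(path, "w") as handle:
        json.dump(report, handle, indent=2, sort_keys=True)
```

Rejecting unknown fields keeps the user side from silently extending the format, so the wrapper only ever has to parse the fields it defined.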
As far as different WMSteps go, every time I think about this issue in terms of distributed computing, it would be nice to go back even one step further. Rather than duplicating the CMSSW setup in the CMSSW step for FWLite/arbitrary executables, I think it would be good to have a common setup for CMSSW and then layer the bits for cmsRun and FWLite/arbitrary scripts on top of that. And getting that right will take a while…
As far as the helper goes, I see how an FWLite equivalent would be very lightweight. But I haven't fully grokked the call structure around the helpers yet. Reading your description, I would think that the configuration is equivalent to a C struct, and the helper is the collection of manipulation functions for the struct (thinking pure C, that is).
IIUC, down the road you want to make it possible to run multiple steps in the same job; if so, it may be better to keep the freedom to use different CMSSW versions in the different steps. All in all, cmsrel is pretty quick compared to "hours". It's not clear that it will ever be needed, but the cost is low.
@belforte - Since each CMSSW WMStep does a cmsrel separately, you can already use different versions of CMSSW within the same job.
I agree that there should be some sort of JSON for the input, with the filename passed in via an argument. But looking ahead to when people are actually using it, it'll probably save a lot of support-email struggle to have a "blessed" interface between that file and the FWLite event loop, so people aren't rolling their own, failing, and then getting upset when their statistics are weird. The output stuff will probably end up being just the things that are needed for stageout and a couple of bits for error code propagation.
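A "blessed" reader between the JSON input file and the event loop could be as small as this sketch. The --jobInput option, the inputFiles and lumiMask keys, and the helper names are all hypothetical; only the idea of passing the JSON file name as an argument comes from the comment above.

```python
import argparse
import json


def load_job_input(path):
    """Read the job description written by the grid side (hypothetical format)."""
    with open(path) as handle:
        desc = json.load(handle)
    if not desc.get("inputFiles"):
        raise ValueError("no input files listed in %s" % path)
    desc.setdefault("lumiMask", None)  # optional run/lumi mask, if provided
    return desc


def job_input_files(argv=None):
    """Return the input file list for the event loop, given --jobInput FILE."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--jobInput", required=True,
                        help="path to the JSON job description")
    args, _unknown = parser.parse_known_args(argv)
    return load_job_input(args.jobInput)["inputFiles"]
```

In a PyFWLite script this would replace the hard-coded file list, e.g. events = Events(job_input_files()), since Events already accepts a list of file names.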
Adding the "question" label and moving to October. That's because after talking with Dima/Stefano and others, it was concluded that improving scriptExe and documenting how to use scriptExe for this use case is probably enough.
realistically: NO NEW FEATURES
At some point we need to support FWLite jobs in addition to cmsRun. CRAB2 does not really support this except through writing a custom script, which is too difficult, especially for the target FWLite user.
Not assigning a milestone to this.