dmwm / CRABServer


WMStep for FWLite jobs #701

ericvaandering closed this 5 years ago

ericvaandering commented 14 years ago

At some point we need to support FWLite jobs in addition to cmsRun. CRAB2 does not really support this except through writing a custom script, which is too difficult, especially for the target FWLite user.

Not assigning a milestone to this.

ericvaandering commented 13 years ago

ewv: Notes from a short discussion at FNAL with Chris Jones, Sal Rappaccio, Sudhir Malik:

Chris, Sal, Sudhir, and I talked about a few topics. These are my notes. Please correct any mistakes.

1) Being able to supply command line arguments to the cmssw_cfg.py config file is extremely useful. CRAB3 does not currently support this. It has to be added.

2) Almost no one uses FWLite in any of its incarnations, but enough people are running PyRoot based analyses on batch systems that it's likely that CRAB3 being able to run this kind of workflow would be useful.

3) If we enable #2, then PyFWLite is just another flavor of that and can be done easily.

4) VarParsing as a way to configure cmssw_cfg.py or PyFWLite/PyRoot jobs should die. The python argparse module can do everything VarParsing does.
   4a) argparse can be added to python 2.6, so we should do this for the current CMSSW.
   4b) argparse is part of python 2.7; CMSSW 5 should move to python 2.7.

5) In the event that we do #2, the only way to pass command line args from CRAB to PyRoot/PyFWLite will be with a PSet-based configuration (with a few standard parameters). The PSet will be responsible for translating command line parameters to PSet settings, presumably through argparse.

6) Configuration in CRAB3 is through python. All commands like submit/status will return python data structures. It will be interesting to know how useful some sort of multicrab-like interface for configuring and interacting with CRAB will be.

7) We should survey users on #2, #5, and #6 at some point. Sudhir and/or Sal should interact with Ian Fisk to see how that survey might fit into existing user surveys and the timing w.r.t. the soon-to-start power user tests of CRAB3.

Eric
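Items 4 and 5 suggest replacing VarParsing with argparse. Here is a minimal sketch of what that could look like, under stated assumptions: the option names are borrowed from VarParsing's usual analysis options, parse_job_args is a hypothetical name, and the returned dict only stands in for the PSet settings. It is not an existing CMSSW or CRAB interface.

```python
import argparse


def parse_job_args(argv=None):
    """Translate command-line arguments into job settings (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Configure a cmsRun or PyFWLite job")
    # Option names mirror VarParsing-style analysis options (an assumption)
    parser.add_argument("--inputFiles", nargs="*", default=None,
                        help="input files to process")
    parser.add_argument("--outputFile", default="output.root",
                        help="name of the output file")
    # -1 follows the cmsRun convention of "process all events"
    parser.add_argument("--maxEvents", type=int, default=-1,
                        help="maximum number of events to process")
    args = parser.parse_args(argv)
    # Plain settings that a PSet (or a PyFWLite script) could copy from
    return {
        "inputFiles": list(args.inputFiles or []),
        "outputFile": args.outputFile,
        "maxEvents": args.maxEvents,
    }
```

For example, parse_job_args(["--inputFiles", "a.root", "b.root", "--maxEvents", "100"]) returns {'inputFiles': ['a.root', 'b.root'], 'outputFile': 'output.root', 'maxEvents': 100}.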

PerilousApricot commented 12 years ago

meloam: I want this ticket (I need it for my analysis). The CMSSW step can first be refactored to pull out the common code (which is about 99% of it). FWLite execution would then depend on using UFC to suck down the appropriate tarball (which would be properly generated by whichever client and contain the dependencies). Finally, the payload would need to conform to a certain API (i.e. generate an FJR-esque file that reports the status, which can be parsed by WMAgent).
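The refactoring described above could be sketched as a shared base step with thin subclasses. All class names here are hypothetical and the release setup is a placeholder. This illustrates pulling out the common code; it is not how WMCore's executors are actually written.

```python
import os
import subprocess


class ScramStep:
    """Hypothetical base step: common environment handling for payloads run in a CMSSW release."""

    def __init__(self, cmssw_version, scram_arch):
        self.cmssw_version = cmssw_version
        self.scram_arch = scram_arch

    def environment(self):
        # Copies the caller's environment, so X509_USER_PROXY is passed on if set.
        env = dict(os.environ)
        # Placeholder: a real step would set up the release with scram here.
        env["SCRAM_ARCH"] = self.scram_arch
        env["CMSSW_VERSION"] = self.cmssw_version
        return env

    def command(self):
        raise NotImplementedError("subclasses build the payload command")

    def run(self):
        # Same environment for every payload type; only the command differs.
        return subprocess.call(self.command(), env=self.environment())


class CmsRunStep(ScramStep):
    def __init__(self, cmssw_version, scram_arch, pset="PSet.py"):
        super().__init__(cmssw_version, scram_arch)
        self.pset = pset

    def command(self):
        return ["cmsRun", "-j", "FrameworkJobReport.xml", self.pset]


class FWLiteStep(ScramStep):
    def __init__(self, cmssw_version, scram_arch, executable, args=()):
        super().__init__(cmssw_version, scram_arch)
        self.executable = executable
        self.args = list(args)

    def command(self):
        return [self.executable] + self.args
```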

ericvaandering commented 12 years ago

ewv: Are you saying you want to accept this ticket? I can pass you the information I have on the requisite VarParsing configuration work.

PerilousApricot commented 12 years ago

meloam: Send it my way.

belforte commented 9 years ago

Adding, for the record, a mail thread with Xin Shi containing his example of how to make FWLite work.

Dear all,

Sorry, I should have described my procedure more clearly. In fact:

cd UserCode/llvv_fwk/test/phojet/

there you will find the crabConfig.py

cp $CMSSW_BASE/bin/$SCRAM_ARCH/runPhoJetAnalysis .

The above step copies the binary "runPhoJetAnalysis" to the working directory.

If you follow the procedure I prescribed, the only thing you need to change is the T2 storage. All the rest should be working out of box.

Thanks! Xin

On Thu, Dec 18, 2014 at 8:44 AM, Stefano Belforte <stefano.belforte@cern.ch> wrote:

yeah.. somehow the "scriptExe" itself is missing in this :-)
Xin, please note that neither Marco nor I have ever run FWLite and we
really have no clue even how to start it!
stefano

On 12/18/2014 02:37 PM, Marco Mascheroni wrote:

    Thanks,

    may I also ask you to share the crabConfig.py and the pset you are
    using, please? I would like to successfully replicate what you are
    doing blindly, and once it works, start looking at what you did and
    discuss with Eric/Stefano et al. how to include it in CRAB3.

    cheers, and thanks again.

    Marco

    On Thu, Dec 18, 2014 at 2:29 PM, Xin Shi <Xin.Shi@cern.ch> wrote:

        Dear all,

        Thanks for your suggestions. Here is a recipe I'm using:

        cmsrel CMSSW_7_2_2
        cd CMSSW_7_2_2/src/
        cmsenv
        git clone git@github.com:veelken/SVfit_standalone TauAnalysis/SVfitStandalone
        git clone git@github.com:quertenmont/2l2v_fwk.git UserCode/llvv_fwk
        cd UserCode/llvv_fwk
        git checkout remotes/origin/72Xfwk
        cd ../..
        scram b -j8

        cd UserCode/llvv_fwk/test/phojet/
        cp $CMSSW_BASE/bin/$SCRAM_ARCH/runPhoJetAnalysis .

        Then edit the crabConfig.py to your needs.
        crab sub

        Please let me know if you have further questions.

        Thanks.
        Xin

        - - - - - - - - - - - - - - - - - - -  -..- .. -.
        Xin Shi
        Postdoctoral Research Associate
        Purdue University
        Xin.Shi@cern.ch
        +1-617-744-9468
        Skype/GTalk: shixin111
        ... .... .. - - - - - - - - - - - - - - - - - - -

        On Thu, Dec 18, 2014 at 3:30 AM, Stefano Belforte <stefano.belforte@cern.ch> wrote:

            I see that Marco already wrote a more precise
            description of the environment issue with cmsRun/scriptExe
            and basically says the same things I tried to say here.
            Stefano

            On 12/18/2014 09:22 AM, Stefano Belforte wrote:

                Hi Eric,
                No, my worry was much simpler. See below.

                On 12/17/2014 09:04 PM, Eric Vaandering wrote:

                    I’m not 100% sure I understand what Stefano is saying, so let me give my
                    interpretation: scriptExe
                    does not try to set up a particular CMSSW release, so it’s left to
                    users? Certainly I can imagine
                    scripts that people might want to run with CRAB outside of the CMSSW
                    environment and then others
                    that you want to do inside. For instance:

                    outside: something like the Higgs fit or something else very late in the
                    game
                    inside: FWLite (obviously) but even bare root macros probably want to
                    use the CMSSW version of
                    root. So maybe we want to imagine a hierarchy of plugins a bit like

                    scriptExe
                         scriptExeWithScram
                             FWLite
                             PyRoot

                    Doing as much as possible inside the WMCore WMStep framework may have
                    payoffs down the road for
                    defining other kinds of workflows in production that we don’t currently
                    have.

                My concern is very basic:
                The current wrapper uses two different pieces of code to set up
                the environment before launching cmsRun (and here it is e.g.
                passing on the proxy) and before launching the user's script
                (and here it is not passing $X509_USER_PROXY).
                I say that rather than fixing the second
                piece of code, we should use the same configuration.
                Similarly I pointed out to Marco that there are already
                two places in the same source file where the same lines
                are present, which can also lead to undesired divergences etc.

                Simply put, the code must be cleaner, and it should be clear
                (to developers and users) which environment is set up (and how)
                before a certain action is attempted.

                Then of course one can imagine the further steps you indicate,
                but I have some prejudice that anything users may want to do
                should be done inside some CMSSW release. If nothing else,
                to have a definite environment. Depending on the locally installed
                version of python or gcc on the WN of the day may be dangerous.

                So let's leave the scriptExeWithoutScram for when there is
                a demonstrated use case.

                Bottom line, tell Xin:

                you should write a script that:
                a. takes these arguments...
                b. can find this environment ....
                c. can find these files in the cwd .... on directory $...
                d. if needed can do scram unsetenv and then find the grid client tools ....
                e. needs to end with exit codes ...
                f. needs to leave these files in the cwd with this content ....

                then it will all be a piece of cake. But currently
                only a. is clear (maybe).

                Stefano
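Stefano's checklist (items a. through f.) amounts to a contract for user scripts. Below is a hypothetical skeleton of the checking side of such a script. The required environment variables, the input file name job_input.json and the exit-code values are all assumptions standing in for the items the checklist leaves as "....".

```python
import os
import sys

# Hypothetical exit codes; the real values would come from the contract (item e.).
EXIT_OK = 0
EXIT_BAD_ENV = 10
EXIT_MISSING_INPUT = 11

# Assumed requirements standing in for items b. and c.
REQUIRED_ENV = ("CMSSW_BASE", "X509_USER_PROXY")
REQUIRED_FILES = ("job_input.json",)


def check_contract(environ, cwd_files):
    """Return an exit code saying whether the job environment meets the contract."""
    missing_env = [name for name in REQUIRED_ENV if not environ.get(name)]
    if missing_env:
        sys.stderr.write("missing environment: %s\n" % ", ".join(missing_env))
        return EXIT_BAD_ENV
    missing_files = [name for name in REQUIRED_FILES if name not in cwd_files]
    if missing_files:
        sys.stderr.write("missing input files: %s\n" % ", ".join(missing_files))
        return EXIT_MISSING_INPUT
    return EXIT_OK


def main():
    code = check_contract(os.environ, set(os.listdir(".")))
    if code != EXIT_OK:
        return code
    # ... run the analysis, then leave the agreed output files in the cwd (item f.) ...
    return EXIT_OK
```

A real script would end with sys.exit(main()) so that the exit code reaches the wrapper.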
xshi commented 9 years ago

Dear all,

Here is an updated recipe, which includes uploading the user's grid certificate info and can be run on the T2_CERN site as well. Note that the git repo has changed to my area:

    cmsrel CMSSW_7_2_2
    cd CMSSW_7_2_2/src/
    cmsenv
    git clone git@github.com:veelken/SVfit_standalone TauAnalysis/SVfitStandalone
    git clone git@github.com:xshi/2l2v_fwk.git UserCode/llvv_fwk
    cd UserCode/llvv_fwk
    git checkout remotes/origin/mini-aod
    cd ../..
    scram b -j8

    cd UserCode/llvv_fwk/test/phojet/
    cp $CMSSW_BASE/bin/$SCRAM_ARCH/runPhoJetAnalysis .
    voms-proxy-init -voms cms -valid 168:0 --out x509_proxy

    Then edit the crabConfig.py to your needs.
    crab sub

Please let me know if you have further questions.

Happy New Year! Xin

matz-e commented 9 years ago

Just had a discussion with Marco about this after looking at some code. It looks like writing a WMStep, Executor and Template would be fairly difficult to get right, and I'm not sure that the complexity would be worth the effort. Unifying what we have for CMSSW and scriptExe currently (i.e. calling cmsRun manually) would greatly reduce the code complexity.

Looking at @xshi's setup, we already pass through the proxy in the current code in git, and the bits that are really missing for FWLite support are removing the requirement for the job report file and passing the input files and mask in a sensible way (or simply requiring file-based splitting). I would also think that the python libraries should be included in the sandbox.
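If "the mask" means the run/lumi mask, one sensible way to pass it would be the JSON format CMS already uses for lumi masks: each run number maps to a list of inclusive [first, last] lumi-section ranges. Here is a minimal sketch of applying such a mask inside a user script. Both function names are hypothetical.

```python
import json


def load_lumi_mask(text):
    """Parse a lumi mask of the form {"run": [[firstLumi, lastLumi], ...]}."""
    raw = json.loads(text)
    # JSON object keys are strings; convert run numbers to int for lookups.
    return {int(run): [tuple(r) for r in ranges] for run, ranges in raw.items()}


def in_mask(mask, run, lumi):
    """True if (run, lumi) falls inside one of the inclusive ranges for that run."""
    return any(first <= lumi <= last for first, last in mask.get(run, ()))
```

With mask = load_lumi_mask('{"190646": [[1, 5], [8, 20]]}'), in_mask(mask, 190646, 8) is True and in_mask(mask, 190646, 7) is False.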

PerilousApricot commented 9 years ago

Looking at other requests people have had, the benefit of making a WMStep for FWLite processes is that it adds the ability to trivially run multiple analysis steps in one job.

matz-e commented 9 years ago

To be fair, writing an executor does not seem too hard; I'm more confused by the complexity of CMSSWStepHelper. What would the FWLite requirements be from your side?

I should also note that it looks to me like we're currently creating 3 release areas for one execution of the CMS run script, 2 of which seem to be created by WMStep…

PerilousApricot commented 9 years ago

Hi Matthias-

I don't think there are a lot of requirements, per se. There's always the general idea of wanting to make "easy things easy and hard things possible". What FWLite (or really any arbitrary script) needs is a contract between the grid side ("we give you arguments this way") and the script ("you return an FJR-esque thing back to us"). If that contract existed, it would be easier for someone to take the next logical step and provide "standard" boilerplate for analyses to use.

For instance, in PyFWLite, you do the following:

    from DataFormats.FWLite import Events

    files = ['file1.root', 'file2.root']
    events = Events(files)
    for event in events:
        # analysis stuff here
        pass

With a well-defined contract, Events and the C++ equivalent could be extended readily to accept an arbitrary input sequence instead of just a list of file names. A similar argument could be made on the backside for scripts to report back information about their outputs. A series of defined helper functions would go a long way to making it viable to have arbitrary executables be first class citizens.

> I should also note that it looks to me like we're currently creating 3 release areas for one execution of the CMS run script, 2 of which seem to be created by WMStep…

The bonus of fixing existing code instead of starting from scratch is that you'd also benefit the other users of the existing code. There have been gigahours of WMStep running time at this point, and a lot of the subtle kinks have been worked out over the however many years (4? 5?) the code's been in production. If there are 3 release areas for a single CMSSW job, then that would be a new wrinkle that could be fixed, benefiting everyone in the process.

FWIW, cmsswStepHelper is just a series of helper functions around a ConfigurationSection object. The idiom with ConfigurationSections is that the actual configuration itself is bone-simple. You can't even set arbitrary objects as values in there, much less have each sub-bit of the configuration tree have custom methods. cmsswStepHelper is just a placeholder for how you would manipulate a ConfigurationSection object. Since FWLite scripts are a different ball of wax, you don't have to worry about reimplementing it. For instance:

https://github.com/dmwm/WMCore/blob/b14ef269d6acddb15b75770b32e6ac1c47a4d379/src/python/WMCore/WMSpec/Steps/Templates/CMSSW.py#L111

Doesn't make any sense for FWLite.

matz-e commented 9 years ago

Hi Andrew,

Yes, a well-defined contract definitely seems to be missing here. I'm not sure what can be done on the input side; my intuition would be to provide some text/JSON files to exchange the information rather than shoving everything into the argument list. As far as the FJR-ish output goes, I think that would be a lot shorter than what cmsRun currently produces. Maybe a skeleton would have to be provided by the step, with some bits to be filled in with user-supplied data.
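A step-provided skeleton with user-filled bits might look roughly like this. The field names and function names are invented for illustration and do not follow the real framework job report format.

```python
import json


def report_skeleton(job_id):
    """Hypothetical FJR-like skeleton that the step would hand to the user's script."""
    return {"jobId": job_id, "exitCode": None, "inputFiles": [], "outputFiles": []}


def fill_report(report, exit_code, input_files, output_files):
    """Return a copy of the skeleton with the user-supplied parts filled in."""
    filled = dict(report)
    filled["exitCode"] = int(exit_code)
    filled["inputFiles"] = list(input_files)
    # Example output fields only: file name and size in bytes.
    filled["outputFiles"] = [{"fileName": name, "size": size} for name, size in output_files]
    return filled


def write_report(report, path="job_report.json"):
    """Write the filled report where the step expects to find it (path is assumed)."""
    with open(path, "w") as handle:
        json.dump(report, handle, indent=2, sort_keys=True)
```

Because the step owns the skeleton, fields like the job id stay under its control, and the user's script only supplies what it actually knows.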

As far as different WMSteps go, every time I think about this issue in terms of distributed computing, it would be nice to go back even another step. Rather than duplicating the CMSSW setup in the CMSSW step for FWLite/arbitrary executables, I think it would be good to have a common setup for CMSSW and then layer the bits for cmsRun and FWLite/arbitrary scripts on top of that. And getting that right will take a while…

As far as the helper goes, I see how an FWLite equivalent would be very lightweight. But I haven't fully grokked the call structure around the helpers either. Reading your description, I would think that the configuration is equivalent to a C struct, and the helper is the collection of manipulation functions for the struct (thinking pure C, that is).
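The "C struct plus manipulation functions" reading can be illustrated with a self-contained analogue. This is not WMCore's ConfigSection or the CMSSW step helper. Section and FWLiteStepHelper are hypothetical names, and the type check only mimics the rule that the configuration holds plain values.

```python
ALLOWED_TYPES = (str, int, float, bool, list, type(None))


class Section:
    """Data-only configuration node: attributes are plain values or child sections."""

    def __init__(self, name):
        self._name = name

    def __setattr__(self, name, value):
        # Like a C struct: no arbitrary objects, no behaviour stored in the data.
        if not isinstance(value, ALLOWED_TYPES + (Section,)):
            raise TypeError("cannot store %r in a configuration section" % type(value))
        object.__setattr__(self, name, value)

    def child(self, name):
        # Create (or return) a named child section, like a nested struct member.
        if not hasattr(self, name):
            setattr(self, name, Section(name))
        return getattr(self, name)


class FWLiteStepHelper:
    """Hypothetical helper: all the logic lives here; the Section stays a plain record."""

    def __init__(self, step):
        self.data = step

    def set_executable(self, path, arguments=()):
        app = self.data.child("application")
        app.executable = path
        app.arguments = list(arguments)

    def get_executable(self):
        return self.data.application.executable
```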

belforte commented 9 years ago

IIUC, down the road you want to make it possible to run multiple steps in the same job; if so, it may be better to keep the freedom to use different CMSSW versions in the different steps. All in all, cmsrel is pretty quick when compared to "hours". It's not clear that it will ever be needed, but the cost is low.

PerilousApricot commented 9 years ago

@belforte - Since each CMSSW WMStep does a cmsrel separately, you can already use different versions of CMSSW within the same job.

I agree that there should be some sort of JSON for the input, with the filename passed in via argument. But looking ahead to when people are actually using it, it'll probably save a lot of support email struggle to have a "blessed" interface between that file and the FWLite event loop, so people aren't rolling their own, failing, and then getting upset when their statistics are weird. The output side will probably end up being just the things that are needed for stageout and a couple of bits for error code propagation.
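Here is a sketch of such a "blessed" interface. It assumes the job receives the path of a JSON input file as its only argument and that the file lists inputs under a "files" key. Both of these are assumptions, as are the function names. The resulting list is what PyFWLite's Events accepts.

```python
import argparse
import json


def parse_input_spec(spec):
    """Hypothetical blessed helper: validate the input description, return file names."""
    files = spec.get("files", [])
    if not files:
        raise ValueError("no input files listed in the job input description")
    return [str(name) for name in files]


def input_files_from_json(path):
    """Read the job input description from disk and return the file list."""
    with open(path) as handle:
        return parse_input_spec(json.load(handle))


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("input_json", help="path to the job input description")
    args = parser.parse_args(argv)
    files = input_files_from_json(args.input_json)
    # In a real PyFWLite job the list would go straight into the event loop:
    #   from DataFormats.FWLite import Events
    #   for event in Events(files): ...
    return files
```

Keeping the validation in one shared function is the point: every analysis gets the same file list from the same input description, instead of each user parsing it differently.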

mmascher commented 9 years ago

Adding the "question" label and moving to October. After talking with Dima/Stefano and others, we concluded that improving scriptExe and documenting how to use it for this use case is probably enough.

belforte commented 5 years ago

realistically: NO NEW FEATURES