dmwm / CRAB2


Support local condor. #1139

Open geonmo opened 10 years ago

geonmo commented 10 years ago

Dear admin, I updated CRAB to use the Condor scheduler on a CVMFS system. I tested it on a local machine and checked that the cmssw.sh file is created correctly.

Could you merge this commit?

belforte commented 10 years ago

Local condor support was put in specifically for FNAL use and is maintained by Eric Vaandering there. I will not risk breaking it by calling /cvmfs/cms.cern.ch/cmsset_default.sh w/o a request from them. Did you look into defining VO_CMS_SW_DIR on your local system?

samircury commented 10 years ago

Very interesting discussion/patch.

I'm supporting local condor at Caltech too. What I needed to do was keep a local CRAB copy in which I patched PRODCOMMON with a one-line change that inserts the following into the Condor submit file:

    jdl += 'getenv = True'

So the user is free to set whatever works in his/her environment, and Condor jobs will inherit the same supposedly-working settings, provided the submit node is uniform with the worker nodes.

With that, it works fine for me. I guess that it would be less sensitive to other things. Concerning support at FNAL I'm forwarding an email to Stefano with a recent discussion. Might help deciding what to do here.

The advantage of using that upstream is that users will be able to use CRAB straight out of CVMFS and have the same workflow in our T3, lxplus, anywhere CVMFS-enabled.

So if you're interested I could provide the formal patch for that.
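For reference, the effect of such a one-line patch can be sketched as follows. This is a minimal illustration, not the PRODCOMMON code: the function name `build_submit_description` and its arguments are hypothetical. Only `getenv = True` comes from the discussion; it is the Condor submit command that copies the submitting shell's environment into the job.

```python
def build_submit_description(executable, arguments="", inherit_env=True):
    """Return the text of a minimal Condor submit file (illustrative only)."""
    jdl = "universe = vanilla\n"
    jdl += "executable = %s\n" % executable
    if arguments:
        jdl += "arguments = %s\n" % arguments
    if inherit_env:
        # The one-line change discussed above: the job inherits the
        # environment of the shell that ran the submission.
        jdl += "getenv = True\n"
    jdl += "queue 1\n"
    return jdl


if __name__ == "__main__":
    print(build_submit_description("cmssw.sh"))
```

This only helps when the worker nodes see the same paths as the submit node, since the inherited variables point at locations on the submit host.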

belforte commented 10 years ago

Samir, you can also direct your users to put that via the crab configuration file: http://cmsdoc.cern.ch/cms/ccs/wm/www/Crab/Docs/crab-online-manual.html#additional_jdl_parameters__glite__remoteglidein_

so you do not need any patch. Will that have the needed flexibility to accommodate anybody?

Of course, if @ericvaandering says that they are happy to use /cvmfs at FNAL, I have no problem accepting Geonmo's patch.

samircury commented 10 years ago

Stefano,

It might work. I remember seeing it somewhere but never found it again. Thanks for the pointer. Even though the docs say "Works both for gLite and remoteGlidein" I think I've seen support in the Condor code too. Will try that. Thanks!

belforte commented 10 years ago

ah.. correct. I think it will not work for local condor :-( but should not be difficult to add it.


samircury commented 10 years ago

I'm having a hard time imagining how things are passed along, though. Does this look like a correct configuration for what I want to achieve:

    [GRID]
    additional_jdl_parameters = "getenv = True"

It should be easy to add for people who have known the code for a long time or wrote it :-) Any pointers are welcome. If I manage to add support for that, would it be politically feasible to merge it?

belforte commented 10 years ago

you need a semicolon at the end: additional_jdl_parameters = "getenv=True;"

If needed I'll look into adding it to local condor scheduler
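As a rough sketch of how a local condor scheduler could consume such a value, assuming it is treated as a list of semicolon-separated statements: the helper `jdl_lines_from_config` below is hypothetical and is not CRAB2's actual parser.

```python
def jdl_lines_from_config(value):
    """Split a semicolon-separated parameter string into submit-file lines.

    Surrounding double quotes and empty statements are dropped.
    """
    cleaned = value.strip().strip('"')
    statements = [part.strip() for part in cleaned.split(";")]
    return [statement for statement in statements if statement]


if __name__ == "__main__":
    for line in jdl_lines_from_config('"getenv=True;"'):
        print(line)
```

Each returned statement could then be appended to the Condor submit file on its own line.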

ericvaandering commented 10 years ago

In principle, I see nothing wrong with @geonmo 's patch. But we would want to test it at FNAL first.

I might be able to get to it later this week.

geonmo commented 10 years ago

Hello, everyone. Thank you for your responses about this patch. Frankly speaking, I wrote this code in order to use local CRAB at T3_KR_KISTI. (Writing a condor wrapper is very difficult for me.) Whether or not this patch is merged, we only need to run local CRAB at our T3. "VO_CMS_SW_DIR" is only set for EGEE sites (not OSG), and it did not work for condor. (Condor is missing from the if-else statement.)
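The kind of fallback being discussed can be sketched like this. It is an illustration only and not the code in this pull request: the function name `find_cms_setup_script` is hypothetical, and the `OSG_APP` variable and its `cmssoft/cms` layout are assumptions that do not appear in this thread. `VO_CMS_SW_DIR` and the CVMFS script path are the ones mentioned above.

```python
import os
import posixpath

# CVMFS location of the CMS setup script, as referred to earlier in the thread.
CVMFS_SETUP = "/cvmfs/cms.cern.ch/cmsset_default.sh"


def find_cms_setup_script(env, exists=os.path.exists):
    """Return the first usable CMS setup script, or None if none is found."""
    candidates = []
    if env.get("VO_CMS_SW_DIR"):
        # EGEE-style sites export VO_CMS_SW_DIR.
        candidates.append(posixpath.join(env["VO_CMS_SW_DIR"], "cmsset_default.sh"))
    if env.get("OSG_APP"):
        # Assumed OSG-style layout (not taken from this thread).
        candidates.append(
            posixpath.join(env["OSG_APP"], "cmssoft", "cms", "cmsset_default.sh")
        )
    # Fallback for a local condor pool that has neither variable set.
    candidates.append(CVMFS_SETUP)
    for path in candidates:
        if exists(path):
            return path
    return None


if __name__ == "__main__":
    print(find_cms_setup_script(dict(os.environ)))
```

Passing `exists` as a parameter keeps the lookup testable on a machine without CVMFS mounted.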

geonmo commented 9 years ago

Hello, everyone. Is there any progress on the test at FNAL?

ericvaandering commented 9 years ago

I have not. Thanks for the reminder.

ericvaandering commented 9 years ago

I think at this late stage I'm not willing to add a whole section of code for condor and risk things at FNAL. You are welcome to create a patch and apply it to what you build, of course. This will not change anymore, so it is a single patch and a command to apply it.

Let me tell you how we get around this at FNAL: we have people put this single line, `source /uscmst1/prod/sw/cms/shrc prod`, into their .profile file. That file is shared on NFS and gets run on the worker node, which does the setup that way.

I suspect your way is better, but at this point, it's not worth the effort and risk that something goes wrong at FNAL.

Eric