dmwm / CRAB2


GridJobId for glidein is not unique #805

Closed: ericvaandering closed this issue 10 years ago

ericvaandering commented 10 years ago

Original Savannah ticket 97128 reported by belforte on Tue Aug 28 17:57:10 2012.

Pasting the mail exchange here. We need to make the ID unique so that old and new jobs do not appear the same to Dashboard.

Dear all,

A unique GridJobId has always been a requirement for Dashboard reporting, as has a unique task ID in the CMS scope.

Tomorrow I'll check how many jobs with screwed-up info we have in Dashboard.

Cheers

Julia


From: Stefano Belforte [stefano.belforte@cern.ch]
Sent: 28 August 2012 21:05
To: Igor Sfiligoi
Cc: Sanjay Padhi; Domenico Giordano; Daniele Spiga; jletts@ucsd.edu; fkw@ucsd.edu; Sfiligoi Igor; Julia Andreeva
Subject: Re: question about crab task

Thanks Igor! Well, there could be "that test week", but we can live with some mess-up at that time.

stefano

On 08/28/2012 09:01 PM, Igor Sfiligoi wrote:
> How about
> https://glidein-2.t2.ucsd.edu/YYMM/<CondorID>
> ?
>
> If we re-install Condor node more than once a month, we have a BIG problem.
>
> Igor
>
> On 08/28/2012 11:57 AM, Stefano Belforte wrote:
>> that's a good creative idea
>> I think 3 or 4 chars will do to make submit time unique
>> within one condor instance, I hate horribly long un-human strings
>>
>> OK. will put it in when we get a good chance.
>> And we'll be careful not to reinstall condor in production
>> server before it is done.
>>
>> Domenico, Daniele, Julia, was this something like one or few
>> tasks with bad data, or do we have a big crisis like all
>> tasks are getting mix of 2012 and 2009 data in ?
>>
>> stefano
>>
>> On 08/28/2012 08:44 PM, Igor Sfiligoi wrote:
>>> How about
>>> https://glidein-2.t2.ucsd.edu/<submit time>/<CondorID>
>>> ?
>>>
>>> Igor
>>>
>>> On 08/28/2012 11:39 AM, Stefano Belforte wrote:
>>>> yes sounds like we do. It's a pity because having condorId in
>>>> the dashboard monitoringId was convenient
>>>> but unless we get some really creative idea...
>>>>
>>>> Could we also customize condorId ? and insert some GUID there
>>>> sort of gLiteWMS style ?
>>>>
>>>> stefano
>>>>
>>>> On 08/28/2012 08:02 PM, Sanjay Padhi wrote:
>>>>> b) Is this happening because we reinstalled the condor. So the ID again
>>>>> starts from 0 and now when the ID is say 507666.0
>>>>> to be more specific https://glidein-2.t2.ucsd.edu//507666.0, it assigns
>>>>> back to the old submitted task instead of the new task
>>>>> by a given user.
>>>>>
>>>>> I think the bottom line is:
>>>>>
>>>>> a) We need unique ID from the crabserver
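To make the failure mode in the thread concrete, here is a minimal sketch. It assumes, as Sanjay describes, that the monitoring ID has the form <host>//<cluster>.<proc> and that a reinstalled Condor restarts its cluster counter, so a later job can receive a cluster.proc already used by an older one. The helper names are hypothetical, and the YYMM variant simply follows Igor's proposal; neither is CRAB code.

```python
from datetime import date

HOST = "https://glidein-2.t2.ucsd.edu"


def old_style_id(host: str, cluster: int, proc: int) -> str:
    # Old layout as quoted in the thread: <host>//<cluster>.<proc>
    return f"{host}//{cluster}.{proc}"


def yymm_id(host: str, submitted: date, cluster: int, proc: int) -> str:
    # Igor's proposal (illustrative): <host>/<YYMM>/<cluster>.<proc>
    return f"{host}/{submitted:%y%m}/{cluster}.{proc}"


# One job submitted before a Condor reinstall and one after it; the reset
# counter hands out the same cluster.proc to both.
jobs = [(date(2009, 11, 5), 507666, 0), (date(2012, 8, 28), 507666, 0)]

old_ids = {old_style_id(HOST, c, p) for _, c, p in jobs}
new_ids = {yymm_id(HOST, d, c, p) for d, c, p in jobs}

print(len(old_ids))  # 1: Dashboard sees a single job
print(len(new_ids))  # 2: the month component keeps them apart
```

The month component only breaks down if the counter restarts and reaches the same number within the same month, which is exactly Igor's caveat about reinstalling more than once a month.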

ericvaandering commented 10 years ago

Comment by belforte on Mon Sep 3 09:57:09 2012

The easiest approach seems to be to modify BossLite/Scheduler/SchedulerCondorCommon.py, which serves all Condor-based schedulers, adding the "submitday" string to the schedulerId:

from:
    condorID = self.hostname + "//" \

to:
    condorID = self.hostname + "/" \

ericvaandering commented 10 years ago

Comment by belforte on Mon Sep 10 09:46:20 2012

So I think I have done this, and it works with test jobs on submit-2. Time to try it on production servers and to verify with local Condor at LPC.

Changed files are:

BossLite/Scheduler/SchedulerCondorCommon.py
CrabServerWorker/FatWorker.py

No change is needed in $CRABPYTHON for glidein.

For local Condor, some other place will need to change so that the modified JDL with the submission day is sent to BLite/SchedulerCondorCommon.
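The local-Condor case needs the submission day to travel with the job description. A minimal sketch of that flow, assuming the day is stamped into the Condor submit description as a custom ClassAd attribute (submit files accept "+Name = value" lines for this). The attribute name CRAB_SubmitDay, its YYMMDD format, and both helper functions are hypothetical; they are not what FatWorker.py or SchedulerCondorCommon.py actually do.

```python
import re
from datetime import date

# Hypothetical attribute name and YYMMDD format, for illustration only.
SUBMIT_DAY_ATTR = "CRAB_SubmitDay"
SUBMIT_DAY_RE = re.compile(
    rf'^\+{SUBMIT_DAY_ATTR}\s*=\s*"([^"]+)"', re.MULTILINE
)


def stamp_submit_day(jdl: str, day: date) -> str:
    """Worker side: record the submission day as a custom ClassAd attribute.

    The line is prepended so it is defined before any `queue` statement.
    """
    return f'+{SUBMIT_DAY_ATTR} = "{day:%y%m%d}"\n' + jdl


def scheduler_id_from_jdl(jdl: str, hostname: str, condor_id: str) -> str:
    """Scheduler side: build <host>/<submitday>/<condorId> from the stamped JDL."""
    match = SUBMIT_DAY_RE.search(jdl)
    if match is None:
        raise ValueError("JDL carries no submission day")
    return f"{hostname}/{match.group(1)}/{condor_id}"


jdl = "universe = vanilla\nexecutable = run.sh\nqueue 1\n"
stamped = stamp_submit_day(jdl, date(2012, 9, 10))
print(scheduler_id_from_jdl(stamped, "https://glidein-2.t2.ucsd.edu", "507666.0"))
# https://glidein-2.t2.ucsd.edu/120910/507666.0
```

Keeping the day in the JDL means the scheduler side does not need to know when the worker submitted, and raising on a missing attribute surfaces any path that still sends an unstamped JDL.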

ericvaandering commented 10 years ago

Comment by belforte on Mon Sep 10 10:08:03 2012

The new FatWorker is already in CVS as revision 1.231:

http://cmssw.cvs.cern.ch/cgi-bin/cmssw.cgi/COMP/CRAB/CRABSERVER/src/python/CrabServerWorker/FatWorker.py?r1=1.230&r2=1.231

ericvaandering commented 10 years ago

Comment by belforte on Mon Sep 10 10:17:54 2012

Also committed CondorCommon:

/local/reps/CMSSW/COMP/PRODCOMMON/src/python/ProdCommon/BossLite/Scheduler/SchedulerCondorCommon.py,v <-- SchedulerCondorCommon.py
new revision: 1.70; previous revision: 1.69

All in CVS now.

ericvaandering commented 10 years ago

Comment by belforte on Mon Sep 10 10:50:13 2012

Fixed versions are now in production on submit-3.

ericvaandering commented 10 years ago

Comment by belforte on Mon Sep 17 05:48:19 2012

We need to handle both the old and the new schedulerId format in order to deploy on servers with ongoing tasks.

ericvaandering commented 10 years ago

Comment by belforte on Wed Oct 3 03:16:33 2012

Handling of both formats has been put in BossLite/Scheduler/SchedulerCondorCommon.py; the fix is deployed in CrabServer 1_1_7.
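Accepting both formats amounts to recognising an empty middle component as the old layout. A minimal sketch, assuming the two layouts are <host>//<cluster>.<proc> (old) and <host>/<submitday>/<cluster>.<proc> (new), and that the host part never ends in an extra path segment; the SchedulerId type and parse_scheduler_id helper are hypothetical, not the BossLite code.

```python
import re
from typing import NamedTuple, Optional

CONDOR_ID = re.compile(r"\d+\.\d+")  # cluster.proc, e.g. 507666.0


class SchedulerId(NamedTuple):
    host: str
    submit_day: Optional[str]  # None for IDs created before the change
    condor_id: str


def parse_scheduler_id(sched_id: str) -> SchedulerId:
    """Accept both layouts: <host>//<id> (old) and <host>/<day>/<id> (new)."""
    # Split from the right so the "//" inside "https://" is left alone.
    head, _, condor_id = sched_id.rpartition("/")
    host, _, submit_day = head.rpartition("/")
    if not host or not CONDOR_ID.fullmatch(condor_id):
        raise ValueError(f"unrecognised schedulerId: {sched_id!r}")
    # An empty middle component means the old "//" layout.
    return SchedulerId(host, submit_day or None, condor_id)


old = parse_scheduler_id("https://glidein-2.t2.ucsd.edu//507666.0")
new = parse_scheduler_id("https://glidein-2.t2.ucsd.edu/120910/507666.0")
print(old.submit_day, new.submit_day)  # None 120910
print(old.condor_id == new.condor_id)  # True: same Condor ID, distinct schedulerIds
```

Returning the Condor ID separately lets code that only talks to Condor ignore the day, while anything keyed on the full schedulerId (such as Dashboard reporting) keeps old and new tasks apart.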

ericvaandering commented 10 years ago

Closed by belforte on Wed Oct 3 03:16:33 2012