dmwm / CRAB2


CMSSW_x.stdout on glidein submission nodes is now 0 bytes #919

Closed ericvaandering closed 10 years ago

ericvaandering commented 10 years ago

Original Savannah ticket 102402 reported by belforte on Thu Aug 29 02:33:30 2013.

I noticed that glidemon (aka Dima's monitor) points to an empty file for the job's stdout (CMSSW_x.stdout is always zero bytes). I suspect this is due to a change in Condor version (we updated the schedds recently), so that the JDL instructions we were sending to prevent duplicate stdout (explicit file + copy inside tarball) now work. I'd like to hear confirmation of this from Igor/James, and to be told exactly how to modify the JDL so that we explicitly ask for, and obtain, stdout outside the zipped tarball as well.

Since we need a patch to CRAB_2_9_0 anyway to fix the problem with T1 blacklisting that Fede found, I'd like to do this at the same time.

Thanks, Stefano

ericvaandering commented 10 years ago

Comment by dmytro on Thu Sep 5 06:59:49 2013

Hi Stefano,

Can you please give a link to a few examples?

Thanks, Dima

ericvaandering commented 10 years ago

Comment by belforte on Thu Sep 5 07:25:24 2013

E.g. http://glidemon.web.cern.ch/glidemon/jobs.php?taskid=54515 but this is not your problem: the files are empty on disk, all of them. This is a problem in the CRAB JDL.
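One way to confirm the "empty on disk" symptom for a given task directory is to list the zero-byte stdout files directly. This is only a minimal sketch: the function name `empty_stdout_files` is hypothetical, and it assumes the files sit directly in one directory and follow the `CMSSW_<n>.stdout` naming used in this ticket.

```python
from pathlib import Path


def empty_stdout_files(task_dir):
    """Return the sorted names of zero-byte CMSSW_*.stdout files in task_dir.

    Hypothetical helper: assumes a flat directory holding the per-job
    stdout files named CMSSW_<n>.stdout.
    """
    return sorted(
        p.name
        for p in Path(task_dir).glob("CMSSW_*.stdout")
        if p.is_file() and p.stat().st_size == 0
    )
```

If every job of the task appears in the returned list, the problem is in what was written to disk rather than in the monitor.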

By the way, it seems to me you only put a link to the log for failed jobs, correct? I found some failed jobs with no link; is that worth investigating? E.g. http://glidemon.web.cern.ch/glidemon/jobs.php?taskid=54531 job 35 (or FastFilter for failed).

ericvaandering commented 10 years ago

Comment by dmytro on Thu Sep 5 08:02:36 2013

XML files are not empty, so at least some information is available. Indeed I see no issues on my side.

Regarding http://glidemon.web.cern.ch/glidemon/jobs.php?taskid=54531: there is a red warning message saying that the user is using an old version of CRAB that doesn't set permissions correctly. I determine this status once per task: for a new task I check the failed jobs and try to access their logs; if that fails, I mark the task as "bad" and show the red warning. For all other jobs from the task, I don't even try to access the logs.
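The once-per-task check described above could be sketched roughly as follows. This is not glidemon's actual code: `make_task_checker` and the `can_read_log` callable are hypothetical names, and the sketch assumes the log access test is a simple function that returns true or false for one failed job.

```python
def make_task_checker(can_read_log):
    """Build a checker that probes log access only once per task.

    can_read_log is a hypothetical callable taking a failed job and
    returning True if its log can be accessed.
    """
    status = {}  # task_id -> True (logs readable) or False (task is "bad")

    def logs_available(task_id, failed_job):
        # Probe only the first failed job seen for a task; every later
        # job of the same task reuses the cached verdict.
        if task_id not in status:
            status[task_id] = bool(can_read_log(failed_job))
        return status[task_id]

    return logs_available
```

With this shape, a task whose first probed job has unreadable logs stays marked "bad", and no further access attempts are made for its remaining jobs, which would explain failed jobs showing no log link.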

Dima

ericvaandering commented 10 years ago

Comment by belforte on Thu Sep 5 08:11:59 2013

Yes, crab*xml works OK. I know what's going on here: it is an effect of Condor not respecting some JDL requests in certain versions. I simply want to hear confirmation from Igor that I am doing the right thing.

ericvaandering commented 10 years ago

Comment by belforte on Thu Sep 5 08:37:40 2013

Better now: http://submit-6.t2.ucsd.edu/CSstoragePath/36/uscms2182/belforte_crab_0_130905_144040_s6v17c/CMSSW_1.stdout

ericvaandering commented 10 years ago

Comment by belforte on Thu Sep 5 08:39:44 2013

/local/reps/CMSSW/COMP/PRODCOMMON/src/python/ProdCommon/BossLite/Scheduler/SchedulerRemoteglidein.py,v <-- SchedulerRemoteglidein.py
new revision: 1.32; previous revision: 1.31

Tagged this as: PRODCOMMON_0_12_18_CRAB_57

belforte commented 10 years ago

fix released in CRAB_2_9_1