dmwm / CRABServer

HG1502 Validation of 3.3.13 #4639

Closed juztas closed 9 years ago

juztas commented 9 years ago

Validation twiki: https://twiki.cern.ch/twiki/bin/viewauth/CMS/CRAB3_ASO_1502 Issues that need to be fixed will be posted here.

juztas commented 9 years ago

The TW cannot upload a warning message to crabserver, because it authenticates with its service certificate instead of the user's proxy:

[21/Jan/2015:11:58:47]  RESTSQL:hLUbKodSIvAP execute: () {'taskname': '150121_105716_crab3test-5:jbalcas_crab_HG1502-1-Skimming_dataset_lumi_based-L-T_O-T_P-T_IL-F'}
[21/Jan/2015:11:58:47]  ERROR: authz denied for user '{'dn': u'/DC=ch/DC=cern/OU=computers/CN=tw/vocms244.cern.ch', 'login': 'tw@vocms244.cern.ch', 'method': 'X509Cert', 'roles': {'operator': {'group': set(['crab3']), 'site': set([])}}, 'name': u'vocms244.cern.ch Service'}' to the resource '150121_105716_crab3test-5:jbalcas_crab_HG1502-1-Skimming_dataset_lumi_based-L-T_O-T_P-T_IL-F. Resource belong to jbalcas'
[21/Jan/2015:11:58:47]  SERVER DATABASE ERROR 500/403 Execution error cherrypy._cperror.HTTPError ec157a14b9fcec1e5a9effd3d6db0616 [instance: preprod] ((403, 'You are not allowed to access this resource.')); last statement: SELECT tm_username FROM tasks WHERE tm_taskname=:taskname; binds: (), {'taskname': '150121_105716_crab3test-5:jbalcas_crab_HG1502-1-Skimming_dataset_lumi_based-L-T_O-T_P-T_IL-F'}; offset: None
[21/Jan/2015:11:58:47]    Traceback (most recent call last):
[21/Jan/2015:11:58:47]      File "/data/srv/beHG1502b/sw.pre/slc6_amd64_gcc481/cms/crabserver/3.3.13.rc1/lib/python2.6/site-packages/WMCore/REST/Server.py", line 1728, in dbapi_wrapper
[21/Jan/2015:11:58:47]        return handler(*xargs, **xkwargs)
[21/Jan/2015:11:58:47]      File "/data/srv/beHG1502b/sw.pre/slc6_amd64_gcc481/cms/crabserver/3.3.13.rc1/lib/python2.6/site-packages/CRABInterface/RESTTask.py", line 77, in post
[21/Jan/2015:11:58:47]        return getattr(RESTTask, subresource)(self, **kwargs)
[21/Jan/2015:11:58:47]      File "/data/srv/beHG1502b/sw.pre/slc6_amd64_gcc481/cms/crabserver/3.3.13.rc1/lib/python2.6/site-packages/CRABInterface/RESTTask.py", line 92, in addwarning
[21/Jan/2015:11:58:47]        authz_owner_match(self.api, [workflow], self.Task) #check that I am modifying my own workflow
[21/Jan/2015:11:58:47]      File "/data/srv/beHG1502b/sw.pre/slc6_amd64_gcc481/cms/crabserver/3.3.13.rc1/lib/python2.6/site-packages/CRABInterface/RESTExtensions.py", line 38, in authz_owner_match
[21/Jan/2015:11:58:47]        raise cherrypy.HTTPError(403, "You are not allowed to access this resource.")
[21/Jan/2015:11:58:47]    HTTPError: (403, 'You are not allowed to access this resource.')
[21/Jan/2015:11:58:47] vocms0133.cern.ch 128.142.142.23 "POST /crabserver/preprod/task HTTP/1.1" 500 [data: 4081 in 726 out 67254 us ] [auth: OK "/DC=ch/DC=cern/OU=computers/CN=tw/vocms244.cern.ch" "" ] [ref: "" "CRABClient/0.0.0" ]
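
To illustrate the failure above: the owner check rejects any caller that is not the task owner, so a request authenticated with the TW service certificate gets a 403. A minimal sketch of one way such a check could admit an allow-listed service DN. `Forbidden`, `TRUSTED_SERVICE_DNS` and `authz_owner_or_service` are hypothetical names for illustration, not the CRABServer implementation:

```python
class Forbidden(Exception):
    """Stand-in for the HTTP 403 raised in the traceback above."""


# Assumption: the deployment keeps an explicit allow-list of service DNs.
TRUSTED_SERVICE_DNS = frozenset(
    ["/DC=ch/DC=cern/OU=computers/CN=tw/vocms244.cern.ch"]
)


def authz_owner_or_service(user, task_owner, trusted_dns=TRUSTED_SERVICE_DNS):
    """Allow the task owner or an allow-listed service; reject everyone else."""
    if user.get("login") == task_owner:
        return "owner"
    if user.get("dn") in trusted_dns:
        return "service"
    raise Forbidden("You are not allowed to access this resource.")
```

Returning which rule matched makes it possible to log whether a change was made by the owner or by a service.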

mmascher commented 9 years ago

Thanks Justas. That's https://github.com/dmwm/CRABServer/issues/4606 ; I have a fix for it, but I haven't tested it yet.

juztas commented 9 years ago

scriptExe is failing:

ERROR: Traceback follows:
Traceback (most recent call last):
  File "CMSRunAnalysis.py", line 840, in <module>
    jobExitCode = executeScriptExe(opts, scram)
  File "CMSRunAnalysis.py", line 559, in executeScriptExe
    ret = scram(command_, runtimeDir = os.getcwd(), logName = 'cmsRun-stdout.log', cleanEnv = False)#logName=subprocess.PIPE) for printing to the stdout
TypeError: __call__() got an unexpected keyword argument 'cleanEnv'

juztas commented 9 years ago

This happens on the ITB pool, which is unstable. If the collector is down or not reporting, we should print a clear error message. Once the schedd choice is moved to the TW, tasks will be submitted to the database and not rejected.

[21/Jan/2015:13:10:36]  SERVER DATABASE ERROR 500/403 Execution error __builtins__.ValueError 8c9a7eaabf51cdd0ff11337dccc1c6df [instance: preprod] (need more than 0 values to unpack); last statement: None; binds: None, None; offset: None
[21/Jan/2015:13:10:36]    Traceback (most recent call last):
[21/Jan/2015:13:10:36]      File "/data/srv/beHG1502b/sw.pre/slc6_amd64_gcc481/cms/crabserver/3.3.13.rc1/lib/python2.6/site-packages/WMCore/REST/Server.py", line 1728, in dbapi_wrapper
[21/Jan/2015:13:10:36]        return handler(*xargs, **xkwargs)
[21/Jan/2015:13:10:36]      File "/data/srv/beHG1502b/sw.pre/slc6_amd64_gcc481/cms/crabserver/3.3.13.rc1/lib/python2.6/site-packages/CRABInterface/RESTUserWorkflow.py", line 420, in put
[21/Jan/2015:13:10:36]        scriptexe=scriptexe, scriptargs=scriptargs, scheddname=scheddname, extrajdl=extrajdl, collector=collector)
[21/Jan/2015:13:10:36]      File "/data/srv/beHG1502b/sw.pre/slc6_amd64_gcc481/cms/crabserver/3.3.13.rc1/lib/python2.6/site-packages/CRABInterface/DataUserWorkflow.py", line 126, in submit
[21/Jan/2015:13:10:36]        return self.workflow.submit(*args, **kwargs)
[21/Jan/2015:13:10:36]      File "/data/srv/beHG1502b/sw.pre/slc6_amd64_gcc481/cms/crabserver/3.3.13.rc1/lib/python2.6/site-packages/CRABInterface/Utils.py", line 132, in wrapped_func
[21/Jan/2015:13:10:36]        return func(*args, **kwargs)
[21/Jan/2015:13:10:36]      File "/data/srv/beHG1502b/sw.pre/slc6_amd64_gcc481/cms/crabserver/3.3.13.rc1/lib/python2.6/site-packages/CRABInterface/DataWorkflow.py", line 178, in submit
[21/Jan/2015:13:10:36]        requestname = self.updateRequest('%s_%s_%s' % (timestamp, userhn, workflow), scheddname, backend_urls)
[21/Jan/2015:13:10:36]      File "/data/srv/beHG1502b/sw.pre/slc6_amd64_gcc481/cms/crabserver/3.3.13.rc1/lib/python2.6/site-packages/CRABInterface/Utils.py", line 132, in wrapped_func
[21/Jan/2015:13:10:36]        return func(*args, **kwargs)
[21/Jan/2015:13:10:36]      File "/data/srv/beHG1502b/sw.pre/slc6_amd64_gcc481/cms/crabserver/3.3.13.rc1/lib/python2.6/site-packages/CRABInterface/HTCondorDataWorkflow.py", line 47, in updateRequest
[21/Jan/2015:13:10:36]        name = locator.getSchedd().split("@")[0].split(".")[0]
[21/Jan/2015:13:10:36]      File "/data/srv/beHG1502b/sw.pre/slc6_amd64_gcc481/cms/crabserver/3.3.13.rc1/lib/python2.6/site-packages/HTCondorLocator.py", line 45, in getSchedd
[21/Jan/2015:13:10:36]        schedd = weighted_choice(choices)
[21/Jan/2015:13:10:36]      File "/data/srv/beHG1502b/sw.pre/slc6_amd64_gcc481/cms/crabserver/3.3.13.rc1/lib/python2.6/site-packages/HTCondorLocator.py", line 13, in weighted_choice
[21/Jan/2015:13:10:36]        values, weights = zip(*choices)
[21/Jan/2015:13:10:36]    ValueError: need more than 0 values to unpack
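
The ValueError comes from unpacking `zip(*choices)` into two names when `choices` is empty, which is what happens when the collector returns no schedds. A minimal sketch of a guarded weighted choice that fails with a readable message instead. `NoScheddAvailable` and this body of `weighted_choice` are illustrative assumptions, not the code in HTCondorLocator.py:

```python
import bisect
import random


class NoScheddAvailable(Exception):
    """Raised when there is nothing to choose from (hypothetical name)."""


def weighted_choice(choices):
    """Pick one value from a list of (value, weight) pairs."""
    if not choices:
        # Without this guard, the unpacking below raises a bare ValueError.
        raise NoScheddAvailable("Collector returned no schedds; it may be down.")
    values, weights = zip(*choices)
    cumulative = []
    total = 0
    for weight in weights:
        total += weight
        cumulative.append(total)
    if total <= 0:
        raise NoScheddAvailable("All schedd weights are zero.")
    # random.random() is in [0, 1), so the drawn point is always below total.
    point = random.random() * total
    return values[bisect.bisect(cumulative, point)]
```

The caller can then catch `NoScheddAvailable` and return a meaningful error to the client instead of a 500.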

juztas commented 9 years ago

scriptExe fails because the TW uses a version of WMCore that is too old. @mmascher, can you rebuild the TW RPM with WMCore 1.0.3.pre1 or later, which includes cleanEnv? Thanks.
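
A minimal sketch of how this kind of mismatch can be detected before the call, assuming Python 3 and two hypothetical stand-in classes (`OldScram`, `NewScram`) rather than the real WMCore Scram class. The `supports_keyword` helper is also hypothetical:

```python
import inspect


class OldScram:
    """Stand-in for a callable that predates the cleanEnv argument."""

    def __call__(self, command, runtimeDir=None, logName=None):
        return 0


class NewScram:
    """Stand-in for a callable that accepts cleanEnv."""

    def __call__(self, command, runtimeDir=None, logName=None, cleanEnv=True):
        return 0


def supports_keyword(callable_obj, name):
    """Return True if callable_obj can be called with keyword `name`."""
    params = inspect.signature(callable_obj).parameters
    if name in params:
        return True
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())
```

Calling `OldScram()("cmd", cleanEnv=False)` raises the same kind of TypeError as in the traceback, while `supports_keyword(OldScram(), "cleanEnv")` reports the problem up front.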

mmascher commented 9 years ago

Thanks guys, I think I lost this change because it was never committed to cmsdist: I built a TW RPM in comp.pre.mmascher without committing the change to cmsdist (which is fine, provided that you remember the changes).

I am sorry for that.

mmascher commented 9 years ago

cms+crabtaskworker+3.3.13.rc1-comp with the fix for scriptExe is available in comp.pre.mmascher.

Jadir, can you deploy it on preprod?

jmarra13 commented 9 years ago

Yes, I can.


jmarra13 commented 9 years ago

It's installed. I just started the validation script again.


juztas commented 9 years ago

@AndresTanasijczuk crab getoutput does not work if the transferOutputs flag is true:

[jbalcas@lxplus0104 src]$ crab status HG1502-2/crab_HG1502-2-MinBias_PrivateMC_EventBased-L-T_O-T_P-F_IL-F/
Warning: Incompatible CRABClient version "3.3.13.rc4" 
Server is saying that compatible versions are: ['3.3.11', '3.3.12', '3.3.13.rc1']
Task name:          150126_101433_crab3test-5:jbalcas_crab_HG1502-2-MinBias_PrivateMC_EventBased-L-T_O-T_P-F_IL-F
Task status:            SUBMITTED
Glidemon monitoring URL:    http://glidemon.web.cern.ch/glidemon/jobs.php?taskname=150126_101433_crab3test-5%3Ajbalcas_crab_HG1502-2-MinBias_PrivateMC_EventBased-L-T_O-T_P-F_IL-F
Dashboard monitoring URL:   http://dashb-cms-job.cern.ch/dashboard/templates/task-analysis/#user=jbalcas&refresh=0&table=Jobs&p=1&records=25&activemenu=2&status=&site=&tid=150126_101433_crab3test-5%3Ajbalcas_crab_HG1502-2-MinBias_PrivateMC_EventBased-L-T_O-T_P-F_IL-F
Details:            transferred    29.0% ( 29/100)
                transferring   71.0% ( 71/100)

No publication information (publication has been disabled in the crab configuration file)
[jbalcas@lxplus0104 src]$ crab getoutput HG1502-2/crab_HG1502-2-MinBias_PrivateMC_EventBased-L-T_O-T_P-F_IL-F/
Warning: Incompatible CRABClient version "3.3.13.rc4" 
Server is saying that compatible versions are: ['3.3.11', '3.3.12', '3.3.13.rc1']
No files to retrieve.
This is normal behavior if General.transferOutputs=False is present in the task configuration.
Log file is /afs/cern.ch/work/j/jbalcas/VALIDATE/HG1502/CMSSW_7_0_6/src/HG1502-2/crab_HG1502-2-MinBias_PrivateMC_EventBased-L-T_O-T_P-F_IL-F/crab.log

https://cmsweb-testbed.cern.ch/crabcache/logfile?name=150126_101433_crab3test-5:jbalcas_crab_HG1502-2-MinBias_PrivateMC_EventBased-L-T_O-T_P-F_IL-F.log&username=jbalcas

juztas commented 9 years ago

Test: Check jobs removed for memory/walltime/disk limits. The status should provide the exit code.

It works, but the reported exit code is not correct:
10 jobs failed with exit code -1:
   Invalid framework job report. The framework job report exists, but it cannot be loaded.
Have a look at https://twiki.cern.ch/twiki/bin/viewauth/CMSPublic/JobExitCodes for a description of the exit codes

I suspect this line https://github.com/dmwm/CRABServer/blob/master/src/python/TaskWorker/Actions/RetryJob.py#L129 should be changed to jobReport = "job_fjr.%d.%d.json" % (self.job_id, self.retry_count)
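
A minimal sketch of the suggested naming, assuming the report file is named from the job id and retry count as proposed above. `fjr_name` and `load_fjr` are hypothetical helpers, not the RetryJob.py code. The loader also separates a missing report from one that exists but cannot be parsed:

```python
import json
import os


def fjr_name(job_id, retry_count):
    """Build the framework job report file name for one job attempt."""
    return "job_fjr.%d.%d.json" % (job_id, retry_count)


def load_fjr(directory, job_id, retry_count):
    """Return (status, report), where status is 'ok', 'missing' or 'invalid'."""
    path = os.path.join(directory, fjr_name(job_id, retry_count))
    if not os.path.exists(path):
        return "missing", None
    try:
        with open(path) as handle:
            return "ok", json.load(handle)
    except ValueError:
        # json raises a ValueError subclass when the content is not valid JSON.
        return "invalid", None
```

If the name omits the retry count, a retried job would look up a file that does not exist, which is consistent with the report-loading error shown above.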

AndresTanasijczuk commented 9 years ago

I looked at this task in GlideMon, and I see that all jobs were retried because all the original jobs (version 0) failed while couch was unavailable. I will ask Marco whether we can include the latest PostJob code, where jobs would not fail in such a case.

There were no completed jobs at any time, so it is OK that getoutput retrieved no files, because the output file metadata was never uploaded. I think the problem is that crab status reported that 29 files were "transferred", and maybe that's why you expected getoutput to retrieve these 29 files? By the way, we were not considering the job state 'transferred' as a possible job state in the getoutput and getlog functions in CRABServer. I just made a pull request for this: https://github.com/dmwm/CRABServer/pull/4644.
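
A minimal sketch of the kind of state filter described above, assuming job records are dicts with `id` and `state` keys. The names `RETRIEVABLE_STATES` and `retrievable_jobs`, and the 'finished' state, are assumptions for illustration and are not taken from CRABServer or from the pull request:

```python
# Assumption: jobs in either of these states may have output to retrieve.
RETRIEVABLE_STATES = frozenset(["finished", "transferred"])


def retrievable_jobs(jobs, states=RETRIEVABLE_STATES):
    """Return the ids of jobs whose state allows output retrieval."""
    return [job["id"] for job in jobs if job.get("state") in states]
```

With 'transferred' left out of the set, a task whose jobs are all in that state would yield an empty list even when files are available.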
