dmwm / WMCore

Core workflow management components for CMS.

Error 400 while injecting an extension workflow - FirstLumi #5148

Closed julianbadillo closed 10 years ago

julianbadillo commented 10 years ago

When I try to create an extension workflow in ReqMgr through the REST API with the following schema [1], it returns an "Error 400" message with no further explanation. However, if I try to create it with schema [2], it succeeds. The only difference is the value of "FirstLumi", which is shifted to avoid duplicated lumis while extending.

[1] {'AcquisitionEra': 'ppSpring2014',
'CMSSWVersion': 'CMSSW_5_3_16',
'Campaign': 'ppSpring2014',
'ConfigCacheID': 'd9868229df876c105faf194caf50a7bc',
'CouchDBName': 'reqmgr_config_cache',
'CouchURL': 'https://cmsweb.cern.ch/couchdb',
'CouchWorkloadDBName': 'reqmgr_workload_cache',
'DbsUrl': 'https://cmsweb.cern.ch/dbs/prod/global/DBSReader',
'EventsPerJob': 896861,
'EventsPerLumi': 3333333,
'FilterEfficiency': 3.0000000000000001e-05,
'FirstEvent': 10001001,
'FirstLumi': 3345200,
'GlobalTag': 'STARTHI53_V28::All',
'Group': 'DATAOPS',
'LheInputFiles': 'False',
'Memory': 2300,
'OpenRunningTimeout': 43200,
'PrepID': 'FSQ-ppSpring2014-00003',
'PrimaryDataset': 'QCD_Pt-15to1000_fwdJet_bwdJet_Tune4C_Flat_2p76TeV-pythia8',
'ProcessingString': 'STARTHI53_V28_castor',
'ProcessingVersion': 1,
'RequestDate': [2014, 4, 4, 13, 3, 35],
'RequestName': 'pdmvserv_FSQ-ppSpring2014-00003_00003_v0_castor_140404_150335_2560',
'RequestNumEvents': 6135948,
'RequestPriority': 70000,
'RequestString': 'EXT_FSQ-ppSpring2014-00003_00003_v0_castor',
'RequestType': 'MonteCarlo',
'Requestor': 'jbadillo',
'RequestorDN': '/DC=ch/DC=cern/OU=computers/CN=pdmvserv/pdmvserv-test.cern.ch',
'ScramArch': 'slc5_amd64_gcc462',
'SizePerEvent': 420,
'TimePerEvent': 0.0321119791667,
'inputMode': 'couchDB',
'timeStamp': 1396616616}

[2] {'AcquisitionEra': 'ppSpring2014',
'CMSSWVersion': 'CMSSW_5_3_16',
'Campaign': 'ppSpring2014',
'ConfigCacheID': 'd9868229df876c105faf194caf50a7bc',
'CouchDBName': 'reqmgr_config_cache',
'CouchURL': 'https://cmsweb.cern.ch/couchdb',
'CouchWorkloadDBName': 'reqmgr_workload_cache',
'DbsUrl': 'https://cmsweb.cern.ch/dbs/prod/global/DBSReader',
'EventsPerJob': 896861,
'EventsPerLumi': 3333333,
'FilterEfficiency': 3.0000000000000001e-05,
'FirstEvent': 10001001,
'FirstLumi': 1,
'GlobalTag': 'STARTHI53_V28::All',
'Group': 'DATAOPS',
'LheInputFiles': 'False',
'Memory': 2300,
'MergedLFNBase': '/store/mc',
'OpenRunningTimeout': 43200,
'PrepID': 'FSQ-ppSpring2014-00003',
'PrimaryDataset': 'QCD_Pt-15to1000_fwdJet_bwdJet_Tune4C_Flat_2p76TeV-pythia8',
'ProcessingString': 'STARTHI53_V28_castor',
'ProcessingVersion': 2,
'RequestDate': [2014, 4, 4, 13, 3, 35],
'RequestName': 'pdmvserv_FSQ-ppSpring2014-00003_00003_v0_castor_140404_150335_2560',
'RequestNumEvents': 10000000,
'RequestPriority': 70000,
'RequestString': 'FSQ-ppSpring2014-00003_00003_v0_castor',
'RequestType': 'MonteCarlo',
'Requestor': 'jbadillo',
'RequestorDN': '/DC=ch/DC=cern/OU=computers/CN=pdmvserv/pdmvserv-test.cern.ch',
'ScramArch': 'slc5_amd64_gcc462',
'SizePerEvent': 420,
'TimePerEvent': 0.0321119791667,
'inputMode': 'couchDB',
'timeStamp': 1396616616}
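Comparing the two schemas key by key is a quick way to confirm what actually differs between them. The sketch below is only an illustration: `diff_schemas` and `ABSENT` are hypothetical names, and the dictionaries are abridged to a handful of fields copied from [1] and [2] above. On the listings as posted, "FirstLumi" is not the only difference; "MergedLFNBase", "ProcessingVersion", "RequestNumEvents" and "RequestString" differ as well.

```python
# Hypothetical helper for comparing two request schemas; not part of ReqMgr.
ABSENT = "<absent>"  # marker for a key that one schema does not contain


def diff_schemas(first, second):
    """Return {key: (value_in_first, value_in_second)} for every differing key."""
    keys = sorted(set(first) | set(second))
    return {
        key: (first.get(key, ABSENT), second.get(key, ABSENT))
        for key in keys
        if first.get(key, ABSENT) != second.get(key, ABSENT)
    }


# Abridged copies of schema [1] and schema [2]; the omitted fields are identical.
schema_1 = {
    "EventsPerJob": 896861,
    "EventsPerLumi": 3333333,
    "FirstEvent": 10001001,
    "FirstLumi": 3345200,
    "ProcessingVersion": 1,
    "RequestNumEvents": 6135948,
    "RequestString": "EXT_FSQ-ppSpring2014-00003_00003_v0_castor",
}
schema_2 = {
    "EventsPerJob": 896861,
    "EventsPerLumi": 3333333,
    "FirstEvent": 10001001,
    "FirstLumi": 1,
    "MergedLFNBase": "/store/mc",
    "ProcessingVersion": 2,
    "RequestNumEvents": 10000000,
    "RequestString": "FSQ-ppSpring2014-00003_00003_v0_castor",
}

differences = diff_schemas(schema_1, schema_2)
for key, (in_first, in_second) in differences.items():
    print(f"{key}: [1]={in_first!r} [2]={in_second!r}")
```

Changing one key at a time in the failing schema and resubmitting would then isolate which value triggers the 400.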

julianbadillo commented 10 years ago

@ticoann @amaltaro @lucacopa Any ideas on this?

amaltaro commented 10 years ago

@julianbadillo, can you point me to the script you are using to extend workflows? Can you also provide the exact command you used to extend it? Last question: the workflow you are trying to extend is pdmvserv_FSQ-ppSpring2014-00003_00003_v0_castor_140404_150335_2560, right? Since we'll have a new cmsweb-dev deployment, I can try setting ReqMgr to debug and see whether we get anything useful. Otherwise, it would be better to deploy ReqMgr in a VM and experiment with that.

julianbadillo commented 10 years ago

Right now it is this one: https://github.com/julianbadillo/WmAgentScripts/blob/master/resubmit.py

julianbadillo commented 10 years ago

I'm trying to extend any of these:
pdmvserv_FSQ-ppSpring2014-00003_00003_v0_castor_140404_150335_2560
pdmvserv_FSQ-ppSpring2014-00005_00003_v0_castor_140404_150333_6665
pdmvserv_BPH-Summer12-00166_00114_v0__140410_161619_101

juztas commented 10 years ago

In the ReqMgr logs I see this:

ERROR:cherrypy.error:Create request failed, reason: integer division or modulo by zero
INFO:cherrypy.access:[22/May/2014:11:03:25] vocms161.cern.ch 128.142.138.246 "POST /reqmgr/create/makeSchema HTTP/1.1" 400 [data: - in 725 out 701315 us ] [auth: OK "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=jbadillo/CN=753111/CN=Julian Badillo Rojas" "" ] [ref: "" "" ]

I looked at the code [https://github.com/dmwm/WMCore/blob/d7e5410d4c5f92ddefee62d74f4a1c0bb956e85d/src/python/WMCore/WMSpec/StdSpecs/MonteCarlo.py#L116], and to me it looks like the calculation there is not correct and produces that error: integer division or modulo by zero.
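One way such an error could arise with these parameters is sketched below. This is an assumption, not the actual WMCore code: `previous_job_count` is a hypothetical function that first computes whole lumis per job by integer division and then divides FirstLumi by that result, and it assumes the offset is only computed when FirstLumi is greater than 1. With EventsPerJob = 896861 and EventsPerLumi = 3333333, the first division truncates to zero, so the second one fails.

```python
def previous_job_count(first_lumi, events_per_job, events_per_lumi):
    """Hypothetical reconstruction of the failing calculation (not WMCore code)."""
    # Assumption: a fresh request (FirstLumi == 1) needs no offset at all.
    if first_lumi <= 1:
        return 0
    # Integer division truncates to 0 whenever a job covers less than one lumi.
    lumis_per_job = events_per_job // events_per_lumi
    # Raises ZeroDivisionError when lumis_per_job is 0.
    return first_lumi // lumis_per_job


# Values taken from the two schemas in this issue.
print(previous_job_count(1, 896861, 3333333))  # schema [2]: no division is performed

try:
    previous_job_count(3345200, 896861, 3333333)  # schema [1]
except ZeroDivisionError as err:
    print("Create request failed, reason:", err)
```

Under these assumptions the request with FirstLumi = 1 never reaches the division, which would be consistent with schema [2] succeeding while schema [1] is rejected.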

ticoann commented 10 years ago

Thanks, Justas, for looking into the problem. It seems the assumption was that events per job is smaller than events per lumi (I'm not sure how that assumption was made). Basically, every job will contain different lumis. But the parameters in the example seem to give 0.26 lumis per job. Is this OK? Another thing I am not sure about is why it calculates the previous job number from the firstLumi value and not from firstEvent. Since the MC workflow only uses EventBased splitting, it makes more sense to get the job number from firstEvent rather than firstLumi. Julian, could you comment on that?
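The event-based alternative suggested here can be sketched as follows. This is a minimal illustration under stated assumptions, not the fix that went into WMCore: `previous_job_count_from_event` is a hypothetical name, events are assumed to be numbered from 1, and every earlier job is assumed to have processed exactly EventsPerJob events. Because EventsPerLumi never appears as a divisor, the 0.26 lumis-per-job case cannot trigger a division by zero.

```python
def previous_job_count_from_event(first_event, events_per_job):
    """Hypothetical: number of full jobs preceding first_event (not WMCore code)."""
    if events_per_job <= 0:
        raise ValueError("events_per_job must be positive")
    # Assumption: events are numbered from 1, so first_event - 1 events came before.
    return (first_event - 1) // events_per_job


# Values from schema [1]: FirstEvent = 10001001, EventsPerJob = 896861.
print(previous_job_count_from_event(10001001, 896861))  # prints 11

# The lumis-per-job ratio discussed above, computed as a float for reference.
print(round(896861 / 3333333, 2))  # prints 0.27
```

Whether the count should be rounded down, as here, or up depends on how the last job of the original request was sized, which the thread does not state.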

julianbadillo commented 10 years ago

I'm commenting in the pull request.