@ticoann @amaltaro @lucacopa Any ideas on this?
@julianbadillo, can you point me to the script you are using to extend workflows? Can you also provide the exact command you used to extend it? Last question: the workflow you are trying to extend is pdmvserv_FSQ-ppSpring2014-00003_00003_v0_castor_140404_150335_2560, right? Since we'll have a new cmsweb-dev deployment, I can try setting ReqMgr to debug and see whether we get anything useful. Otherwise, it would be better to deploy ReqMgr in a VM and play with that.
Right now it is this one: https://github.com/julianbadillo/WmAgentScripts/blob/master/resubmit.py
I'm trying to extend any of these: pdmvserv_FSQ-ppSpring2014-00003_00003_v0_castor_140404_150335_2560 pdmvserv_FSQ-ppSpring2014-00005_00003_v0_castor_140404_150333_6665 pdmvserv_BPH-Summer12-00166_00114_v0__140410_161619_101
In the ReqMgr logs I see this:
ERROR:cherrypy.error:Create request failed, reason: integer division or modulo by zero
INFO:cherrypy.access:[22/May/2014:11:03:25] vocms161.cern.ch 128.142.138.246 "POST /reqmgr/create/makeSchema HTTP/1.1" 400 [data: - in 725 out 701315 us ] [auth: OK "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=jbadillo/CN=753111/CN=Julian Badillo Rojas" "" ] [ref: "" "" ]
I looked at the code [https://github.com/dmwm/WMCore/blob/d7e5410d4c5f92ddefee62d74f4a1c0bb956e85d/src/python/WMCore/WMSpec/StdSpecs/MonteCarlo.py#L116], and it looks to me like the calculation there is not correct and produces that error: integer division or modulo by zero.
Thanks, Justas, for looking into the problem. It seems the assumption was that events per job is smaller than events per lumi (I'm not sure how that assumption was made). Basically, every job will contain different lumis. But the parameters in the example seem to give about 0.26 lumis per job. Is this OK? One other thing I am not sure about is why it calculates the previous job number from the firstLumi value and not from firstEvent. Since the MC workflow only uses EventBased splitting, it makes more sense to get the job number from the firstEvent value, not the firstLumi value. Julian, could you comment on that?
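To make the arithmetic concrete, here is a minimal sketch of how these parameters can lead to the logged error. It is not the WMCore code: `lumis_per_job` and `previous_jobs` are hypothetical helpers, and the sketch assumes that lumis per job is obtained by integer division and that the previous job count is then derived from FirstLumi, as discussed above.

```python
# Values taken from the failing schema in this issue.
events_per_job = 896861
events_per_lumi = 3333333
first_lumi = 3345200


def lumis_per_job(events_per_job, events_per_lumi):
    # Assumption: the ratio is truncated to an integer.
    # 896861 / 3333333 is roughly 0.27, so this truncates to 0.
    return events_per_job // events_per_lumi


def previous_jobs(first_lumi, events_per_job, events_per_lumi):
    # Hypothetical reconstruction: number of jobs already used by the
    # original request, derived from FirstLumi.
    return (first_lumi - 1) // lumis_per_job(events_per_job, events_per_lumi)


try:
    previous_jobs(first_lumi, events_per_job, events_per_lumi)
    message = None
except ZeroDivisionError as exc:
    # Dividing by the truncated value (0) raises ZeroDivisionError.
    message = str(exc)
```

Under these assumptions, any request with EventsPerJob smaller than EventsPerLumi would hit the same failure as soon as the FirstLumi-based branch is taken, which would be consistent with the request succeeding when FirstLumi is 1.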
I'm commenting in the pull request.
When I try to create an extension workflow in ReqMgr through the REST API with the following schema [1], it returns an "Error 400" message with no further explanation. However, if I try to create it with schema [2], it succeeds. The only difference is the value of "FirstLumi", which is shifted to avoid duplicated lumis while extending.
[1] {'AcquisitionEra': 'ppSpring2014',
'CMSSWVersion': 'CMSSW_5_3_16',
'Campaign': 'ppSpring2014',
'ConfigCacheID': 'd9868229df876c105faf194caf50a7bc',
'CouchDBName': 'reqmgr_config_cache',
'CouchURL': 'https://cmsweb.cern.ch/couchdb',
'CouchWorkloadDBName': 'reqmgr_workload_cache',
'DbsUrl': 'https://cmsweb.cern.ch/dbs/prod/global/DBSReader',
'EventsPerJob': 896861,
'EventsPerLumi': 3333333,
'FilterEfficiency': 3.0000000000000001e-05,
'FirstEvent': 10001001,
'FirstLumi': 3345200,
'GlobalTag': 'STARTHI53_V28::All',
'Group': 'DATAOPS',
'LheInputFiles': 'False',
'Memory': 2300,
'OpenRunningTimeout': 43200,
'PrepID': 'FSQ-ppSpring2014-00003',
'PrimaryDataset': 'QCD_Pt-15to1000_fwdJet_bwdJet_Tune4C_Flat_2p76TeV-pythia8',
'ProcessingString': 'STARTHI53_V28_castor',
'ProcessingVersion': 1,
'RequestDate': [2014, 4, 4, 13, 3, 35],
'RequestName': 'pdmvserv_FSQ-ppSpring2014-00003_00003_v0_castor_140404_150335_2560',
'RequestNumEvents': 6135948,
'RequestPriority': 70000,
'RequestString': 'EXT_FSQ-ppSpring2014-00003_00003_v0_castor',
'RequestType': 'MonteCarlo',
'Requestor': 'jbadillo',
'RequestorDN': '/DC=ch/DC=cern/OU=computers/CN=pdmvserv/pdmvserv-test.cern.ch',
'ScramArch': 'slc5_amd64_gcc462',
'SizePerEvent': 420,
'TimePerEvent': 0.0321119791667,
'inputMode': 'couchDB',
'timeStamp': 1396616616}
[2] {'AcquisitionEra': 'ppSpring2014',
'CMSSWVersion': 'CMSSW_5_3_16',
'Campaign': 'ppSpring2014',
'ConfigCacheID': 'd9868229df876c105faf194caf50a7bc',
'CouchDBName': 'reqmgr_config_cache',
'CouchURL': 'https://cmsweb.cern.ch/couchdb',
'CouchWorkloadDBName': 'reqmgr_workload_cache',
'DbsUrl': 'https://cmsweb.cern.ch/dbs/prod/global/DBSReader',
'EventsPerJob': 896861,
'EventsPerLumi': 3333333,
'FilterEfficiency': 3.0000000000000001e-05,
'FirstEvent': 10001001,
'FirstLumi': 1,
'GlobalTag': 'STARTHI53_V28::All',
'Group': 'DATAOPS',
'LheInputFiles': 'False',
'Memory': 2300,
'MergedLFNBase': '/store/mc',
'OpenRunningTimeout': 43200,
'PrepID': 'FSQ-ppSpring2014-00003',
'PrimaryDataset': 'QCD_Pt-15to1000_fwdJet_bwdJet_Tune4C_Flat_2p76TeV-pythia8',
'ProcessingString': 'STARTHI53_V28_castor',
'ProcessingVersion': 2,
'RequestDate': [2014, 4, 4, 13, 3, 35],
'RequestName': 'pdmvserv_FSQ-ppSpring2014-00003_00003_v0_castor_140404_150335_2560',
'RequestNumEvents': 10000000,
'RequestPriority': 70000,
'RequestString': 'FSQ-ppSpring2014-00003_00003_v0_castor',
'RequestType': 'MonteCarlo',
'Requestor': 'jbadillo',
'RequestorDN': '/DC=ch/DC=cern/OU=computers/CN=pdmvserv/pdmvserv-test.cern.ch',
'ScramArch': 'slc5_amd64_gcc462',
'SizePerEvent': 420,
'TimePerEvent': 0.0321119791667,
'inputMode': 'couchDB',
'timeStamp': 1396616616}
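When comparing two schemas like these, a small dictionary diff is a quick way to see which fields differ. The sketch below is only illustrative: `schema_diff` is a hypothetical helper, not part of ReqMgr or WmAgentScripts, and only a subset of the keys from [1] and [2] is reproduced.

```python
MISSING = "<missing>"  # placeholder for keys present in only one schema


def schema_diff(first, second):
    """Return {key: (first_value, second_value)} for every differing key."""
    diff = {}
    for key in sorted(set(first) | set(second)):
        a = first.get(key, MISSING)
        b = second.get(key, MISSING)
        if a != b:
            diff[key] = (a, b)
    return diff


# Subset of schema [1] (the request that fails with Error 400).
schema_1 = {
    'EventsPerJob': 896861,
    'EventsPerLumi': 3333333,
    'FirstEvent': 10001001,
    'FirstLumi': 3345200,
    'ProcessingVersion': 1,
    'RequestNumEvents': 6135948,
    'RequestString': 'EXT_FSQ-ppSpring2014-00003_00003_v0_castor',
}

# Subset of schema [2] (the request that succeeds).
schema_2 = {
    'EventsPerJob': 896861,
    'EventsPerLumi': 3333333,
    'FirstEvent': 10001001,
    'FirstLumi': 1,
    'MergedLFNBase': '/store/mc',
    'ProcessingVersion': 2,
    'RequestNumEvents': 10000000,
    'RequestString': 'FSQ-ppSpring2014-00003_00003_v0_castor',
}

diff = schema_diff(schema_1, schema_2)
for key, (a, b) in diff.items():
    print(f"{key}: {a!r} -> {b!r}")
```

On the listings as pasted here, this reports FirstLumi along with a few other fields (MergedLFNBase, ProcessingVersion, RequestNumEvents, RequestString), so it may be worth confirming which of them matter for the failure.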