Closed amaltaro closed 7 years ago
BTW, it seems there is no state transition verification for reqmgr2. I moved a reqmgr request to closed-out using reqmgr2 API (/reqmgr2/data/request/
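For illustration, the kind of check that seems to be missing could look like the minimal sketch below. The transition table is hypothetical and deliberately reduced; it is not the actual ReqMgr2 status graph, and `check_transition` and `InvalidTransition` are made-up names.

```python
# Hypothetical, reduced transition table for illustration only;
# the real ReqMgr2 status graph is larger and may differ.
ALLOWED_TRANSITIONS = {
    "new": {"assignment-approved", "rejected"},
    "assignment-approved": {"assigned", "rejected"},
    "assigned": {"acquired", "aborted"},
    "acquired": {"running-open", "aborted"},
    "running-open": {"running-closed", "aborted"},
    "running-closed": {"completed", "aborted"},
    "completed": {"closed-out", "rejected"},
    "closed-out": {"announced"},
}


class InvalidTransition(ValueError):
    """Raised when a status change is not listed in the table."""


def check_transition(current, requested):
    """Raise InvalidTransition unless current -> requested is allowed."""
    allowed = ALLOWED_TRANSITIONS.get(current, set())
    if requested not in allowed:
        raise InvalidTransition(
            "cannot move request from %r to %r" % (current, requested))


# An allowed step passes silently.
check_transition("new", "assignment-approved")

# Jumping straight to closed-out is rejected under this table.
try:
    check_transition("new", "closed-out")
except InvalidTransition as exc:
    print("rejected:", exc)
```

With a check like this in the update path, a direct API call could not skip intermediate states.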
I found in the workqueue logs another request assigned to two team names (relval twice). The dictionary used to create the workflow is shown below (taken from the reqmgr2 logs, so data sanitization and defaults have already been applied on top of the original request):
{'AcquisitionEra': 'FAKE',
'AllowOpportunistic': False,
'AutoApproveSubscriptionSites': [],
'BlockBlacklist': [],
'BlockCloseMaxEvents': 25000000,
'BlockCloseMaxFiles': 500,
'BlockCloseMaxSize': 5000000000000,
'BlockCloseMaxWaitTime': 66400,
'BlockWhitelist': [],
'CMSSWVersion': 'CMSSW_8_1_0_pre7',
'Campaign': 'CMSSW_8_1_0_pre7',
'ConfigCacheID': None,
'ConfigCacheURL': 'https://cmsweb.cern.ch/couchdb',
'ConfigCacheUrl': None,
'CouchDBName': 'reqmgr_config_cache',
'CouchURL': 'https://cmsweb.cern.ch/couchdb',
'CouchWorkloadDBName': 'reqmgr_workload_cache',
'CustodialGroup': 'DataOps',
'CustodialSites': [],
'CustodialSubType': 'Replica',
'DQMConfigCacheID': '85ff0a90d773227202e94ffef666c055',
'DQMHarvestUnit': 'byRun',
'DQMSequences': [],
'DQMUploadProxy': None,
'DQMUploadUrl': 'https://cmsweb.cern.ch/dqm/relval',
'Dashboard': '',
'DashboardHost': 'cms-wmagent-job.cern.ch',
'DashboardPort': 8884,
'DbsUrl': 'https://cmsweb.cern.ch/dbs/prod/global/DBSReader',
'DeleteFromSource': False,
'EnableHarvesting': 'True',
'EnableNewStageout': False,
'FirstEvent': 1,
'FirstLumi': 1,
'GlobalTag': '81X_dataRun2_relval_v0',
'GlobalTagConnect': None,
'GracePeriod': 300,
'Group': 'ppd',
'IgnoredOutputModules': [],
'IncludeParents': False,
'InitialPriority': 500000,
'LumiList': {},
'MaxMergeEvents': 100000,
'MaxMergeSize': 4294967296,
'MaxRSS': 2411724,
'MaxVSize': 20411724,
'MaxWaitTime': 86400,
'Memory': 3000,
'MergedLFNBase': '/store/data',
'MinMergeSize': 2147483648,
'Multicore': 1,
'NonCustodialGroup': 'DataOps',
'NonCustodialSites': [],
'NonCustodialSubType': 'Replica',
'OutputDatasets': ['/ZeroBias/CMSSW_8_1_0_pre7-80X_dataRun2_HLT_relval_v11_RelVal_zb2015D-v1/FEVTDEBUGHLT',
'/ZeroBias/CMSSW_8_1_0_pre7-TkAlMinBias-81X_dataRun2_relval_v0_RelVal_zb2015D-v1/ALCARECO',
'/ZeroBias/CMSSW_8_1_0_pre7-81X_dataRun2_relval_v0_RelVal_zb2015D-v1/MINIAOD',
'/ZeroBias/CMSSW_8_1_0_pre7-EcalESAlign-81X_dataRun2_relval_v0_RelVal_zb2015D-v1/ALCARECO',
'/ZeroBias/CMSSW_8_1_0_pre7-81X_dataRun2_relval_v0_RelVal_zb2015D-v1/RECO',
'/ZeroBias/CMSSW_8_1_0_pre7-81X_dataRun2_relval_v0_RelVal_zb2015D-v1/DQMIO',
'/ZeroBias/CMSSW_8_1_0_pre7-SiStripCalMinBias-81X_dataRun2_relval_v0_RelVal_zb2015D-v1/ALCARECO',
'/ZeroBias/CMSSW_8_1_0_pre7-SiStripCalZeroBias-81X_dataRun2_relval_v0_RelVal_zb2015D-v1/ALCARECO'],
'OverrideCatalog': None,
'PeriodicHarvestInterval': 0,
'PrepID': None,
'ProcessingString': '',
'ProcessingVersion': 1,
'ReqMgr2Only': True,
'RequestDate': [2016, 6, 15, 0, 52, 1],
'RequestName': 'prebello_RVCMSSW_8_1_0_pre7RunZeroBias2015D__RelVal_zb2015D_160615_025201_1801',
'RequestPriority': 500000,
'RequestStatus': 'new',
'RequestString': 'RVCMSSW_8_1_0_pre7RunZeroBias2015D__RelVal_zb2015D',
'RequestTransition': [{'DN': u'/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=prebello/CN=672856/CN=Patricia Rebello Teles',
'Status': 'new',
'UpdateTime': 1465951921}],
'RequestType': 'TaskChain',
'RequestWorkflow': 'https://cmsweb.cern.ch/couchdb/reqmgr_workload_cache/prebello_RVCMSSW_8_1_0_pre7RunZeroBias2015D__RelVal_zb2015D_160615_025201_1801/spec',
'Requestor': 'prebello',
'RequestorDN': u'/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=prebello/CN=672856/CN=Patricia Rebello Teles',
'RunNumber': 0,
'ScramArch': 'slc6_amd64_gcc530',
'SiteBlacklist': [],
'SiteWhitelist': [],
'SizePerEvent': 1234,
'SoftTimeout': 129600,
'SoftwareVersions': ['CMSSW_8_1_0_pre7'],
'SubRequestType': 'RelVal',
'SubscriptionPriority': 'Low',
'Task1': {'AcquisitionEra': 'CMSSW_8_1_0_pre7',
'ConfigCacheID': '85ff0a90d773227202e94ffef66561bb',
'GlobalTag': '80X_dataRun2_HLT_relval_v11',
'InputDataset': '/ZeroBias/Run2015D-v1/RAW',
'KeepOutput': True,
'LumiList': {'256677': [[1, 291],
[293, 390],
[392, 397],
[400, 455],
[457, 482]]},
'LumisPerJob': 1,
'Memory': 7500,
'Multicore': 4,
'ProcessingString': '80X_dataRun2_HLT_relval_v11_RelVal_zb2015D',
'SplittingAlgo': 'LumiBased',
'TaskName': 'HLTDR2_25ns'},
'Task2': {'AcquisitionEra': 'CMSSW_8_1_0_pre7',
'ConfigCacheID': '85ff0a90d773227202e94ffef666b5e9',
'GlobalTag': '81X_dataRun2_relval_v0',
'InputFromOutputModule': 'FEVTDEBUGHLToutput',
'InputTask': 'HLTDR2_25ns',
'KeepOutput': True,
'LumisPerJob': 5,
'Memory': 7500,
'Multicore': 4,
'ProcessingString': '81X_dataRun2_relval_v0_RelVal_zb2015D',
'SplittingAlgo': 'LumiBased',
'TaskName': 'RECODR2_25nsreHLT'},
'TaskChain': 2,
'Team': '',
'TimePerEvent': 0.1,
'TrustPUSitelists': False,
'TrustSitelists': False,
'UnmergedLFNBase': '/store/unmerged',
'ValidStatus': 'PRODUCTION',
'VoGroup': 'unknown',
'VoRole': 'unknown',
'dashboardActivity': 'relval',
'mergedLFNBase': '/store/relval',
'unmergedLFNBase': '/store/unmerged'}
and from Andrew's logging, these are the parameters PUT during assignment:
{'AcquisitionEra': u'CMSSW_8_1_0_pre7',
'AutoApproveSubscriptionSites': [],
'BlockCloseMaxEvents': 2000000,
'BlockCloseMaxWaitTime': 28800,
'CustodialSites': [],
'CustodialSubType': 'Replica',
'Dashboard': 'relval',
'GracePeriod': 300,
'MaxMergeEvents': 50000,
'MaxMergeSize': 4294967296,
'MaxRSS': {u'HLTDR2_25ns': 7680000, u'RECODR2_25nsreHLT': 7680000},
'MaxVSize': 4394967000,
'MergedLFNBase': '/store/relval',
'MinMergeSize': 2147483648,
'NonCustodialSites': [],
'NonCustodialSubType': 'Replica',
'ProcessingString': {u'HLTDR2_25ns': u'80X_dataRun2_HLT_relval_v11_RelVal_zb2015D',
u'RECODR2_25nsreHLT': u'81X_dataRun2_relval_v0_RelVal_zb2015D'},
'ProcessingVersion': 1,
'RequestName': 'prebello_RVCMSSW_8_1_0_pre7RunZeroBias2015D__RelVal_zb2015D_160615_025201_1801',
'RequestStatus': 'assigned',
'SiteBlacklist': [],
'SiteWhitelist': 'T1_US_FNAL',
'SoftTimeout': 129600,
'Team': 'relval',
'Teamrelval': 'checked',
'TrustSitelists': True,
'UnmergedLFNBase': '/store/unmerged',
'action': 'Assign',
'checkboxprebello_RVCMSSW_8_1_0_pre7RunZeroBias2015D__RelVal_zb2015D_160615_025201_1801': 'checked',
'maxVSize': 4394967000}
These 3 parameters are used for reqmgr web assignment:
'Teamrelval': 'checked',
'action': 'Assign',
'checkboxprebello_RVCMSSW_8_1_0_pre7RunZeroBias2015D__RelVal_zb2015D_160615_025201_1801': 'checked',
and this one was deprecated about a year ago (the spelling with a capital M, 'MaxVSize', is the correct one):
'maxVSize': 4394967000
@AndrewLevin FYI
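One way to keep these web-form leftovers out of the assignment arguments is to filter them before the request is updated. This is only a sketch: the `checkbox` prefix and the `Team<name>: 'checked'` pattern are inferred from the logged payload above, and `strip_web_form_args` is a hypothetical helper, not an existing WMCore function.

```python
# Deprecated spelling noted above; 'MaxVSize' is the current key.
DEPRECATED_KEYS = {"maxVSize"}


def strip_web_form_args(args):
    """Return a copy of args without web-form-only and deprecated keys."""
    cleaned = {}
    for key, value in args.items():
        if key == "action":
            continue  # form button value, e.g. 'Assign'
        if key.startswith("checkbox"):
            continue  # per-request selection checkbox
        if key.startswith("Team") and key != "Team" and value == "checked":
            continue  # per-team checkbox, e.g. 'Teamrelval'
        if key in DEPRECATED_KEYS:
            continue
        cleaned[key] = value
    return cleaned


# Reduced version of the logged PUT parameters.
put_args = {
    "Team": "relval",
    "Teamrelval": "checked",
    "action": "Assign",
    "checkboxsome_request_name": "checked",
    "MaxVSize": 4394967000,
    "maxVSize": 4394967000,
    "RequestStatus": "assigned",
}
print(strip_web_form_args(put_args))
```

Only `Team`, `MaxVSize` and `RequestStatus` survive the filter in this reduced example.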
According to this constraint: https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WorkQueue/WorkQueueReqMgrInterface.py#L238
such requests will fail to be acquired by GQ (the global workqueue). Instead of letting that happen, we should reject them at assignment time. Here is one example that was causing trouble for GQ: https://cmsweb.cern.ch/reqmgr2/data/request?name=mewu_RVCMSSW_8_0_10PhotonJets_Pt_10_13_160602_114021_8987
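A rough sketch of failing early at assignment follows. The linked constraint is not quoted in this thread, so the rule used here (a request must carry exactly one team name) is an assumption based on the "assigned to two team names" report above; `validate_team` and `AssignmentError` are hypothetical names.

```python
class AssignmentError(ValueError):
    """Raised when assignment arguments cannot be accepted."""


def validate_team(args):
    """Return the single team name, or raise AssignmentError.

    Assumed rule: exactly one non-empty team per request.
    """
    team = args.get("Team")
    # Accept either a single string or a list of names.
    teams = [team] if isinstance(team, str) else list(team or [])
    teams = [name for name in teams if name]
    if len(teams) != 1:
        raise AssignmentError(
            "expected exactly one team, got %r" % (teams,))
    return teams[0]


print(validate_team({"Team": "relval"}))

# The 'relval twice' case is refused instead of reaching the workqueue.
try:
    validate_team({"Team": ["relval", "relval"]})
except AssignmentError as exc:
    print("assignment failed:", exc)
```

Rejecting the duplicate rather than silently de-duplicating it keeps the error visible to whoever submitted the assignment.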