dmwm / WMCore

Core workflow management components for CMS.
Apache License 2.0

Merge ACDC acquired with None acquisition era, picked up the wrong value #8102

Closed vlimant closed 6 years ago

vlimant commented 6 years ago

Looking at

https://cmsweb.cern.ch/reqmgr2/fetch?rid=mcremone_ACDC0_task_SMP-RunIIWinter15wmLHE-00116__v1_T_170827_111201_6513

"AcquisitionEra": { "SMP-RunIIWinter15wmLHE-00116_0": "RunIIWinter15wmLHE", "SMP-RunIIWinter15GenOnly-00012_0": "RunIIWinter15GenOnly" },

"OutputDatasets": [ "/W2jet_WToMuNuJ_powheg_minlo_13TeV_NNPDF31nnlo/RunIIWinter15wmLHE-MCRUN2_71_V1-v2/GEN" ],

This request, which was acquired with AcquisitionEra = None, has gone astray in reqmgr: the output dataset picked up the wrong acquisition era. @amaltaro @ticoann

amaltaro commented 6 years ago

Jean-Roch, please link the JIRA ticket next time. It's easier to keep track and keep everyone in the loop without double-bugging people with alerts :) Here it is: https://its.cern.ch/jira/browse/CMSCOMPPR-1224

Sooo, it's a resubmission of a StepChain: "OriginalRequestType": "StepChain",

and the assignment dict seems just fine to me [1]. It's recovering only the MergeRAWSIMoutput task, and the workload config seems to be correct too:

mcremone_ACDC0_task_SMP-RunIIWinter15wmLHE-00116__v1_T_170827_111201_6513.tasks.SMP-RunIIWinter15GenOnly-00012_0MergeRAWSIMoutput.parameters.acquisitionEra = 'RunIIWinter15GenOnly'

For some reason, the output dataset is wrong here

  "OutputDatasets": [
    "/W2jet_WToMuNuJ_powheg_minlo_13TeV_NNPDF31nnlo/RunIIWinter15wmLHE-MCRUN2_71_V1-v2/GEN"
  ], 

I'll see if I can get it fixed this week, no promises though :)

Meanwhile, can you please create another ACDC and assign it with a simple string value for AcqEra, thus "AcquisitionEra": "RunIIWinter15GenOnly"? That should work just fine in this case.
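To make the workaround concrete, here is a minimal sketch of turning the dict-style assignment arguments into the simple string form for a single-task ACDC. The helper name `flatten_era_for_task` is hypothetical and not part of WMCore or ReqMgr2; it only illustrates the shape of the arguments, using the values from this request.

```python
import copy


def flatten_era_for_task(assign_args, step_name, keys=("AcquisitionEra",)):
    """Return a copy of assign_args where dict-valued keys are replaced by
    the plain string that belongs to step_name.

    Hypothetical helper for illustration only, not a WMCore function.
    """
    flat = copy.deepcopy(assign_args)
    for key in keys:
        value = flat.get(key)
        if isinstance(value, dict):
            # Pick the single value for the step whose output is being recovered.
            flat[key] = value[step_name]
    return flat


# Subset of the assignment arguments shown in [1].
original_args = {
    "AcquisitionEra": {
        "SMP-RunIIWinter15GenOnly-00012_0": "RunIIWinter15GenOnly",
        "SMP-RunIIWinter15wmLHE-00116_0": "RunIIWinter15wmLHE",
    },
    "ProcessingString": {
        "SMP-RunIIWinter15GenOnly-00012_0": "MCRUN2_71_V1",
        "SMP-RunIIWinter15wmLHE-00116_0": "MCRUN2_71_V1",
    },
    "ProcessingVersion": 2,
}

acdc_args = flatten_era_for_task(original_args, "SMP-RunIIWinter15GenOnly-00012_0")
print(acdc_args["AcquisitionEra"])
```

Only AcquisitionEra is flattened by default, since that is the only key the workaround mentions; other keys can be passed through `keys` if needed.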

[1]

{u'AcquisitionEra': {u'SMP-RunIIWinter15GenOnly-00012_0': u'RunIIWinter15GenOnly',
                     u'SMP-RunIIWinter15wmLHE-00116_0': u'RunIIWinter15wmLHE'},
 u'AutoApproveSubscriptionSites': [],
 u'BlockCloseMaxEvents': 2000000,
 u'BlockCloseMaxWaitTime': 172800,
 u'CustodialSites': [],
 u'CustodialSubType': 'Replica',
 u'Dashboard': 'production',
 u'GracePeriod': 300,
 'HardTimeout': 159900,
 u'MaxMergeEvents': 200000,
 u'MaxMergeSize': 4294967296,
 u'MaxRSS': 2411724800,
 u'MaxVSize': 4394967000,
 u'MergedLFNBase': '/store/mc',
 u'MinMergeSize': 2147483648,
 u'NonCustodialGroup': 'DataOps',
 u'NonCustodialSites': [],
 u'NonCustodialSubType': 'Replica',
 u'ProcessingString': {u'SMP-RunIIWinter15GenOnly-00012_0': u'MCRUN2_71_V1',
                       u'SMP-RunIIWinter15wmLHE-00116_0': u'MCRUN2_71_V1'},
 u'ProcessingVersion': 2,
 u'RequestStatus': 'assigned',
 u'SiteBlacklist': [],
 u'SiteWhitelist': [u'T2_BR_SPRACE',
                    u'T2_UK_London_Brunel',
                    u'T2_UK_SGrid_Bristol',
                    u'T2_UK_SGrid_RALPP'],
 u'SoftTimeout': 159600,
 u'Team': 'production',
 u'TrustPUSitelists': False,
 u'TrustSitelists': False,
 u'UnmergedLFNBase': '/store/unmerged'}
amaltaro commented 6 years ago

So, what happens here is that we're assigning a merge ACDC workflow as if it was a StepChain one (thus dict value for AcqEra and ProcStr, just as it's supposed to be).

The problem is, we send only the production/processing task names when assigning a workflow, and of course it cannot find a match, since this ACDC is going to run only a merge task. It then falls back to the parent task and retrieves its AcquisitionEra (which is the only processing task in a stepchain, with the same AcquisitionEra as Step1).
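The lookup described above can be sketched with a toy model. This is not the WMCore code: tasks are plain dicts, `resolve_acquisition_era` is a hypothetical function, and the parent task name is an assumption made for the example (the thread only says the parent carries the Step1 era).

```python
def resolve_acquisition_era(task, assigned_era):
    """Toy lookup: a dict is keyed by task name; on a miss, fall back to
    the acquisition era already set on the parent task."""
    if isinstance(assigned_era, str):
        # A plain string applies to every task.
        return assigned_era
    if task["name"] in assigned_era:
        return assigned_era[task["name"]]
    parent = task.get("parent")
    return parent["acquisition_era"] if parent else None


# Assumed layout: the single processing task of the StepChain carries the
# Step1 era, and the merge task hangs below it.
processing_task = {
    "name": "SMP-RunIIWinter15wmLHE-00116_0",
    "acquisition_era": "RunIIWinter15wmLHE",
    "parent": None,
}
merge_task = {
    "name": "SMP-RunIIWinter15GenOnly-00012_0MergeRAWSIMoutput",
    "parent": processing_task,
}

# The assignment dict only knows the production/processing step names.
assigned = {
    "SMP-RunIIWinter15GenOnly-00012_0": "RunIIWinter15GenOnly",
    "SMP-RunIIWinter15wmLHE-00116_0": "RunIIWinter15wmLHE",
}

# The merge task name is not a key, so the parent's (Step1) era is used.
merge_era = resolve_acquisition_era(merge_task, assigned)
print(merge_era)
```

In this model the merge task ends up with RunIIWinter15wmLHE instead of RunIIWinter15GenOnly, which matches the wrong output dataset name seen in the request.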

Sort of the same issue we had - several times - with TaskChains... trying to fix it still for this testbed release :(