dmwm / WMCore

Core workflow management components for CMS.
Apache License 2.0

'KeepOutput': False affects the job splitting #5007

Closed amaltaro closed 9 years ago

amaltaro commented 10 years ago

Hi guys,

it looks like when two cascaded tasks have 'KeepOutput': False (e.g. Task1 and Task2, as in this testbed request: amaltaro_RVCMSSW_7_0_0_pre11ProdTTbar_140228_160041_5739), the job splitting field/algorithm is empty for Task2, even though we provide the job splitting at injection time, as one can see in the spec file [1].

Looking into the details in WMStats, I noticed:

- Task1 - ProdTTbar ran 45 jobs
- Task2 - DIGIPROD1 ran 1 job (so it took all the input from Task1 in one shot)
- Task3 - RECOPROD1 ran 5 jobs (that's correct, each of them with 10 lumis)
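
For reference, the job count one would naively expect from LumiBased splitting is a ceiling division. This is only a sketch: `expected_lumi_jobs` is a hypothetical helper, it assumes the splitter simply groups input lumis into chunks of `LumisPerJob` (the real splitter may apply further constraints), and the figure of 50 lumis is inferred from Task3's 5 jobs of 10 lumis each rather than reported anywhere.

```python
def expected_lumi_jobs(total_lumis, lumis_per_job):
    """Hypothetical helper: naive job count for lumi-based splitting."""
    if lumis_per_job <= 0:
        raise ValueError("lumis_per_job must be positive")
    # Ceiling division using integers only.
    return (total_lumis + lumis_per_job - 1) // lumis_per_job

# Assumption: about 50 lumis reach Task2 (inferred from Task3: 5 jobs x 10 lumis).
print(expected_lumi_jobs(50, 10))  # 5
```

Under that assumption Task2 (LumisPerJob = 10) would be expected to run about 5 jobs, not 1.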

This behavior has been there for about 2 or 3 months, but I'm only reporting it now, sorry :-)

Thanks, Alan.

[1] request.schema.Task2 = {'KeepOutput': False, 'GlobalTag': 'START70_V4::All', 'InputFromOutputModule': 'RAWSIMoutput', 'ProcessingString': 'START70_V4', 'SplittingAlgo': 'LumiBased', 'InputTask': 'ProdTTbar', 'ConfigCacheID': 'b6db80a24b9858e6cb9ee6c49891c6d6', 'LumisPerJob': 10, 'TaskName': 'DIGIPROD1', 'AcquisitionEra': 'CMSSW_7_0_0_pre11'}

amaltaro commented 10 years ago

I usually get these specs with the resubmit script. Here is the spec for the workflow above:

{'Group': 'DATAOPS', 'Requestor': 'amaltaro', 'ScramArch': 'slc5_amd64_gcc481', 'SizePerEvent': 1234, 'Memory': 2400, 'Task1': {'KeepOutput': False, 'GlobalTag': 'START70_V4::All', 'SplittingAlgo': 'EventBased', 'ProcessingString': 'START70_V4', 'Seeding': 'AutomaticSeeding', 'ConfigCacheID': 'b6db80a24b9858e6cb9ee6c49891d512', 'TaskName': 'ProdTTbar', 'AcquisitionEra': 'CMSSW_7_0_0_pre11', 'PrimaryDataset': 'RelValProdTTbar', 'EventsPerJob': 100, 'RequestNumEvents': 9000}, 'Task2': {'KeepOutput': False, 'GlobalTag': 'START70_V4::All', 'InputFromOutputModule': 'RAWSIMoutput', 'ProcessingString': 'START70_V4', 'SplittingAlgo': 'LumiBased', 'InputTask': 'ProdTTbar', 'ConfigCacheID': 'b6db80a24b9858e6cb9ee6c49891c6d6', 'LumisPerJob': 10, 'TaskName': 'DIGIPROD1', 'AcquisitionEra': 'CMSSW_7_0_0_pre11'}, 'Task3': {'KeepOutput': True, 'GlobalTag': 'START70_V4::All', 'InputFromOutputModule': 'RAWSIMoutput', 'ProcessingString': 'START70_V4', 'SplittingAlgo': 'LumiBased', 'InputTask': 'DIGIPROD1', 'ConfigCacheID': 'b6db80a24b9858e6cb9ee6c49891d338', 'LumisPerJob': 10, 'TaskName': 'RECOPROD1', 'AcquisitionEra': 'CMSSW_7_0_0_pre11'}, 'RequestType': 'TaskChain', 'timeStamp': 1393599645, 'TimePerEvent': 20, 'dashboardActivity': 'integration', 'ConfigCacheURL': 'https://cmsweb-testbed.cern.ch/couchdb', 'CouchDBName': 'reqmgr_config_cache', 'CMSSWVersion': 'CMSSW_7_0_0_pre11', 'unmergedLFNBase': '/store/unmerged', 'RequestorDN': '/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=amaltaro/CN=718748/CN=Alan Malta Rodrigues', 'RequestPriority': 2000, 'mergedLFNBase': '/store/relval', 'ProcessingVersion': 4, 'RequestName': 'amaltaro_RVCMSSW_7_0_0_pre11ProdTTbar_140228_160041_5739', 'RequestString': 'RVCMSSW_7_0_0_pre11ProdTTbar', 'CouchURL': 'https://cmsweb-testbed.cern.ch/couchdb', 'CouchWorkloadDBName': 'reqmgr_workload_cache', 'Campaign': 'HG1403_Validation', 'GlobalTag': 'START70_V4::All', 'DbsUrl': 'https://cmsweb.cern.ch/dbs/prod/global/DBSReader', 'RequestDate': [2014, 2, 28, 15, 0, 
41], 'TaskChain': 3}
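
To compare what was requested for each task at a glance, the splitting settings can be pulled out of a spec like the one above. A minimal sketch: `requested_splitting` is a hypothetical helper (not part of WMCore), and the dictionary below is trimmed to the keys relevant to splitting.

```python
def requested_splitting(spec):
    """Hypothetical helper: map TaskName -> (SplittingAlgo, per-job arguments)."""
    summary = {}
    for i in range(1, spec.get('TaskChain', 0) + 1):
        task = spec['Task%d' % i]
        # Keep only the "<Something>PerJob" keys, e.g. EventsPerJob or LumisPerJob.
        args = {k: v for k, v in task.items() if k.endswith('PerJob')}
        summary[task['TaskName']] = (task.get('SplittingAlgo'), args)
    return summary

# Trimmed copy of the spec above (only the keys relevant to splitting).
spec = {
    'TaskChain': 3,
    'Task1': {'TaskName': 'ProdTTbar', 'KeepOutput': False,
              'SplittingAlgo': 'EventBased', 'EventsPerJob': 100},
    'Task2': {'TaskName': 'DIGIPROD1', 'KeepOutput': False,
              'SplittingAlgo': 'LumiBased', 'LumisPerJob': 10,
              'InputTask': 'ProdTTbar'},
    'Task3': {'TaskName': 'RECOPROD1', 'KeepOutput': True,
              'SplittingAlgo': 'LumiBased', 'LumisPerJob': 10,
              'InputTask': 'DIGIPROD1'},
}

for name, (algo, args) in sorted(requested_splitting(spec).items()):
    print("%s: %s %s" % (name, algo, args))
```

This only shows what was requested at injection time; what the workload actually ends up with has to be checked in the splitting view.
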

amaltaro commented 9 years ago

Reproduced the same problem with this workflow: https://cmsweb-testbed.cern.ch/reqmgr/view/splitting/amaltaro_700_pre11_ProdMinBias_Transient_140930_113333_6343 @ticoann, please assign this one to me :-)

ticoann commented 9 years ago

Hi Alan, I looked at the examples you gave me. https://cmsweb-testbed.cern.ch/reqmgr/view/showWorkload?requestName=amaltaro_700_pre11_ProdMinBias_Transient_140930_113333_6343

Although the splitting algo for Task2 is LumiBased, it gets changed to WMBSMergeBySize, and its arguments are changed as well, due to the following line: https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WMSpec/StdSpecs/TaskChain.py#L361

I think the code has to be changed to https://github.com/ticoann/WMCore/commit/6eab00dbc0749f47f5cc0c5111e85d339aa9134e since we only need the merge algorithm change, but I am not completely sure.

amaltaro commented 9 years ago

The error should be around this code, indeed. But I did not understand how it would switch to the WMBS algo if it was set to LumiBased. Anyway, don't waste more time on this issue, Seangchan, I'll tackle it on Monday :)

ticoann commented 9 years ago

I thought you were traveling. :smile: It switches to the WMBSMerge algo since its condition is met: its parent task's algo is EventBased and has input data.

amaltaro commented 9 years ago

Not yet, waiting for my ride :-P Hmmm, then I think the correct check would be the algo for the current task (taskConf) instead of the parent. Unless there is another reason for the parent check that I cannot see (well, Friday evening here, so I may be completely wrong :P).
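
To make the two candidate checks concrete, here is a minimal sketch. It is not the actual code in TaskChain.py: the function names, the dictionary shape, and the exact condition are assumptions based only on the description in this thread (the override fires when the parent task is EventBased and the task has input).

```python
MERGE_ALGO = 'WMBSMergeBySize'

def effective_algo_parent_check(task_conf, parent_conf):
    """Behaviour as described in this thread (assumed): test the parent's algo."""
    has_input = 'InputTask' in task_conf
    if has_input and parent_conf.get('SplittingAlgo') == 'EventBased':
        return MERGE_ALGO
    return task_conf['SplittingAlgo']

def effective_algo_task_check(task_conf, parent_conf):
    """Alternative suggested above (assumed): test the current task's algo."""
    has_input = 'InputTask' in task_conf
    if has_input and task_conf.get('SplittingAlgo') == 'EventBased':
        return MERGE_ALGO
    return task_conf['SplittingAlgo']

task1 = {'TaskName': 'ProdTTbar', 'SplittingAlgo': 'EventBased'}
task2 = {'TaskName': 'DIGIPROD1', 'SplittingAlgo': 'LumiBased',
         'InputTask': 'ProdTTbar'}
task3 = {'TaskName': 'RECOPROD1', 'SplittingAlgo': 'LumiBased',
         'InputTask': 'DIGIPROD1'}

print(effective_algo_parent_check(task2, task1))  # WMBSMergeBySize
print(effective_algo_task_check(task2, task1))    # LumiBased
print(effective_algo_parent_check(task3, task2))  # LumiBased
```

In this toy model the parent check overrides Task2 but leaves Task3 alone, which matches the reported job counts, while the current-task check keeps LumiBased for both.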

ticoann commented 9 years ago

I don't know what the reason behind it is either. Anyway, I guess you are officially a workaholic. :smile: