CMSCompOps / WorkflowWebTools

https://workflowwebtools.readthedocs.io

Memory and maxRSS #77

Closed paorozo closed 5 years ago

paorozo commented 6 years ago

Yesterday I sent this action:

"mcremone_ACDC0_task_HIG-RunIISummer17wmLHEGS-00044__v1_T_171016_112558_6837": {
    "Action": "acdc",
    "Reasons": [],
    "ACDCs": [],
    "user": "prozober",
    "Parameters": {
        "HIG-RunIISummer17DRStdmix-00054_0": {
            "sites": ["T2_CH_CERN", "T2_UK_London_Brunel", "T2_UK_London_IC", "T2_UK_SGrid_RALPP", "T2_US_Wisconsin"],
            "memory": "12000"
        }
    }
}

The ACDC created was: https://cmsweb.cern.ch/reqmgr2/fetch?rid=vlimant_ACDC1_task_HIG-RunIISummer17wmLHEGS-00044__v1_T_171023_194909_6662

As you can see:

"Task2": {
...
    "SizePerEvent": 2009.7536, 
    "ConfigCacheID": "962077ac29a24ab2a58414b453c2514c", 
    "Memory": "5000", 
    "Multicore": 8, 
    "TaskName": "HIG-RunIISummer17DRStdmix-00054_0", 
...
  }

The memory wasn't changed for Task2, but in the main dictionary, we have: "Memory": "12000".

I thought that, when setting the MaxRSS value for every task, https://github.com/CMSCompOps/WmAgentScripts/commit/afd3eb8ee566ddc4d1d7c28035ddb27c45d0ad8d would take the larger of the main dictionary's memory value and the task's own value.

But that is not what happened. Every task got the value derived from the task-level memory:

"MaxRSS": {
    "HIG-RunIISummer17DRStdmix-00054_0": 5120000,
    "HIG-RunIISummer17DRStdmix-00054_1": 5120000,
    "HIG-RunIISummer17wmLHEGS-00044_0": 5120000,
    "HIG-RunIISummer17MiniAOD-00051_0": 5120000
},

So either we let the recovery tool modify both the main dictionary's and the task's memory value, or we modify reqMgrClient.py.
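To make the second option concrete, here is a minimal sketch of the behaviour I expected: take the larger of the top-level and per-task "Memory" and convert it to MaxRSS. This is not the code in reqMgrClient.py. The function name `max_rss_per_task` is hypothetical, and the factor of 1024 is only inferred from the numbers above (Memory 5000 gives MaxRSS 5120000).

```python
def max_rss_per_task(request, kb_per_mb=1024):
    """Return {task name: MaxRSS}, using the larger of top-level and per-task Memory.

    Assumption: task dictionaries sit under keys such as "Task1", "Task2", ...
    and each one carries a "TaskName".
    """
    top_memory = int(request.get("Memory", 0))
    result = {}
    for key, value in request.items():
        # Skip non-task entries (for example a plain "TaskChain" counter).
        if key.startswith("Task") and isinstance(value, dict) and "TaskName" in value:
            task_memory = int(value.get("Memory", 0))
            result[value["TaskName"]] = max(top_memory, task_memory) * kb_per_mb
    return result


# Trimmed-down version of the request shown in this issue.
request = {
    "Memory": "12000",
    "Task2": {
        "Memory": "5000",
        "Multicore": 8,
        "TaskName": "HIG-RunIISummer17DRStdmix-00054_0",
    },
}
max_rss = max_rss_per_task(request)
print(max_rss)  # {'HIG-RunIISummer17DRStdmix-00054_0': 12288000}
```

With this rule the task would get 12288000 instead of 5120000, even though only the main dictionary was updated.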

vlimant commented 6 years ago

I think the issue is that this is a second-round ACDC, so at https://github.com/CMSCompOps/WmAgentScripts/blob/master/Unified/actor.py#L65 and https://github.com/CMSCompOps/WmAgentScripts/blob/master/Unified/actor.py#L197 "initial" and "payload" are not a TaskChain but a Resubmission.

@areinsvo I suggest using the presence of "TaskChain" in "initial" and "payload" to decide whether to edit the tasks.

areinsvo commented 6 years ago

The relevant lines in actor.py have been changed. @prozober, if you create another ACDC round, the memory should be set correctly this time.
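For readers following along, the check suggested above could look roughly like the sketch below. It is not the actual change made in actor.py. The function name `set_memory`, the "RequestType" field, and the layout of the example payload are assumptions for illustration; the idea is only that the decision keys off the presence of "TaskChain" rather than the request type, so a Resubmission is still handled.

```python
def set_memory(payload, memory_by_task):
    """Write requested memory values into the tasks of `payload`.

    `memory_by_task` maps task names to memory values, like the "Parameters"
    of the action. Returns the names of the tasks that were edited.
    """
    edited = []
    # Assumption: a request built from a task chain carries a "TaskChain" key
    # even when it is a Resubmission (second-round ACDC).
    if "TaskChain" not in payload:
        return edited
    for key, task in payload.items():
        if not (key.startswith("Task") and isinstance(task, dict)):
            continue
        name = task.get("TaskName")
        if name in memory_by_task:
            task["Memory"] = memory_by_task[name]
            edited.append(name)
    return edited


# Illustrative second-round ACDC payload (values partly made up).
payload = {
    "RequestType": "Resubmission",
    "TaskChain": 4,
    "Memory": "12000",
    "Task2": {"Memory": "5000", "TaskName": "HIG-RunIISummer17DRStdmix-00054_0"},
}
edited = set_memory(payload, {"HIG-RunIISummer17DRStdmix-00054_0": "12000"})
```

After the call, `payload["Task2"]["Memory"]` is "12000" and `edited` lists the one task that was touched.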

paorozo commented 6 years ago

Following up on this issue, we want to change the way we assign memory per task. Until now we overwrote the memory value; we now think it is safer to treat the "memory" field from our tool as the amount by which the memory should increase. This is taken from https://its.cern.ch/jira/browse/CMSCOMPPR-1597. @areinsvo, would this be straightforward?
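A rough sketch of what we have in mind, assuming the same task layout as in the request above ("Task1", "Task2", ... with "TaskName" and "Memory"). The function name `increase_memory` and the increment of 2000 are hypothetical and only illustrate the proposed behaviour.

```python
import copy


def increase_memory(payload, increase_by_task):
    """Return a copy of `payload` with each listed task's Memory raised by its increment."""
    updated = copy.deepcopy(payload)
    for key, task in updated.items():
        if key.startswith("Task") and isinstance(task, dict):
            increase = int(increase_by_task.get(task.get("TaskName"), 0))
            if increase:
                # Keep Memory as a string, matching the request shown in this issue.
                task["Memory"] = str(int(task.get("Memory", 0)) + increase)
    return updated


payload = {
    "Memory": "5000",
    "Task2": {"Memory": "5000", "TaskName": "HIG-RunIISummer17DRStdmix-00054_0"},
}
# "memory": "2000" now means "add 2000", not "set to 2000".
updated = increase_memory(payload, {"HIG-RunIISummer17DRStdmix-00054_0": "2000"})
print(updated["Task2"]["Memory"])  # 7000
```

Adding to the current value means a later ACDC round can never lower a task's memory by accident, which is the safety argument for this change.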