Closed vlimant closed 7 years ago
How did you assign it? Would you have the dictionary used for assignment?
They are not assigned at all. As far as I know, assignRequest is not used at all in that case (whether assigning on the web or by script).
Right now, we assign ReReco via the web interface so we can adjust the memory. This needs to be done for Multicore workflows, so it is a "knob" we need in both the scripts and the web interface if we want to remain flexible.
When you make ACDCs the memory isn't copied from the parent workflow either, you always need to adjust that parameter manually when assigning.
@jenimal Memory is different from MaxRSS, and we have to do this right all the way.
Any lead on solving this?
Testbed deadline is this evening, but I will try to get it done tomorrow/Wednesday. I just need to finish other unfinished bug fixes first.
Are there any other creation parameters that you'd need to override?
TimePerEvent maybe, so that we get a better RequestTime classad. I cannot think of anything else right now. Maybe you have suggestions?
Giving it a second thought, since it's a resubmission, it makes sense to make it a clone of the original request (or closest to it).
Hence, my suggestion in this case would be to override Memory during assignment. We had a strong push in the past to make it available during assignment, so can you try it please?
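For illustration, here is a minimal sketch of what "override Memory during assignment" could look like with the three-part dictionary used in this thread. The helper name `with_assignment_memory` is hypothetical and not part of ReqMgr or WMCore, and whether `Memory` is actually honored inside `assignRequest` is an assumption that still needs to be tested.

```python
import copy
import json


def with_assignment_memory(request_dict, memory_mb):
    """Return a copy of request_dict with Memory set at assignment only.

    Hypothetical helper: it only rearranges the dictionary, it does not
    talk to ReqMgr.
    """
    updated = copy.deepcopy(request_dict)
    # Leave creation untouched by Memory so the resubmission stays a clone
    # of the original request.
    updated.get("createRequest", {}).pop("Memory", None)
    # Assumption: the assignment step accepts a Memory key (in MB).
    updated.setdefault("assignRequest", {})["Memory"] = memory_mb
    return updated


example = {
    "createRequest": {"RequestType": "Resubmission", "Memory": 2300, "TimePerEvent": 6.0},
    "assignRequest": {"MaxRSS": 2411724, "Team": "production"},
}

overridden = with_assignment_memory(example, 2300)
print(json.dumps(overridden, indent=2, sort_keys=True))
```

The input dictionary is left unmodified, so the same creation parameters can be reused with different memory values.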
I am going to try and describe yet another issue I found with the recovery procedure. https://cmsweb.cern.ch/reqmgr/view/details/vlimant_recovery-0-jen_a_HIRun2015-HIHardProbesPeripheral-02May2016_758p4__160831_121213_7651
was created with the dict
{"createRequest": {"InitialTaskPath": "/jen_a_HIRun2015-HIHardProbesPeripheral-02May2016_758p4_160816_142741_8872/DataProcessing", "OriginalRequestName": "jen_a_HIRun2015-HIHardProbesPeripheral-02May2016_758p4_160816_142741_8872", "CollectionName": "jen_a_HIRun2015-HIHardProbesPeripheral-02May2016_758p4_160816_142741_8872_74d6cf4e-6f62-11e6-92d6-02163e00f196", "PrepID": "ReReco-HIRun2015-02May2016-0004", "Campaign": "HIRun2015", "Requestor": "vlimant", "RequestPriority": 900000.0, "ACDCDatabase": "acdcserver", "Memory" : 2300, "TimePerEvent": 6.0, "RequestType": "Resubmission", "ACDCServer": "https://cmsweb.cern.ch/couchdb", "SizePerEvent": 300, "Group": "DATAOPS", "IgnoredOutputModules": [], "RequestString": "recovery-0-jen_a_HIRun2015-HIHardProbesPeripheral-02May2016758p4"}, "changeSplitting": {"DataProcessing": {"SplittingAlgo": "LumiBased", "halt_job_on_file_boundaries": "True", "lumis_per_job": 1}}, "assignRequest": {"MaxRSS": 2411724, "Team": "production", "UnmergedLFNBase": "/store/unmerged", "Dashboard": "reprocessing", "MaxVSize": 20411724, "SiteWhitelist": ["T2_US_Vanderbilt"], "MergedLFNBase": "/store/hidata", "AcquisitionEra": "HIRun2015", "ProcessingString": "02May2016", "ProcessingVersion": 2}}
vlimant_recovery-0-jen_a_HIRun2015-HIHardProbesPeripheral-02May2016_758p4__160831_121213_7651.request.schema.Memory = 2300
but
vlimant_recovery-0-jen_a_HIRun2015-HIHardProbesPeripheral-02May2016_758p4__160831_121213_7651.tasks.DataProcessing.input.splitting.performance.memoryRequirement = 9000.0
is picked up from "somewhere"
Creating https://cmsweb.cern.ch/reqmgr/view/details/vlimant_recovery-0-jen_a_HIRun2015-HIHardProbesPeripheral-02May2016_758p4__160831_123305_8955 with (removing Memory)
{"createRequest": {"InitialTaskPath": "/jen_a_HIRun2015-HIHardProbesPeripheral-02May2016_758p4_160816_142741_8872/DataProcessing", "OriginalRequestName": "jen_a_HIRun2015-HIHardProbesPeripheral-02May2016_758p4_160816_142741_8872", "CollectionName": "jen_a_HIRun2015-HIHardProbesPeripheral-02May2016_758p4_160816_142741_8872_74d6cf4e-6f62-11e6-92d6-02163e00f196", "PrepID": "ReReco-HIRun2015-02May2016-0004", "Campaign": "HIRun2015", "Requestor": "vlimant", "RequestPriority": 900000.0, "ACDCDatabase": "acdcserver", "TimePerEvent": 6.0, "RequestType": "Resubmission", "ACDCServer": "https://cmsweb.cern.ch/couchdb", "SizePerEvent": 300, "Group": "DATAOPS", "IgnoredOutputModules": [], "RequestString": "recovery-0-jen_a_HIRun2015-HIHardProbesPeripheral-02May2016758p4"}, "changeSplitting": {"DataProcessing": {"SplittingAlgo": "LumiBased", "halt_job_on_file_boundaries": "True", "lumis_per_job": 1}}, "assignRequest": {"MaxRSS": 2411724, "Team": "production", "UnmergedLFNBase": "/store/unmerged", "Dashboard": "reprocessing", "MaxVSize": 20411724, "SiteWhitelist": ["T2_US_Vanderbilt"], "MergedLFNBase": "/store/hidata", "AcquisitionEra": "HIRun2015", "ProcessingString": "02May2016", "ProcessingVersion": 2}}
vlimant_recovery-0-jen_a_HIRun2015-HIHardProbesPeripheral-02May2016_758p4__160831_123305_8955.request.schema.Memory = 9000
vlimant_recovery-0-jen_a_HIRun2015-HIHardProbesPeripheral-02May2016_758p4__160831_123305_8955.tasks.DataProcessing.input.splitting.performance.memoryRequirement = 9000.0
Now creating https://cmsweb.cern.ch/reqmgr/view/details/vlimant_recovery-0-jen_a_HIRun2015-HIHardProbesPeripheral-02May2016_758p4__160831_124752_3839 with (Memory = 12G)
{"createRequest": {"InitialTaskPath": "/jen_a_HIRun2015-HIHardProbesPeripheral-02May2016_758p4_160816_142741_8872/DataProcessing", "OriginalRequestName": "jen_a_HIRun2015-HIHardProbesPeripheral-02May2016_758p4_160816_142741_8872", "CollectionName": "jen_a_HIRun2015-HIHardProbesPeripheral-02May2016_758p4_160816_142741_8872_74d6cf4e-6f62-11e6-92d6-02163e00f196", "PrepID": "ReReco-HIRun2015-02May2016-0004", "Campaign": "HIRun2015", "Requestor": "vlimant", "RequestPriority": 900000.0, "ACDCDatabase": "acdcserver", "Memory" : 12000, "TimePerEvent": 6.0, "RequestType": "Resubmission", "ACDCServer": "https://cmsweb.cern.ch/couchdb", "SizePerEvent": 300, "Group": "DATAOPS", "IgnoredOutputModules": [], "RequestString": "recovery-0-jen_a_HIRun2015-HIHardProbesPeripheral-02May2016758p4"}, "changeSplitting": {"DataProcessing": {"SplittingAlgo": "LumiBased", "halt_job_on_file_boundaries": "True", "lumis_per_job": 1}}, "assignRequest": {"MaxRSS": 2411724, "Team": "production", "UnmergedLFNBase": "/store/unmerged", "Dashboard": "reprocessing", "MaxVSize": 20411724, "SiteWhitelist": ["T2_US_Vanderbilt"], "MergedLFNBase": "/store/hidata", "AcquisitionEra": "HIRun2015", "ProcessingString": "02May2016", "ProcessingVersion": 2}}
vlimant_recovery-0-jen_a_HIRun2015-HIHardProbesPeripheral-02May2016_758p4__160831_124752_3839.request.schema.Memory = 12000
vlimant_recovery-0-jen_a_HIRun2015-HIHardProbesPeripheral-02May2016_758p4__160831_124752_3839.tasks.DataProcessing.input.splitting.performance.memoryRequirement = 9000.0
which means that we cannot properly set the memory requirement in this recovery procedure: whatever Memory is given at creation, the task-level memoryRequirement stays at 9000.0. MaxRSS is one handle, but for job matching, it's Memory that matters.
I want to double-check the behavior of a regular ACDC with an adjusted memory parameter, but I think it will be the same.
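The comparison made by hand above can be sketched as a small check. This is only an illustration: the nested dictionary layout is an assumption that mirrors the dotted paths quoted in this thread (`request.schema.Memory` and `tasks.<task>.input.splitting.performance.memoryRequirement`), and `make_workload` and `memory_mismatch` are hypothetical names, not WMCore functions. The numbers are the ones reported for the three recovery requests.

```python
def make_workload(schema_memory, task_memory, task_name="DataProcessing"):
    """Build a toy nested dict shaped like the dotted paths in this thread."""
    return {
        "request": {"schema": {"Memory": schema_memory}},
        "tasks": {
            task_name: {
                "input": {
                    "splitting": {
                        "performance": {"memoryRequirement": task_memory}
                    }
                }
            }
        },
    }


def memory_mismatch(workload):
    """Return (schema_memory, {task: memoryRequirement}) for disagreeing tasks."""
    schema_memory = float(workload["request"]["schema"]["Memory"])
    mismatched = {}
    for name, task in workload["tasks"].items():
        task_memory = float(
            task["input"]["splitting"]["performance"]["memoryRequirement"]
        )
        if task_memory != schema_memory:
            mismatched[name] = task_memory
    return schema_memory, mismatched


# Values observed for the three recovery requests, keyed by request suffix.
observed = {
    "160831_121213_7651": make_workload(2300, 9000.0),
    "160831_123305_8955": make_workload(9000, 9000.0),
    "160831_124752_3839": make_workload(12000, 9000.0),
}

for suffix, workload in observed.items():
    schema_memory, mismatched = memory_mismatch(workload)
    status = "consistent" if not mismatched else "MISMATCH %s" % mismatched
    print(suffix, "schema Memory =", schema_memory, "->", status)
```

Only the request created without an explicit Memory comes out consistent, which matches the observation that the task-level value is not driven by the Memory passed at creation.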