CMSCompOps / WorkflowWebTools

https://workflowwebtools.readthedocs.io

Include the recovery procedure #39

Closed paorozo closed 6 years ago

paorozo commented 7 years ago

Just a reminder, we need to include the "Recovery" option, besides ACDC and Kill and Clone.

Currently, we create recoveries as follows: https://twiki.cern.ch/twiki/bin/viewauth/CMS/CompOpsPRWorkflowTrafficController#Recovering_Workflows

dabercro commented 7 years ago

Okay thanks, like I said, I hadn't realized that was separate from ACDC. Can you give me a list of parameters that you would like for Recovery? Is this also split by tasks?

paorozo commented 7 years ago

The recovery procedure involves three phases. In phase 1 we create JSON files with the specifications of the new requests that will recover the missing lumis for one workflow. We get one JSON file per datatier. Here is an example of such a file:

{"createRequest": {"InitialTaskPath": "/fabozzi_HIRun2015-HIMinimumBias1-02May2016_758p4_170306_123923_2547/DataProcessing", "CollectionName": "fabozzi_HIRun2015-HIMinimumBias1-02May2016_758p4_170306_123923_2547_a752b4de-28d9-11e7-b427-02163e00f196", "PrepID": "ReReco-HIRun2015-02May2016-0008", "Group": "DATAOPS", "RequestPriority": 900000.0, "ACDCDatabase": "acdcserver", "Memory": 9000, "Requestor": "prozober", "SizePerEvent": 300, "RequestString": "recovery-0-fabozzi_HIRun2015-HIMinimumBias1-02May2016_758p4_", "IgnoredOutputModules": [], "ACDCServer": "https://cmsweb.cern.ch/couchdb", "OriginalRequestName": "fabozzi_HIRun2015-HIMinimumBias1-02May2016_758p4_170306_123923_2547", "Campaign": "HIRun2015", "RequestType": "Resubmission", "TimePerEvent": 6}, "changeSplitting": {"DataProcessing": {"SplittingAlgo": "LumiBased", "halt_job_on_file_boundaries": "True", "lumis_per_job": 1}}, "assignRequest": {"SiteWhitelist": ["T1_DE_KIT", "T2_CH_CERN_HLT", "T1_FR_CCIN2P3", "T1_ES_PIC", "T2_US_MIT", "T2_IT_Legnaro", "T2_UK_London_Brunel", "T2_BE_IIHE", "T0_CH_CERN", "T2_IT_Pisa", "T2_CH_CERN"], "ProcessingVersion": 1, "MaxRSS": 2411724, "ProcessingString": "02May2016", "Dashboard": "reprocessing", "Team": "production", "UnmergedLFNBase": "/store/unmerged", "MergedLFNBase": "/store/hidata", "MaxVSize": 20411724, "OpenRunningTimeout": 0, "AcquisitionEra": "HIRun2015"}}
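As a rough illustration of how a tool could sanity-check such a file before using it, here is a minimal Python sketch. It assumes that the three top-level sections seen in the example (createRequest, changeSplitting, assignRequest) are all required. The function name check_recovery_spec is hypothetical, and the small dictionary is a trimmed copy of the example above, not a complete request.

```python
import json

# Top-level sections seen in the example above. Treating all three as
# required is an assumption made for this sketch.
EXPECTED_SECTIONS = ("createRequest", "changeSplitting", "assignRequest")


def check_recovery_spec(text):
    """Parse a recovery JSON string and return the names of missing sections."""
    spec = json.loads(text)
    return [name for name in EXPECTED_SECTIONS if name not in spec]


# Trimmed copy of the example file, keeping only a few keys per section.
example = json.dumps({
    "createRequest": {"RequestType": "Resubmission", "ACDCDatabase": "acdcserver"},
    "changeSplitting": {"DataProcessing": {"SplittingAlgo": "LumiBased", "lumis_per_job": 1}},
    "assignRequest": {"Team": "production", "ProcessingVersion": 1},
})

missing = check_recovery_spec(example)
print("missing sections:", missing)
```

A file that lacks one of the sections would show up in the returned list, so it could be rejected before anything is injected.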

In phase 2 we inject the requests into requestManager2. We use reqMgrClient to do that: python reqMgrClient.py -j <file.json>

In phase 3, we assign the requests.
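Since phase 1 produces one JSON file per datatier, the phase 2 command has to be run once per file. A minimal sketch of how those command lines could be assembled in Python is below. It only builds the argument lists for the command quoted above and does not run anything. The helper names and the file names are hypothetical, and it assumes reqMgrClient.py is in the current directory.

```python
def build_inject_command(json_path, client="reqMgrClient.py"):
    """Return the argument list for: python reqMgrClient.py -j <file.json>"""
    return ["python", client, "-j", json_path]


def build_inject_commands(json_paths):
    """Build one injection command per recovery file (one file per datatier)."""
    return [build_inject_command(path) for path in json_paths]


# Hypothetical file names, one per datatier.
commands = build_inject_commands(["recovery_AOD.json", "recovery_DQMIO.json"])
for command in commands:
    print(" ".join(command))
```

Each list could then be passed to subprocess.run if the injection should be launched from a script rather than by hand.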

So, the parameters we need to take into account are: