dchaley / deepcell-imaging

Tools & guidance to scale DeepCell imaging on Google Cloud Batch

Retry batch tasks on spot preemption #212

Closed: dchaley closed this 3 months ago

dchaley commented 3 months ago

We've had several jobs fail because their Spot VMs were preempted. Batch has built-in support for retrying tasks after preemption: https://cloud.google.com/batch/docs/automate-task-retries#retry-some-failures

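For example, this job config retries each task up to 3 times when it exits with code 50001, the exit code Batch reports when a task's VM is preempted: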

{
  "taskGroups": [
    {
      "taskSpec": {
        "runnables": [
          {
            "script": {
              "text": "sleep 30"
            }
          }
        ],
        "maxRetryCount": 3,
        "lifecyclePolicies": [
          {
            "action": "RETRY_TASK",
            "actionCondition": {
              "exitCodes": [50001]
            }
          }
        ]
      }
    }
  ]
}
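If job specs are built in code rather than written by hand, the same policy could be attached programmatically. Here is a minimal sketch in Python using only the standard library. It assumes the job is assembled as a plain dict that mirrors the JSON above. `with_preemption_retry` is a hypothetical helper, not part of deepcell-imaging or the Batch client library. The 0 to 10 bound reflects the range Batch documents for `maxRetryCount`.

```python
import copy
import json

# Exit code Batch reports when a task's VM (e.g. a Spot VM) is preempted.
PREEMPTION_EXIT_CODE = 50001


def with_preemption_retry(task_spec: dict, max_retries: int = 3) -> dict:
    """Return a copy of a Batch taskSpec dict that retries on preemption.

    Hypothetical helper for illustration; the input dict is not modified.
    """
    if not 0 <= max_retries <= 10:
        raise ValueError("Batch accepts maxRetryCount values from 0 to 10")
    spec = copy.deepcopy(task_spec)
    spec["maxRetryCount"] = max_retries
    retry_policy = {
        "action": "RETRY_TASK",
        "actionCondition": {"exitCodes": [PREEMPTION_EXIT_CODE]},
    }
    policies = spec.setdefault("lifecyclePolicies", [])
    # Avoid adding a duplicate policy if the helper is applied twice.
    if retry_policy not in policies:
        policies.append(retry_policy)
    return spec


if __name__ == "__main__":
    job = {"taskGroups": [{"taskSpec": {"runnables": [{"script": {"text": "sleep 30"}}]}}]}
    group = job["taskGroups"][0]
    group["taskSpec"] = with_preemption_retry(group["taskSpec"])
    print(json.dumps(job, indent=2))
```

Running the script prints a job config equivalent to the example above. Returning a copy instead of mutating in place keeps a shared base spec reusable across jobs that may not want retries.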