Since cdp-backend v4 we use a CML runner to create a compute instance on Google Cloud that runs the event_gather task. The CML runner requires GPU resources and can fail if none are available.
Instead of a one-shot run that fails outright (CML does have some internal retry/timeout, but FWICT this is not configurable beyond idle-timeout), use nick-fields/retry@v2 to retry CML runner creation with constant backoff.
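For illustration, retrying with a constant wait can be sketched in Python. This is not the nick-fields/retry implementation, only a minimal picture of the idea. `retry_constant_backoff` and `flaky_launch` are hypothetical names, and the per-attempt timeout is left out for brevity.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_constant_backoff(
    action: Callable[[], T],
    max_attempts: int,
    wait_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action` until it succeeds or `max_attempts` is exhausted.

    The wait between attempts is constant (it does not grow per attempt).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception:
            # Out of attempts: surface the last failure to the caller.
            if attempt == max_attempts:
                raise
            sleep(wait_seconds)
    raise AssertionError("unreachable")


# Hypothetical stand-in for runner creation: fails twice, then succeeds.
calls = {"n": 0}


def flaky_launch() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("no GPU resources available")
    return "runner-ready"


waits: List[float] = []  # record the waits instead of actually sleeping
result = retry_constant_backoff(
    flaky_launch, max_attempts=8, wait_seconds=600, sleep=waits.append
)
```

Injecting `sleep` keeps the sketch testable without waiting ten minutes between attempts.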
Set default values in the cookiecutter template (total of about 3.3 h):
- event_gather_runner_timeout_minutes: 15
- event_gather_runner_max_attempts: 8
- event_gather_runner_retry_wait_seconds: 600
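The 3.3 h total appears to come from counting a full timeout plus a full wait for every attempt. That reading is an assumption about how the figure was derived, and `worst_case_minutes` is a hypothetical helper used only to show the arithmetic.

```python
def worst_case_minutes(
    timeout_minutes: float, max_attempts: int, retry_wait_seconds: float
) -> float:
    """Upper bound on total time, assuming every attempt times out
    and a full wait is counted after each attempt."""
    per_attempt = timeout_minutes + retry_wait_seconds / 60
    return max_attempts * per_attempt


# Defaults listed above: 15 min timeout, 8 attempts, 600 s wait.
total = worst_case_minutes(15, 8, 600)
print(f"{total:.0f} min = {total / 60:.1f} h")  # prints "200 min = 3.3 h"
```

If no wait follows the final attempt, the real ceiling is slightly lower (8 × 15 + 7 × 10 = 190 min).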