I use the Python SDK to develop ML pipelines for Azure ML.
How do I get my PythonScriptStep tasks or the encompassing Pipeline object to simply rerun upon failure?
I reckon it's pretty common for pipelines to break temporarily because of transient network, storage, or similar issues, so a simple rerun / retry seems like a pretty basic feature for a task orchestration framework to provide (see e.g. Apache Airflow).
I've spent a fair amount of time going over the documentation for Azure ML and I just can't figure out how to get "retry upon failure" behaviour.
The closest thing I've found is the continue_on_step_failure pipeline / task parameter, which doesn't really do what's needed.
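To make the ask concrete, this is roughly the behaviour I'm after, written as a plain-Python sketch rather than anything the Azure ML SDK provides. The names run_with_retries and submit_and_wait are hypothetical; submit_and_wait stands for any callable that submits the pipeline, blocks until it finishes, and raises on failure.

```python
import time


def run_with_retries(submit_and_wait, max_attempts=3, base_delay_s=1.0, sleep=time.sleep):
    """Call submit_and_wait() until it succeeds or max_attempts is exhausted.

    submit_and_wait is a hypothetical zero-argument callable that raises
    an exception when the run fails; it is not part of any SDK.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return submit_and_wait()
        except Exception:
            # Out of attempts: surface the last failure to the caller.
            if attempt == max_attempts:
                raise
            # Exponential backoff before the next attempt: 1x, 2x, 4x, ...
            sleep(base_delay_s * 2 ** (attempt - 1))
```

With the SDK, I assume submit_and_wait could wrap something like submitting the Pipeline to an Experiment and waiting on the returned run with errors raised, but that means resubmitting from client code, which is exactly what I'd like the service to handle for me.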
Any advice please?