Running a Sagemaker Pipeline with a tune step in which the estimator used is in script mode fails due to a lacking metrics definition in the pipeline definition, albeit it was passed to the estimator upon pipeline creation with the SDK.
This means a pipeline created and upserted with the Python SDK with a tuning step that contains a script mode estimator can't run.
I opened a support case with AWS premium support (170258349801168). The below code snippets are a simplified version.
To reproduce
I have a sagemaker pipeline with a hyper-parameter tuning step based on the SKLearn script mode estimator. The relevant snippet
I clearly pass a metrics definition into the estimator.
The pipeline can be upserted and started, but I get the following runtime error in the hyper-parameter tuning step (even before the Hyper-parameter tuning job is created in the sagemaker console):
ClientError: Failed to invoke sagemaker:CreateHyperParameterTuningJob. Error Details: A metric is required for this hyperparameter tuning job objective. Provide a metric in the metric definitions.
As expected, to create a hyper-parameter tuning job with CreateHyperParameterTuningJob the key TrainingJobDefinitions must contain a MetricDefinition. The following snippet from the docs:
However, my pipeline definition file did not contain it, although I passed metric_definitions into the estimator. I believe it must be lost on the way. In my previous version of the pipeline, I used a train step instead of a tune step, and the metrics made it into the pipeline definition file. Here the relevant excerpt:
It seems that a metric definition has to be supplied in sagemaker.tuner.HyperparameterTuner instead of the estimator. Thanks AWS support for providing the solution.
Describe the bug
Running a Sagemaker Pipeline with a tune step in which the estimator used is in script mode fails due to a lacking metrics definition in the pipeline definition, albeit it was passed to the estimator upon pipeline creation with the SDK.
This means a pipeline created and upserted with the Python SDK with a tuning step that contains a script mode estimator can't run.
I opened a support case with AWS premium support (170258349801168). The below code snippets are a simplified version.
To reproduce
I have a sagemaker pipeline with a hyper-parameter tuning step based on the
SKLearn
script mode estimator. The relevant snippetI clearly pass a metrics definition into the estimator.
The pipeline can be upserted and started, but I get the following runtime error in the hyper-parameter tuning step (even before the Hyper-parameter tuning job is created in the sagemaker console):
As expected, to create a hyper-parameter tuning job with CreateHyperParameterTuningJob the key
TrainingJobDefinitions
must contain aMetricDefinition
. The following snippet from the docs:However, my pipeline definition file did not contain it, although I passed
metric_definitions
into the estimator. I believe it must be lost on the way. In my previous version of the pipeline, I used a train step instead of a tune step, and the metrics made it into the pipeline definition file. Here the relevant excerpt:Expected behavior
The Python SDK takes into account the
metrics_definition
key supplied to theSKLearn
estimator when creating the tune step of the pipeline.System information A description of your system. Please provide: