aws / amazon-sagemaker-examples

Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
https://sagemaker-examples.readthedocs.io
Apache License 2.0

Create tuning job not working with deepar #723

Open doron31 opened 5 years ago

doron31 commented 5 years ago

I am trying to launch a hyperparameter search using HyperparameterTuner, but it seems that the job does not start. This is the relevant part of my code:

```python
from sagemaker.amazon.amazon_estimator import get_image_uri

image_name = get_image_uri(boto3.Session().region_name, 'forecasting-deepar')

estimator = sagemaker.estimator.Estimator(
    sagemaker_session=sagemaker_session,
    image_name=image_name,
    role=role,
    train_instance_count=1,
    train_instance_type='ml.c4.xlarge',
    base_job_name='DEMO-deepar',
    output_path="s3://" + s3_outputpath,
)

param_grid = {
    'context_length_param': ["60"],
    'num_cells_param': ["50"],
    'epocs_param': ["50"],
    'learning_rate_param': ["0.01"],
    'embedding_dimension_param': ["5"],
}

hyperparameter_ranges = {
    'num_layers': IntegerParameter(2, 3, scaling_type="Auto"),
}

objective_metric_name = 'test:RMSE'
tuner_random = HyperparameterTuner(
    estimator,
    objective_metric_name,
    hyperparameter_ranges,
    max_jobs=2,
    max_parallel_jobs=2,
    strategy='Random',
    objective_type='Minimize',
)

tuner_random.fit(inputs=data_channels)
tuner_random
```

and the output I am getting is `<sagemaker.tuner.HyperparameterTuner at 0x7f6da4bca6a0>`.

It does not seem that any training is taking place.

Any idea what I am missing?

yangaws commented 5 years ago

@doron31

That output is just the repr of the last line of your code, `tuner_random`, which is a SageMaker `HyperparameterTuner` object; it does not tell you whether anything ran.

You can check the tuning job and training job history in the AWS SageMaker console. Have you checked whether any tuning or training jobs were created by your code?

Another way to check is to describe the tuning job you just created. You can try:

```python
boto3.client('sagemaker').describe_hyper_parameter_tuning_job(
    HyperParameterTuningJobName=tuner_random.latest_tuning_job.job_name
)['HyperParameterTuningJobStatus']
```

This code will return the status of the tuning job associated with the tuner.
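If you want a notebook cell to block until the job reaches a terminal state, a small polling loop around that describe call works. Here is a minimal sketch with the status lookup injected as a callable; `get_status` is a placeholder for a lambda wrapping the `describe_hyper_parameter_tuning_job` call above, so the sketch does not assume a live AWS session:

```python
import time

def wait_for_tuning_job(get_status, poll_seconds=30):
    """Poll a status callable until the tuning job reaches a terminal state.

    get_status: zero-argument callable returning the current
    HyperParameterTuningJobStatus string (e.g. a lambda wrapping the
    describe_hyper_parameter_tuning_job call shown above).
    Returns the terminal status ('Completed', 'Failed', or 'Stopped').
    """
    terminal = {'Completed', 'Failed', 'Stopped'}
    while True:
        status = get_status()
        if status in terminal:
            return status
        time.sleep(poll_seconds)
```

With a real tuner this would be called as `wait_for_tuning_job(lambda: client.describe_hyper_parameter_tuning_job(...)['HyperParameterTuningJobStatus'])`.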

doron31 commented 5 years ago

Thanks! This is really helpful. I have created a tuning job, and it has failed. The reason is:

Failure reason: ClientError: Invalid hyperparameter eval_metric

which I am not sure about, because I am passing `'test:RMSE'` as the `objective_metric_name`; code snippet below:

```python
objective_metric_name = 'test:RMSE'
tuner_random = HyperparameterTuner(
    estimator,
    objective_metric_name,
    hyperparameter_ranges,
    max_jobs=2,
    max_parallel_jobs=2,
    strategy='Random',
    objective_type='Minimize',
)

tuner_random.fit(inputs=data_channels)
```

yangaws commented 5 years ago

@doron31

Could you provide the DeepAR estimator creation code here? And could you try something like

```python
print(your_deepar_estimator.hyperparameters())
```

and provide the output here?
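An "Invalid hyperparameter" error from a built-in algorithm like DeepAR usually means a key was set on the estimator that the algorithm does not recognize; DeepAR only accepts its documented hyperparameter names (e.g. `context_length`, `epochs`, `num_cells`, not suffixed variants like `epocs_param`). One way to catch this before launching is to check the dict returned by `hyperparameters()` against the documented names. This is only a sketch: `ALLOWED` is a hand-copied subset of the names from the DeepAR documentation, and `find_invalid_hyperparameters` is a hypothetical helper, not part of the SageMaker SDK:

```python
# Subset of hyperparameter names documented for the DeepAR algorithm
# (hand-copied from the AWS docs; not exhaustive).
ALLOWED = {
    'time_freq', 'context_length', 'prediction_length', 'epochs',
    'num_cells', 'num_layers', 'learning_rate', 'mini_batch_size',
    'likelihood', 'dropout_rate', 'embedding_dimension', 'cardinality',
    'early_stopping_patience', 'num_dynamic_feat',
}

def find_invalid_hyperparameters(hyperparameters):
    """Return the keys DeepAR would likely reject, given the dict from
    your_deepar_estimator.hyperparameters()."""
    return sorted(k for k in hyperparameters if k not in ALLOWED)
```

Run against the hyperparameters from the first post, this would flag keys like `epocs_param` or a stray `eval_metric`, which is the name the failure message complains about.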