boto / boto3

AWS SDK for Python
https://aws.amazon.com/sdk-for-python/
Apache License 2.0

Estimator.environment not used in SageMaker.Client.create_hyper_parameter_tuning_job() #3488

Closed DougTrajano closed 1 year ago

DougTrajano commented 1 year ago

Describe the bug

The Training job created inside a Hyperparameter tuning job does not receive the environment configuration.

Related SageMaker SDK issues

Expected Behavior

I'm expecting to be able to set environment variables in training jobs launched under a hyperparameter tuning job.

Current Behavior

Currently, all training jobs under a hyperparameter tuning job don't receive the environment parameter defined in the Estimator API.

Reproduction Steps

For example, launching the following training job directly from the Estimator API works as expected. The training job receives the environment variables correctly.

from sagemaker.sklearn.estimator import SKLearn
estimator = SKLearn(
    entry_point='train.py',
    source_dir='../',
    role=role,
    metric_definitions=metric_definitions,
    hyperparameters=hyperparameters,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    framework_version='0.23-1',
    base_job_name='mlflow',
    vpc_config=vpc_config,
    environment={'MLFLOW_TRACKING_URI': 'https://xyz.com/mlflow-np',
                 'OTHER_ENV': 'something'}
)
estimator.fit({'train': train_path, 'test': test_path})

If we launch the same estimator object in a hyperparameter tuning job using the HyperparameterTuner API, the training jobs do not receive the environment variables.

from sagemaker.tuner import HyperparameterTuner, IntegerParameter

estimator = SKLearn(
    entry_point='train.py',
    source_dir='../',
    role=role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    hyperparameters=hyperparameters,
    metric_definitions=metric_definitions,
    framework_version='0.23-1',
    py_version='py3',
    vpc_config=vpc_config,
    environment={'MLFLOW_TRACKING_URI': 'https://xyz.com/mlflow-np',
                 'OTHER_ENV': 'something'}
)

hyperparameter_ranges = {
    'n-estimators': IntegerParameter(50, 200),
    'min-samples-leaf': IntegerParameter(1, 10)
}

objective_metric_name = 'median-AE'
objective_type = 'Minimize'

tuner = HyperparameterTuner(estimator,
                            objective_metric_name,
                            hyperparameter_ranges,
                            metric_definitions,
                            max_jobs=20,
                            max_parallel_jobs=10,
                            objective_type=objective_type,
                            base_tuning_job_name='mlflow')

tuner.fit({'train': train_path, 'test': test_path})

Other example: Dummy entry point script and notebook running sagemaker SDK
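
A minimal sketch of what such a dummy entry point might look like, assuming the two variable names from the estimator's environment argument above. The function names are hypothetical and are not taken from the linked example; the script only reports which of the expected variables the training container actually sees.

```python
import os

# Variable names copied from the `environment` argument in the report above.
EXPECTED_VARS = ("MLFLOW_TRACKING_URI", "OTHER_ENV")


def report_environment(expected=EXPECTED_VARS, environ=None):
    """Map each expected variable name to its value, or None when it is unset."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name) for name in expected}


def missing_variables(expected=EXPECTED_VARS, environ=None):
    """List the expected variable names that are not set."""
    report = report_environment(expected, environ)
    return [name for name, value in report.items() if value is None]


if __name__ == "__main__":
    # Printed lines end up in the training job's log output.
    for name, value in report_environment().items():
        print(f"{name}={value!r}")
    print("missing:", missing_variables())
```

Run as the entry point under estimator.fit(), this should print both values; under the tuner, going by the behavior described in this report, both names would be listed as missing.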

Possible Solution

It looks like SageMaker.Client.create_hyper_parameter_tuning_job() does not accept an environment configuration, as you can see on the SageMaker page of the Boto 3 Docs 1.9.185 documentation.
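
One way to check this against an installed boto3, rather than a specific docs page, is to inspect the API model bundled with the client. The sketch below assumes the field would be called Environment (the name CreateTrainingJob uses) and would sit under TrainingJobDefinition; both are assumptions, and the helper names are hypothetical. No request is sent to AWS.

```python
def structure_has_member(shape, member_name):
    """True if a botocore structure shape lists `member_name` among its members."""
    return member_name in getattr(shape, "members", {})


def tuning_definition_accepts_environment(client):
    """Inspect the client's bundled API model; no request is sent to AWS."""
    operation = client.meta.service_model.operation_model(
        "CreateHyperParameterTuningJob"
    )
    definition = operation.input_shape.members["TrainingJobDefinition"]
    # "Environment" is an assumed field name, borrowed from CreateTrainingJob.
    return structure_has_member(definition, "Environment")


def check_installed_boto3(region_name="us-east-1"):
    """Run the check against the locally installed boto3 (region is arbitrary)."""
    import boto3  # imported lazily so the helpers above work without boto3

    client = boto3.client("sagemaker", region_name=region_name)
    return tuning_definition_accepts_environment(client)
```

Calling check_installed_boto3() returns False when the installed model has no such member, which would match what this issue describes for the version in use.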

Additional Information/Context

No response

SDK version used

1.26.1

Environment details (OS name and version, etc.)

Windows 11

tim-finnigan commented 1 year ago

Hi @DougTrajano, thanks for reaching out. Is this issue also meant for the sagemaker-python-sdk repository, like the others you linked? If what you're describing is an issue with the CreateHyperParameterTuningJob API, we recommend escalating it to AWS Support for further assistance, because the SageMaker team maintains this underlying API, which is used across SDKs such as boto3.

DougTrajano commented 1 year ago

Yes! Looks like it is related to the CreateHyperParameterTuningJob API.

Unfortunately, I cannot raise a ticket to AWS Support because I have a basic AWS Account.

Could you escalate it, please?

Just to organize the logic:

  1. sagemaker.tuner.HyperparameterTuner() (aws/sagemaker-python-sdk) depends on SageMaker.Client.create_hyper_parameter_tuning_job() (boto/boto3)
  2. SageMaker.Client.create_hyper_parameter_tuning_job() (boto/boto3) depends on CreateHyperParameterTuningJob SageMaker API (CreateHyperParameterTuningJob - Amazon SageMaker)
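
The chain above can be pictured with a toy model. None of the functions below are real SDK or service code, and the field set is a hypothetical subset chosen for illustration; the sketch only shows why a setting cannot reach the training job when the lowest layer has no field for it.

```python
# Hypothetical subset of fields the API layer accepts; "Environment" is
# left out on purpose to mirror the behaviour described in this issue.
API_DEFINITION_FIELDS = {"RoleArn", "StaticHyperParameters"}


def api_create_tuning_job(request):
    """Stand-in for the service API: rejects fields it does not define."""
    definition = request["TrainingJobDefinition"]
    unknown = sorted(set(definition) - API_DEFINITION_FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")
    return definition


def client_create_tuning_job(**request):
    """Stand-in for the client layer: forwards the request unchanged."""
    return api_create_tuning_job(request)


def tuner_fit(estimator_config):
    """Stand-in for the tuner layer: maps only settings the API has a field for."""
    definition = {
        "RoleArn": estimator_config["role"],
        "StaticHyperParameters": estimator_config.get("hyperparameters", {}),
    }
    # estimator_config["environment"] has no destination field, so it is dropped.
    return client_create_tuning_job(TrainingJobDefinition=definition)
```

In this model, fixing only the top layer is not enough: forwarding an Environment field from tuner_fit() would be rejected further down until the lower layers define it.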

I think it will require effort on all sides (packages) :)

tim-finnigan commented 1 year ago

Thanks @DougTrajano for following up. I reached out to the SageMaker team internally regarding this and created this issue in our cross-SDK repository for tracking going forward: https://github.com/aws/aws-sdk/issues/404. I can't guarantee when we'll get a response, but someone may reach out on those issues in the sagemaker-python-sdk repo before we hear back.