aws / sagemaker-python-sdk

A library for training and deploying machine learning models on Amazon SageMaker
https://sagemaker.readthedocs.io/
Apache License 2.0

estimator environment values are not used when calling HyperparameterTuner #2311

Closed anupash147 closed 11 months ago

anupash147 commented 3 years ago

Describe the bug
For a single training job, I use the code below:

from sagemaker.sklearn.estimator import SKLearn

# role, metric_definitions, hyperparameters, vpc_config, train_path and
# test_path are defined earlier in the notebook (not shown here).
estimator = SKLearn(
    entry_point='train.py',
    source_dir='../',
    role=role,
    metric_definitions=metric_definitions,
    hyperparameters=hyperparameters,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    framework_version='0.23-1',
    base_job_name='mlflow',
    vpc_config=vpc_config,
    environment={'MLFLOW_TRACKING_URI': 'https://xyz.com/mlflow-np',
                 'OTHER_ENV': 'something'}
)
estimator.fit({'train': train_path, 'test': test_path})

This works, but the same code fails when I use it for hyperparameter tuning, and I can't see why.

from sagemaker.sklearn.estimator import SKLearn
from sagemaker.tuner import HyperparameterTuner, IntegerParameter

estimator = SKLearn(
    entry_point='train.py',
    source_dir='../',
    role=role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    hyperparameters=hyperparameters,
    metric_definitions=metric_definitions,
    framework_version='0.23-1',
    py_version='py3',
    vpc_config=vpc_config,
    environment={'MLFLOW_TRACKING_URI': 'https://xyz.com/mlflow-np',
                 'OTHER_ENV': 'something'}
)

hyperparameter_ranges = {
    'n-estimators': IntegerParameter(50, 200),
    'min-samples-leaf': IntegerParameter(1, 10)
}

objective_metric_name = 'median-AE'
objective_type = 'Minimize'

tuner = HyperparameterTuner(estimator,
                            objective_metric_name,
                            hyperparameter_ranges,
                            metric_definitions,
                            max_jobs=20,
                            max_parallel_jobs=10,
                            objective_type=objective_type,
                            base_tuning_job_name='mlflow')

tuner.fit({'train': train_path, 'test': test_path})

I also cannot see the environment variables when debugging.
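To confirm whether the variables reach the tuning jobs at all, the training script can report which of the expected names are present before doing anything else. This is a minimal sketch: `report_env` is a hypothetical helper, and the variable names are simply the ones passed via `environment=` in the estimator above. It prints only set/missing status so that values such as a tracking URI are not written to the job logs.

```python
import os
from typing import Dict, Iterable, Optional


def report_env(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return {name: value or None} for each expected variable and log its status."""
    found = {name: os.environ.get(name) for name in names}
    for name, value in found.items():
        # Log only whether the variable is present, not its value.
        status = "set" if value is not None else "MISSING"
        print(f"{name}: {status}")
    return found


if __name__ == "__main__":
    # Hypothetical list: the variables passed via `environment=` in the estimator.
    report_env(["MLFLOW_TRACKING_URI", "OTHER_ENV"])
```

Running this at the top of train.py in both the single job and a tuning job makes it easy to compare what each container actually received.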

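Expected behavior
The environment values set on the estimator should also be available in the training jobs launched by the tuner. If I hardcode the values in the training script, it works fine. This code is taken from https://github.com/aws-samples/amazon-sagemaker-mlflow-fargate/blob/main/lab/2_track_experiments_hpo.ipynb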


ahsan-z-khan commented 3 years ago

Hi @anupash147 ,

Thank you for using Amazon SageMaker.

Can you provide details on the bug? Which error are you getting?

DougTrajano commented 1 year ago

Hi folks, a workaround is to pass the values as hyperparameters instead of as environment variables in the Estimator API.

To avoid changing my estimator implementation, I added the code below where the hyperparameters are defined:

# Copy every environment entry into the estimator's (private) hyperparameter dict
for k, v in estimator.environment.items():
    estimator._hyperparameters[k] = v
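The same idea can be expressed without writing to the private `_hyperparameters` attribute: build the merged dictionary first and pass it as `hyperparameters=` when constructing the estimator. This is a sketch under that assumption; `env_as_hyperparameters` is a hypothetical helper, and it refuses to silently overwrite a hyperparameter that shares a name with an environment variable.

```python
from typing import Dict


def env_as_hyperparameters(environment: Dict[str, str],
                           hyperparameters: Dict[str, object]) -> Dict[str, object]:
    """Return a new dict with every environment entry added as a hyperparameter."""
    clashes = set(environment) & set(hyperparameters)
    if clashes:
        raise ValueError(f"names used as both env var and hyperparameter: {sorted(clashes)}")
    merged = dict(hyperparameters)  # copy so the caller's dict is left unchanged
    merged.update(environment)
    return merged


env = {"MLFLOW_TRACKING_URI": "https://xyz.com/mlflow-np", "OTHER_ENV": "something"}
hps = {"n-estimators": 100, "min-samples-leaf": 3}
print(env_as_hyperparameters(env, hps))
```

Failing on a name clash is a deliberate choice: a silent overwrite would make the tuner's static hyperparameters depend on dictionary update order.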

In the entry_point script, I then exported all unrecognized arguments as environment variables:

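import os

# remaining_args: the unrecognized tokens returned by parser.parse_known_args()
for i in range(len(remaining_args) - 1):
    if remaining_args[i].startswith("--"):
        # "--NAME value" -> NAME=value (removes only the leading "--")
        os.environ[remaining_args[i][2:]] = remaining_args[i + 1]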
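A self-contained version of the entry-point side might look like the sketch below. It assumes, as the workaround does, that each hyperparameter reaches train.py as a `--name value` command-line pair. Known arguments are parsed with `argparse.ArgumentParser.parse_known_args`, and the leftover pairs are exported. The helper names `export_unknown_args` and `parse` are hypothetical.

```python
import argparse
import os
from typing import Dict, List, Optional


def export_unknown_args(remaining_args: List[str]) -> Dict[str, str]:
    """Export leftover `--name value` pairs to os.environ and return what was set."""
    exported = {}
    i = 0
    while i < len(remaining_args):
        token = remaining_args[i]
        # Only treat `--name value` as a pair when a value actually follows.
        if token.startswith("--") and i + 1 < len(remaining_args):
            name = token[2:]
            os.environ[name] = remaining_args[i + 1]
            exported[name] = remaining_args[i + 1]
            i += 2  # skip past the value we just consumed
        else:
            i += 1
    return exported


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument("--n-estimators", type=int, default=100)
    parser.add_argument("--min-samples-leaf", type=int, default=1)
    # Unrecognized options and their values end up in remaining_args.
    args, remaining_args = parser.parse_known_args(argv)
    export_unknown_args(remaining_args)
    return args


if __name__ == "__main__":
    args = parse(["--n-estimators", "150",
                  "--MLFLOW_TRACKING_URI", "https://xyz.com/mlflow-np"])
    print(args.n_estimators, os.environ["MLFLOW_TRACKING_URI"])
```

Consuming each pair with `i += 2` means a value is never re-read as a variable name, and a trailing `--name` with no value is ignored rather than raising an `IndexError`.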

lorenzwalthert commented 1 year ago

I think we can close this as per https://github.com/aws/sagemaker-python-sdk/pull/3614.

martinRenou commented 11 months ago

Closing as resolved. Feel free to reopen/continue the discussion if needed.