aws / sagemaker-python-sdk

A library for training and deploying machine learning models on Amazon SageMaker
https://sagemaker.readthedocs.io/
Apache License 2.0

XGBoost container prevents custom tunable parameters #2226

Open seanpmorgan opened 3 years ago

seanpmorgan commented 3 years ago

Describe the bug When using the XGBoost estimator in script mode, users are unable to provide custom tunable parameters in their scripts. It appears there is a check in the SDK (and in boto3 below it) that assumes the XGBoost hyperparameters must match those of the built-in algorithm.

To reproduce

import sagemaker
from sagemaker.xgboost import XGBoost
from sagemaker.tuner import ContinuousParameter, IntegerParameter, CategoricalParameter, HyperparameterTuner

sess = sagemaker.Session()
bucket = sess.default_bucket()
role = sagemaker.get_execution_role()

static_hyperparameters = {'num_round': 50}
estimator = XGBoost(
    entry_point='train.py',
    source_dir='xgb_src',
    role=role,
    framework_version='1.2-1',
    model_dir='/opt/ml/model',
    output_path="s3://{}/{}/output".format(bucket, 'xgb-hpo-demo'),
    instance_type='ml.m5.xlarge',
    instance_count=1,
    hyperparameters=static_hyperparameters
)

train_loc = sess.upload_data(path='./train.csv', bucket=bucket, key_prefix='churn/train')
val_loc = sess.upload_data(path='./validation.csv', bucket=bucket, key_prefix='churn/val')

hyperparameter_range = {
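    # 'eta' is a built-in XGBoost hyperparameter; 'feature_xform' is a custom one defined in train.py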
    'eta': ContinuousParameter(0.1, 0.8),
    'feature_xform': CategoricalParameter(['onehot', 'ordinal']) 
}

objective_metric_name = 'validation:error'
tuner = HyperparameterTuner(
    estimator,
    objective_metric_name,
    hyperparameter_range,
    strategy='Bayesian',
    max_jobs=4,
    max_parallel_jobs=2,
    objective_type='Minimize'
)

tuner.fit(inputs={"train": train_loc, "validation": val_loc})

Expected behavior Custom tunable hyperparameters are accepted by the tuning job and passed through to the training script, where they are read with argparse like any typical HPO tunable parameter.
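
For context, a minimal sketch of what the xgb_src/train.py entry point might look like (it is not included in the report; the argument names below simply mirror the hyperparameters used above). In script mode, SageMaker passes hyperparameters to the script as command-line arguments:

# train.py -- hypothetical sketch of the entry point referenced above
import argparse
import os

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # Built-in XGBoost hyperparameters the service allows the tuner to vary
    parser.add_argument("--num_round", type=int, default=50)
    parser.add_argument("--eta", type=float, default=0.3)
    # Custom tunable parameter that CreateHyperParameterTuningJob rejects
    parser.add_argument("--feature_xform", type=str, default="onehot",
                        choices=["onehot", "ordinal"])
    # Locations SageMaker exposes via environment variables
    parser.add_argument("--model_dir", type=str,
                        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--train", type=str, default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--validation", type=str,
                        default=os.environ.get("SM_CHANNEL_VALIDATION"))
    args, _ = parser.parse_known_args()

    # ... transform features according to args.feature_xform, then train
    # xgboost for args.num_round rounds and save the model to args.model_dir ...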

Screenshots or logs

---------------------------------------------------------------------------
ClientError                               Traceback (most recent call last)
<ipython-input-16-fa2298a1a26b> in <module>
      1 # Train without buckets being parameters
      2 channels = {"train": train_loc, "validation": val_loc}
----> 3 tuner.fit(inputs=channels)

/opt/conda/lib/python3.7/site-packages/sagemaker/tuner.py in fit(self, inputs, job_name, include_cls_metadata, estimator_kwargs, wait, **kwargs)
    442         """
    443         if self.estimator is not None:
--> 444             self._fit_with_estimator(inputs, job_name, include_cls_metadata, **kwargs)
    445         else:
    446             self._fit_with_estimator_dict(inputs, job_name, include_cls_metadata, estimator_kwargs)

/opt/conda/lib/python3.7/site-packages/sagemaker/tuner.py in _fit_with_estimator(self, inputs, job_name, include_cls_metadata, **kwargs)
    453         self._prepare_estimator_for_tuning(self.estimator, inputs, job_name, **kwargs)
    454         self._prepare_for_tuning(job_name=job_name, include_cls_metadata=include_cls_metadata)
--> 455         self.latest_tuning_job = _TuningJob.start_new(self, inputs)
    456 
    457     def _fit_with_estimator_dict(self, inputs, job_name, include_cls_metadata, estimator_kwargs):

/opt/conda/lib/python3.7/site-packages/sagemaker/tuner.py in start_new(cls, tuner, inputs)
   1507             ]
   1508 
-> 1509         tuner.sagemaker_session.create_tuning_job(**tuner_args)
   1510         return cls(tuner.sagemaker_session, tuner._current_job_name)
   1511 

/opt/conda/lib/python3.7/site-packages/sagemaker/session.py in create_tuning_job(self, job_name, tuning_config, training_config, training_config_list, warm_start_config, tags)
   2027         LOGGER.info("Creating hyperparameter tuning job with name: %s", job_name)
   2028         LOGGER.debug("tune request: %s", json.dumps(tune_request, indent=4))
-> 2029         self.sagemaker_client.create_hyper_parameter_tuning_job(**tune_request)
   2030 
   2031     def describe_tuning_job(self, job_name):

/opt/conda/lib/python3.7/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    355                     "%s() only accepts keyword arguments." % py_operation_name)
    356             # The "self" in this scope is referring to the BaseClient.
--> 357             return self._make_api_call(operation_name, kwargs)
    358 
    359         _api_call.__name__ = str(py_operation_name)

/opt/conda/lib/python3.7/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    674             error_code = parsed_response.get("Error", {}).get("Code")
    675             error_class = self.exceptions.from_code(error_code)
--> 676             raise error_class(parsed_response, operation_name)
    677         else:
    678             return parsed_response

ClientError: An error occurred (ValidationException) when calling the CreateHyperParameterTuningJob operation: The hyperparameter tuning job that you requested has the following untunable hyperparameters: [feature_xform]. For the algorithm, 246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-xgboost:1.2-1, you can tune only [colsample_bytree, lambda, eta, max_depth, alpha, colsample_bynode, num_round, colsample_bylevel, subsample, min_child_weight, max_delta_step, gamma]. Delete untunable hyperparameters.

farisfirenze commented 2 years ago

You are trying to tune feature_xform, which is not supported by the version of XGBoost you are using in your script.
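
A workaround implied by the error message ("Delete untunable hyperparameters") is to pin the custom parameter as a static hyperparameter and tune only the names on the image's allow-list; an untested sketch against the code in the report:

# Workaround sketch: fix feature_xform instead of tuning it
static_hyperparameters = {'num_round': 50, 'feature_xform': 'onehot'}

# Tune only hyperparameters the sagemaker-xgboost:1.2-1 image accepts
hyperparameter_range = {'eta': ContinuousParameter(0.1, 0.8)}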

Rizhiy commented 1 year ago

Any updates on this? The way it is currently implemented makes tuning script-mode XGBoost almost impossible, since we can't use the parameters we define.

It is possible to set custom parameters using estimator.set_hyperparameters(), so I don't understand why this restriction even exists.
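
For example (a sketch against the estimator defined in the report), the same custom name goes through as a static hyperparameter but is rejected as a tunable range:

# Accepted: custom names pass through to train.py as --feature_xform
estimator.set_hyperparameters(num_round=50, feature_xform='onehot')
estimator.fit({'train': train_loc, 'validation': val_loc})  # trains fine

# Rejected: the same name as a tuning range raises the ValidationException above
tuner = HyperparameterTuner(
    estimator,
    'validation:error',
    {'feature_xform': CategoricalParameter(['onehot', 'ordinal'])},
    objective_type='Minimize',
)
tuner.fit(inputs={'train': train_loc, 'validation': val_loc})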

marckarp commented 9 months ago

Have you tried extending the container and pushing it to your own ECR to bypass this issue?
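
If that route works, the estimator-side change should be small: build an image FROM the public sagemaker-xgboost:1.2-1 image, push it to a repository in your own account, and pass it as image_uri. An untested sketch; the account ID, region, and repository name below are placeholders:

# Untested sketch: a privately hosted copy of the XGBoost image, so the
# service no longer applies the built-in algorithm's tunable-parameter list.
# '111122223333', 'us-west-2', and 'my-xgboost' are placeholders.
custom_image = '111122223333.dkr.ecr.us-west-2.amazonaws.com/my-xgboost:1.2-1'

estimator = XGBoost(
    entry_point='train.py',
    source_dir='xgb_src',
    role=role,
    framework_version='1.2-1',
    image_uri=custom_image,  # overrides the default sagemaker-xgboost image
    instance_type='ml.m5.xlarge',
    instance_count=1,
    hyperparameters=static_hyperparameters,
)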