aws / sagemaker-python-sdk

A library for training and deploying machine learning models on Amazon SageMaker
https://sagemaker.readthedocs.io/
Apache License 2.0
2.09k stars 1.13k forks source link

Hyperparameter values forcefully converted to strings, thus unable to pass a list #613

Open mmamaev opened 5 years ago

mmamaev commented 5 years ago

I want ro pass a list as a hyperparameter to an estimator. For example, hyperparameter key = 'a' hyperparameter value = ['def', 'xyz']

This type of value is accepted and get verified:

>>> estimator.set_hyperparameters(a=['def', 'xyz'])
>>> estimator.hyperparameters()['a']
['def', 'xyz']
>>> type(estimator.hyperparameters()['a'])
<class 'list'>

I expect that sagemaker renders hyperparameters to the following json to be found inside /opt/ml/input/config/hyperparameters.json:

>>> print(json.dumps(estimator.hyperparameters()))
{"a": ["def", "xyz"]}

Instead, the contents of /opt/ml/input/config/hyperparameters.json is:

{"a": "['def', 'xyz']"}

The list type of the value is lost, being forcefully converted to a string by this code:

https://github.com/aws/sagemaker-python-sdk/blob/1c94079ddf1395d2edd82b5be38e989aabf378d8/src/sagemaker/estimator.py#L553-L554

yangaws commented 5 years ago

Hi @mmamaev ,

Although the list is encoded to string, you can decode them back to list when using it. For example, if you are using SageMaker's framework container, you can add the decode logic in the user script before using the list.

So my question is, could you provide more details on this issue:

  1. What are you trying to do? Are you using your own container, SageMaker's framework container or SageMaker's first party algorithm container?
  2. Do you find decoding the hyperparameter from string to list hard? If so, what's the problem?

More details will help us understand the issue and may impact our decision whether to provide instruction for this case or make new feature as what you suggested (make the list automatically decoded in the hyperparameter json file).

Thanks!

mmamaev commented 5 years ago

Good morning, @yangaws,

Yes, I am using my own container. No, I do not find it difficult to decode a string back to list... if I know that I have to do it.

The problem is that sagemaker changes the type of the passed value when it is not expected to do so. The sagemaker doc reads:

hyperparameters()
    Return the hyperparameters as a dictionary to use for training.

    The fit() method, which trains the model, calls this method to find the hyperparameters.

        Returns:        The hyperparameters.
        Return type:  dict[str, str]

Indeed, it mentions that the return type is dict[str, str]. However, this notice of the return type is present only for EstimatorBase.hyperparameters() and is absent for Estimator.hyperparameters().

Moreover, the Estimator.hyperparameters() method returns the dict with value types as they've been passed to Estimator such as lists, numbers, etc.

>>> hp = {'a': ['bb', 'cc'], 'x': 1.5, 125: False}
>>> estimator = sm.estimator.Estimator(image,
...                        role, 1, 'ml.c4.2xlarge',
...                        output_path="s3://{}/output".format(session.default_bucket()),
...                        sagemaker_session=session,
...                        hyperparameters=hp)
>>>
>>> estimator.hyperparameters()
{'a': ['bb', 'cc'], 'x': 1.5, 125: False}

So one is ok to assume that this is a "dictionary to use for training".

Errors that can be introduced could be quite nasty. For example, the list ['bb', 'cc'] and the string "['bb', 'cc']" respond to the same calls, such as len() or __getitem__(), but with quite different results.

My understanding is that you're being compliant to the following specification:

   "HyperParameters": { 
      "string" : "string" 
   }

(AWS Documentation - Amazon SageMaker - Developer Guide - API Reference - Actions - Amazon SageMaker Service - CreateTrainingJob)

Forcing hyperparameter values to be strings looks rather unwarranted. I am interested what is the reason behind it. To work around, I have to implement additional checks to differentiate between reading "conventional" json and "sagemaker-style" json, doing additional decoding for the latter. I see this as an unnecessary complication.

If this restriction cannot be lifted, I suggest that

yangaws commented 5 years ago

Hi @mmamaev ,

Got what you mean. We need to talk whether this can be a feature change in our backlog (we make the hyperparameter as list in the json). But anyway our doc for now needs to be updated, let me add a doc label for now.

Thanks for the explanation.

roelschr commented 5 years ago

Hey!

This is something that has being a pain for us too.

Do you find decoding the hyperparameter from string to list hard? If so, what's the problem?

Yes. The right solution to adapt ourselves to sagemaker's contract with hyperparameters is: keeping track of all possible key/values and their types in a custom image, just to parse it back to their expected format. And undoing the str() applied by sagemaker, casting every hyperparameter that is not a string is quite time consuming.

This is not easy to do when we have to send dozens of hyperparameters. Also, it is quite strange to send something like

{
    "objective": "binary",
    "boosting_type": "gbdt",
    "num_leaves": 31,
    "metric": ["binary_logloss", "auc"],
    "seed": None
} 

and getting inside of the container something like:

{
    "objective": "binary",
    "boosting_type": "gbdt",
    "num_leaves": "31",
    "metric": "[\"binary_logloss\", \"auc\"]",
    "seed": "None"
} 

Is there any chance of changing how sagemaker deals with hyperparameters?

I imagine that the decision of forcing an str() conversion was taken to prevent having to deal with json encoders or it's more related to some concern with safety. Therefore, letting us control the json encoding and decoding and expecting just a json string as an argument of Estimator would help both sides?

sandys commented 5 years ago

we are facing this issue as well. And in our case it is nested dict (we use our customer docker images). its very hard to reparse this to json because we have different types of values.

you guys also convert double quotes to single quotes, so to effectively get this to work, i have to do this

param_grid = {
            "svd__n_components": [ json.loads(training_params["svd"].replace("\'", "\""))["n_components"]],
            "xgb__eval_set": [(X_test, y_test)],
        }
nhirons commented 5 years ago

Hi team, for those still having trouble here, I've been using these two functions to parse out the hyperparameters in my own estimator containers (it also strips leading 0's, just in case you work with coordinates in your day to day, which can cause trouble when trying to cast to an int or float):

def destringify_dict_values(d):
    return {k: destringify(v) for k, v in d.items()}

def destringify(s):

    if s == len(s) * "0":
        return 0
    else:
        s = s.lstrip("0")

    if isinstance(s, str):
        try:
            val = literal_eval(s)
        except ValueError:
            val = s
    else:
        val = s

    if isinstance(val, float):
        if val.is_integer():
            return int(val)
        return val

    return val

It uses literal_eval from ast, which is not ideal, but still much safer than eval , since it raises a ValueError if it can't cast to a basic Python type.

ARDivekar commented 4 years ago

+1 I wasted a whole day of my life trying to debug this...training the docker container worked locally, but on sagemaker it did not! Truly an insidious issue.

vcodina commented 4 years ago

This issue is also being a pain for us too. Hopefully you can provide soon a way to handle lists as hyperparams

durandg12 commented 4 years ago

I'm having the issue too. From what I gathered, there has been no official response to this thread for more than a year now, quite baffling.

Furthermore, trying to find a workaround to @mmamaev 's situation, I found something maybe more insidious. This is the script, named sandbox.py, that I try to run:

import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-a', nargs='*')
    args, _ = parser.parse_known_args()
    print(args)

On my laptop, when I run python sandbox.py -a def xyz I get the expected print: Namespace(a=['def', 'xyz']). So args.a is the correct expected list: ['def', 'xyz'].

When I run the following on sagemaker it fails:

role = get_execution_role()
image_name = ...
hyperparameters={'a': 'def xyz'
                }
output_path = ...
estimator = Estimator(image_name=image_name,
                    role=role,
                    train_instance_count=1,
                    output_path=output_path,
                    train_instance_type='local',
                    hyperparameters=hyperparameters)

estimator.fit()

where the image image_name has been built using this Dockerfile:

FROM tensorflow/tensorflow:2.0.0-py3

RUN pip install --no-cache-dir --upgrade pip
RUN pip install --no-cache-dir sagemaker-containers

COPY sandbox.py /opt/ml/code/
ENV SAGEMAKER_PROGRAM sandbox.py

Here is the end of the logs of the execution:

algo-1-7blkd_1  | SM_USER_ARGS=["-a","def xyz"]
algo-1-7blkd_1  | SM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate
algo-1-7blkd_1  | SM_HP_A=def xyz
algo-1-7blkd_1  | PYTHONPATH=/opt/ml/code:/usr/local/bin:/usr/lib/python36.zip:/usr/lib/python3.6:/usr/lib/python3.6/lib-dynload:/usr/local/lib/python3.6/dist-packages:/usr/lib/python3/dist-packages
algo-1-7blkd_1  | 
algo-1-7blkd_1  | Invoking script with the following command:
algo-1-7blkd_1  | 
algo-1-7blkd_1  | /usr/bin/python3 sandbox.py -a def xyz
algo-1-7blkd_1  | 
algo-1-7blkd_1  | 
algo-1-7blkd_1  | Namespace(a=['def xyz'])
algo-1-7blkd_1  | 2020-02-21 13:08:04,897 sagemaker-containers INFO     Reporting training SUCCESS
tmpzd7en6gx_algo-1-7blkd_1 exited with code 0
Aborting on container exit...
===== Job Complete =====

As you see, here args.a is a list of 1 element, that is args.a=['def xyz']. Whereas sagemaker says that it called the exact same line that I did on my laptop, that is, /usr/bin/python3 sandbox.py -a def xyz. Did sagemaker straight up lie to me and somehow escaped the space between def and xyz without saying ?

So my workaround does not work and I have to rely on a painful hyperparameter decoding. @ihoelscher did a great job explaining why this should not be.

Any news @yangaws ?

Edit: in OP's case the following parameter decoding makes the code works (with 'a': 'def xyz') both on my laptop and on sagemaker: just add these two lines after args, _ = parser.parse_known_args()

    args.a = [s.split() for s in args.a]
    args.a = list(itertools.chain.from_iterable(args.a))

itertools needs to be imported.

diegodebrito commented 3 years ago

What is the status on this issue? I believe the relevant changes were not included on v2.0.0. @laurenyu

laurenyu commented 3 years ago

@diegodebrito no, it wasn't included in v2.0.0 unfortunately

cc @ajaykarpur, who might know about the current status of this issue

theoldfather commented 3 years ago

Glad to create a PR for this immediately. I can't believe this has not been addressed in nearly a year.

xjwalker commented 3 years ago

Hi

Currently I'm doing

https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-train-model.html

I'm in: "Step 4: Train a Model"

And in the "3. Set the hyperparameters for the XGBoost algorithm by calling the set_hyperparameters method of the estimator."

I'm having a problem after executing: xgb_model.fit({"train": train_input, "validation": validation_input}, wait=True)

After going through some documentation: https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/

https://xgboost.readthedocs.io/en/latest/parameter.html

I see that the list of available options in objective

This is how I have it right now 1:

xgb_model.set_hyperparameters(
    max_depth = 5,
    eta = 0.2,
    gamma = 4,
    min_child_weight = 6,
    subsample = 0.7,
    objective = "binary:logistic",
    num_round = 1000
)

I've tried to use

2:

...
 objective = {"binary": "logistic"},
 ...

3:

...
 objective = ['binary: logistic'],
 ...

The error output varies but it's always like this:

  1. Failed to parse hyperparameter objective value binary:logistic to Json.

hyperparameter_validation.py", line 47, in validate_range raise exc.UserError("Hyperparameter {}: {} is not in {}".format(self.name, value, self.range)) sagemaker_algorithm_toolkit.exceptions.UserError: Hyperparameter objective: {'binary:logistic'} is not in ['aft_loss_distribution', 'binary:logistic', 'binary:logitraw', 'binary:hinge', 'count:poisson', 'multi:softmax', 'multi:softprob', 'rank:pairwise', 'rank:ndcg', 'rank:map', 'reg:linear', 'reg:squarederror', 'reg:logistic', 'reg:gamma', 'reg:pseudohubererror', 'reg:squaredlogerror', 'reg:tweedie', 'survival:aft', 'survival:cox']

Do you guys now if I'm doing something wrong? I would think the tutorial is a no brainer but maybe I'm missing something

Nov05 commented 3 years ago

Hi

Currently I'm doing

https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-train-model.html

I'm in: "Step 4: Train a Model"

And in the "3. Set the hyperparameters for the XGBoost algorithm by calling the set_hyperparameters method of the estimator."

I'm having a problem after executing: xgb_model.fit({"train": train_input, "validation": validation_input}, wait=True) ...

@xjwalker I had the same error message. Mine was caused by a wrong train-val data format. I forgot to concatenate the label column to the dataframe. IMDB Sentiment Analysis XGBoost (Batch Transform).ipynb


# TODO: Save the training and validation data to train.csv and validation.csv in the data_dir directory.
#       Make sure that the files you create are in the correct format.
# Note: the first column is the label
pd.concat([train_y, train_X], axis=1).to_csv(os.path.join(data_dir, 'train.csv'), header=False, index=False)
pd.concat([val_y, val_X], axis=1).to_csv(os.path.join(data_dir, 'validation.csv'), header=False, index=False)```
clausagerskov commented 2 years ago

Nothing? I am currently trying to pass a list of str (or int even) to an estimator and getting my list cut at the comma. Any developments on this issue?

clausagerskov commented 1 year ago

@yangaws

tarik-affinda commented 1 year ago

As a workaround the environment variable SM_TRAINING_ENV has the hyperparameters in a properly formatted JSON string. This includes the proper JavaScript style bools (false/true) and null for None.

This works much nicer than trying to parse the hyperparams from strings:

hyperparameters = json.loads(os.environ["SM_TRAINING_ENV"])["hyperparameters"]
alessandro-mrd commented 1 year ago

Would be great if this could get sorted or if the docs were more expressive

clausagerskov commented 1 year ago

bump

zhenliu2012 commented 1 year ago

any updates? parsing values can be a pain if it's all nested and has different data types. This adds a lot of extra workload. If user simply passes in a python dict, they would expect to get the same thru a simple load using the json module.

nbeuchat commented 1 year ago

I found a workaround when set_hyperparameters() adds escaped quotes to string (seems like some people in this thread have this issue).

Instead of using estimator.set_hyperparameters(my_param="some_string"), you can directly modify the hyperparameter using:estimator._hyperparameters.update(dict(my_param="some_string"))`

Hope that helps!

kevinpmcgee14 commented 1 year ago

Parsing this was particularly annoying when passing a nested dict as a hyperparameter. One workaround I found that bypasses any manual parsing was to base64 encode the dict as a json string, and pass that to my Estimator and then decode in my training script:

def b64encoded_str(j):
    return base64.b64encode(json.dumps(j).encode('utf-8')).decode('utf-8')

config = {...}  #Nest dict of lists, float, and dict
hyperparameters = {'b64config': b64encoded_str(config)}
estimator = PyTorch(
    image_uri=image_uri
    ...,
    hyperparamters=hyperparameters
)
estimator.fit(inputs=inputs)

and then in my training script:

def train(config):
     ....

def b64decode_str(s):
    return json.loads(base64.b64decode(s).decode('utf-8'))

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--b64config', type=str, required=True)
    args = parser.parse_args()

    cfg = b64decode_str(args.b64config)
    train(cfg)

Unideal that the hyperparameters are not legible on the Sagemaker console, but makes up for it in that it doesn't require any parsing!

RaydelMiranda commented 2 months ago

Have been 3 years. Since this came up. Nothing?