Open mmamaev opened 5 years ago
Hi @mmamaev ,
Although the list is encoded to string, you can decode them back to list when using it. For example, if you are using SageMaker's framework container, you can add the decode logic in the user script before using the list.
So my question is, could you provide more details on this issue:
More details will help us understand the issue and may impact our decision whether to provide instruction for this case or make new feature as what you suggested (make the list automatically decoded in the hyperparameter json file).
Thanks!
Good morning, @yangaws,
Yes, I am using my own container. No, I do not find it difficult to decode a string back to list... if I know that I have to do it.
The problem is that sagemaker changes the type of the passed value when it is not expected to do so. The sagemaker doc reads:
hyperparameters()
Return the hyperparameters as a dictionary to use for training.
The fit() method, which trains the model, calls this method to find the hyperparameters.
Returns: The hyperparameters.
Return type: dict[str, str]
Indeed, it mentions that the return type is dict[str, str]. However, this notice of the return type is present only for EstimatorBase.hyperparameters()
and is absent for Estimator.hyperparameters()
.
Moreover, the Estimator.hyperparameters()
method returns the dict with value types as they've been passed to Estimator
such as lists, numbers, etc.
>>> hp = {'a': ['bb', 'cc'], 'x': 1.5, 125: False}
>>> estimator = sm.estimator.Estimator(image,
... role, 1, 'ml.c4.2xlarge',
... output_path="s3://{}/output".format(session.default_bucket()),
... sagemaker_session=session,
... hyperparameters=hp)
>>>
>>> estimator.hyperparameters()
{'a': ['bb', 'cc'], 'x': 1.5, 125: False}
So one is ok to assume that this is a "dictionary to use for training".
Errors that can be introduced could be quite nasty. For example, the list ['bb', 'cc']
and the string "['bb', 'cc']"
respond to the same calls, such as len()
or __getitem__()
, but with quite different results.
My understanding is that you're being compliant to the following specification:
"HyperParameters": {
"string" : "string"
}
Forcing hyperparameter values to be strings looks rather unwarranted. I am interested what is the reason behind it. To work around, I have to implement additional checks to differentiate between reading "conventional" json and "sagemaker-style" json, doing additional decoding for the latter. I see this as an unnecessary complication.
If this restriction cannot be lifted, I suggest that
Estimator.hyperparameters()
returns the dictionary with keys and values converted to strings (as it's probably supposed to do).Hi @mmamaev ,
Got what you mean. We need to talk whether this can be a feature change in our backlog (we make the hyperparameter as list in the json). But anyway our doc for now needs to be updated, let me add a doc label for now.
Thanks for the explanation.
Hey!
This is something that has being a pain for us too.
Do you find decoding the hyperparameter from string to list hard? If so, what's the problem?
Yes. The right solution to adapt ourselves to sagemaker's contract with hyperparameters is: keeping track of all possible key/values and their types in a custom image, just to parse it back to their expected format. And undoing the str()
applied by sagemaker, casting every hyperparameter that is not a string is quite time consuming.
This is not easy to do when we have to send dozens of hyperparameters. Also, it is quite strange to send something like
{
"objective": "binary",
"boosting_type": "gbdt",
"num_leaves": 31,
"metric": ["binary_logloss", "auc"],
"seed": None
}
and getting inside of the container something like:
{
"objective": "binary",
"boosting_type": "gbdt",
"num_leaves": "31",
"metric": "[\"binary_logloss\", \"auc\"]",
"seed": "None"
}
Is there any chance of changing how sagemaker deals with hyperparameters?
I imagine that the decision of forcing an str()
conversion was taken to prevent having to deal with json encoders or it's more related to some concern with safety. Therefore, letting us control the json encoding and decoding and expecting just a json string as an argument of Estimator
would help both sides?
we are facing this issue as well. And in our case it is nested dict (we use our customer docker images). its very hard to reparse this to json because we have different types of values.
you guys also convert double quotes to single quotes, so to effectively get this to work, i have to do this
param_grid = {
"svd__n_components": [ json.loads(training_params["svd"].replace("\'", "\""))["n_components"]],
"xgb__eval_set": [(X_test, y_test)],
}
Hi team, for those still having trouble here, I've been using these two functions to parse out the hyperparameters in my own estimator containers (it also strips leading 0's, just in case you work with coordinates in your day to day, which can cause trouble when trying to cast to an int or float):
def destringify_dict_values(d):
return {k: destringify(v) for k, v in d.items()}
def destringify(s):
if s == len(s) * "0":
return 0
else:
s = s.lstrip("0")
if isinstance(s, str):
try:
val = literal_eval(s)
except ValueError:
val = s
else:
val = s
if isinstance(val, float):
if val.is_integer():
return int(val)
return val
return val
It uses literal_eval
from ast
, which is not ideal, but still much safer than eval
, since it raises a ValueError
if it can't cast to a basic Python type.
+1 I wasted a whole day of my life trying to debug this...training the docker container worked locally, but on sagemaker it did not! Truly an insidious issue.
This issue is also being a pain for us too. Hopefully you can provide soon a way to handle lists as hyperparams
I'm having the issue too. From what I gathered, there has been no official response to this thread for more than a year now, quite baffling.
Furthermore, trying to find a workaround to @mmamaev 's situation, I found something maybe more insidious. This is the script, named sandbox.py
, that I try to run:
import argparse
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('-a', nargs='*')
args, _ = parser.parse_known_args()
print(args)
On my laptop, when I run python sandbox.py -a def xyz
I get the expected print: Namespace(a=['def', 'xyz'])
. So args.a
is the correct expected list: ['def', 'xyz']
.
When I run the following on sagemaker it fails:
role = get_execution_role()
image_name = ...
hyperparameters={'a': 'def xyz'
}
output_path = ...
estimator = Estimator(image_name=image_name,
role=role,
train_instance_count=1,
output_path=output_path,
train_instance_type='local',
hyperparameters=hyperparameters)
estimator.fit()
where the image image_name
has been built using this Dockerfile:
FROM tensorflow/tensorflow:2.0.0-py3
RUN pip install --no-cache-dir --upgrade pip
RUN pip install --no-cache-dir sagemaker-containers
COPY sandbox.py /opt/ml/code/
ENV SAGEMAKER_PROGRAM sandbox.py
Here is the end of the logs of the execution:
algo-1-7blkd_1 | SM_USER_ARGS=["-a","def xyz"]
algo-1-7blkd_1 | SM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate
algo-1-7blkd_1 | SM_HP_A=def xyz
algo-1-7blkd_1 | PYTHONPATH=/opt/ml/code:/usr/local/bin:/usr/lib/python36.zip:/usr/lib/python3.6:/usr/lib/python3.6/lib-dynload:/usr/local/lib/python3.6/dist-packages:/usr/lib/python3/dist-packages
algo-1-7blkd_1 |
algo-1-7blkd_1 | Invoking script with the following command:
algo-1-7blkd_1 |
algo-1-7blkd_1 | /usr/bin/python3 sandbox.py -a def xyz
algo-1-7blkd_1 |
algo-1-7blkd_1 |
algo-1-7blkd_1 | Namespace(a=['def xyz'])
algo-1-7blkd_1 | 2020-02-21 13:08:04,897 sagemaker-containers INFO Reporting training SUCCESS
tmpzd7en6gx_algo-1-7blkd_1 exited with code 0
Aborting on container exit...
===== Job Complete =====
As you see, here args.a
is a list of 1 element, that is args.a=['def xyz']
. Whereas sagemaker says that it called the exact same line that I did on my laptop, that is, /usr/bin/python3 sandbox.py -a def xyz
. Did sagemaker straight up lie to me and somehow escaped the space between def
and xyz
without saying ?
So my workaround does not work and I have to rely on a painful hyperparameter decoding. @ihoelscher did a great job explaining why this should not be.
Any news @yangaws ?
Edit: in OP's case the following parameter decoding makes the code works (with 'a': 'def xyz'
) both on my laptop and on sagemaker: just add these two lines after args, _ = parser.parse_known_args()
args.a = [s.split() for s in args.a]
args.a = list(itertools.chain.from_iterable(args.a))
itertools
needs to be imported.
What is the status on this issue? I believe the relevant changes were not included on v2.0.0. @laurenyu
@diegodebrito no, it wasn't included in v2.0.0 unfortunately
cc @ajaykarpur, who might know about the current status of this issue
Glad to create a PR for this immediately. I can't believe this has not been addressed in nearly a year.
Hi
Currently I'm doing
https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-train-model.html
I'm in: "Step 4: Train a Model"
And in the "3. Set the hyperparameters for the XGBoost algorithm by calling the set_hyperparameters method of the estimator."
I'm having a problem after executing:
xgb_model.fit({"train": train_input, "validation": validation_input}, wait=True)
After going through some documentation: https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/
https://xgboost.readthedocs.io/en/latest/parameter.html
I see that the list of available options in objective
This is how I have it right now 1:
xgb_model.set_hyperparameters(
max_depth = 5,
eta = 0.2,
gamma = 4,
min_child_weight = 6,
subsample = 0.7,
objective = "binary:logistic",
num_round = 1000
)
I've tried to use
2:
...
objective = {"binary": "logistic"},
...
3:
...
objective = ['binary: logistic'],
...
The error output varies but it's always like this:
hyperparameter_validation.py", line 47, in validate_range raise exc.UserError("Hyperparameter {}: {} is not in {}".format(self.name, value, self.range)) sagemaker_algorithm_toolkit.exceptions.UserError: Hyperparameter objective: {'binary:logistic'} is not in ['aft_loss_distribution', 'binary:logistic', 'binary:logitraw', 'binary:hinge', 'count:poisson', 'multi:softmax', 'multi:softprob', 'rank:pairwise', 'rank:ndcg', 'rank:map', 'reg:linear', 'reg:squarederror', 'reg:logistic', 'reg:gamma', 'reg:pseudohubererror', 'reg:squaredlogerror', 'reg:tweedie', 'survival:aft', 'survival:cox']
Do you guys now if I'm doing something wrong? I would think the tutorial is a no brainer but maybe I'm missing something
Hi
Currently I'm doing
https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-train-model.html
I'm in: "Step 4: Train a Model"
And in the "3. Set the hyperparameters for the XGBoost algorithm by calling the set_hyperparameters method of the estimator."
I'm having a problem after executing:
xgb_model.fit({"train": train_input, "validation": validation_input}, wait=True)
...
@xjwalker I had the same error message. Mine was caused by a wrong train-val data format. I forgot to concatenate the label column to the dataframe. IMDB Sentiment Analysis XGBoost (Batch Transform).ipynb
# TODO: Save the training and validation data to train.csv and validation.csv in the data_dir directory.
# Make sure that the files you create are in the correct format.
# Note: the first column is the label
pd.concat([train_y, train_X], axis=1).to_csv(os.path.join(data_dir, 'train.csv'), header=False, index=False)
pd.concat([val_y, val_X], axis=1).to_csv(os.path.join(data_dir, 'validation.csv'), header=False, index=False)```
Nothing? I am currently trying to pass a list of str (or int even) to an estimator and getting my list cut at the comma. Any developments on this issue?
@yangaws
As a workaround the environment variable SM_TRAINING_ENV
has the hyperparameters in a properly formatted JSON string. This includes the proper JavaScript style bools (false
/true
) and null
for None
.
This works much nicer than trying to parse the hyperparams from strings:
hyperparameters = json.loads(os.environ["SM_TRAINING_ENV"])["hyperparameters"]
Would be great if this could get sorted or if the docs were more expressive
bump
any updates? parsing values can be a pain if it's all nested and has different data types. This adds a lot of extra workload. If user simply passes in a python dict, they would expect to get the same thru a simple load using the json module.
I found a workaround when set_hyperparameters()
adds escaped quotes to string (seems like some people in this thread have this issue).
Instead of using estimator.set_hyperparameters(my_param="some_string"), you can directly modify the hyperparameter using:
estimator._hyperparameters.update(dict(my_param="some_string"))`
Hope that helps!
Parsing this was particularly annoying when passing a nested dict as a hyperparameter. One workaround I found that bypasses any manual parsing was to base64 encode the dict as a json string, and pass that to my Estimator and then decode in my training script:
def b64encoded_str(j):
return base64.b64encode(json.dumps(j).encode('utf-8')).decode('utf-8')
config = {...} #Nest dict of lists, float, and dict
hyperparameters = {'b64config': b64encoded_str(config)}
estimator = PyTorch(
image_uri=image_uri
...,
hyperparamters=hyperparameters
)
estimator.fit(inputs=inputs)
and then in my training script:
def train(config):
....
def b64decode_str(s):
return json.loads(base64.b64decode(s).decode('utf-8'))
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('--b64config', type=str, required=True)
args = parser.parse_args()
cfg = b64decode_str(args.b64config)
train(cfg)
Unideal that the hyperparameters are not legible on the Sagemaker console, but makes up for it in that it doesn't require any parsing!
Have been 3 years. Since this came up. Nothing?
I want ro pass a list as a hyperparameter to an estimator. For example, hyperparameter key = 'a' hyperparameter value = ['def', 'xyz']
This type of value is accepted and get verified:
I expect that sagemaker renders hyperparameters to the following json to be found inside /opt/ml/input/config/hyperparameters.json:
Instead, the contents of /opt/ml/input/config/hyperparameters.json is:
The
list
type of the value is lost, being forcefully converted to a string by this code:https://github.com/aws/sagemaker-python-sdk/blob/1c94079ddf1395d2edd82b5be38e989aabf378d8/src/sagemaker/estimator.py#L553-L554