aws / sagemaker-training-toolkit

Train machine learning models within a 🐳 Docker container using 🧠 Amazon SageMaker.
Apache License 2.0

Failed to parse string hyperparameter #66

Open uwaisiqbal opened 4 years ago

uwaisiqbal commented 4 years ago

Describe the bug

I would like to pass string-valued hyperparameters to my SageMaker job. However, when I do this, I get a message saying that they failed to parse.

To reproduce

from sagemaker.estimator import Estimator

# image_uri, iam_role, aws_params, instance_count and TR_TAGS are defined elsewhere.
hyperparams = {'test': 10,
               'a': 50,
               'b': 'some text'}

estimator = Estimator(
    image_name=image_uri,
    role=iam_role,
    output_path=f"s3://{aws_params['SCW_S3_BUCKET']}/sagemaker/output/",
    train_instance_count=instance_count,
    input_mode='File',
    train_instance_type='local',
    tags=TR_TAGS,
    subnets=aws_params['VPC_SUBNETS'],
    security_group_ids=aws_params['VPC_SGS'],
    output_kms_key=aws_params['SCW_KMS_KEY'],
    hyperparameters=hyperparams,
)

I get the following error message:

algo-1-tfbx9_1  | 2020-06-26 11:46:29,060 - sagemaker-training-toolkit - INFO - Failed to parse hyperparameter b value some text to Json.

I suspect this has something to do with the code in https://github.com/aws/sagemaker-containers/blob/8ba4085548d28c8651cbd28b3049af58b3057fbc/src/sagemaker_containers/_env.py#L214

It seems as though these string-valued hyperparameters aren't being passed to the command in the SageMaker job.

ajaykarpur commented 4 years ago

Hi @uwaisiqbal, does the training job fail as a result of this error? Or are you just asking why the error message appears?

It looks like this message is logged when the value is not itself a valid JSON string, but the unparsed value is still kept and stored in deserialized_hps: https://github.com/aws/sagemaker-training-toolkit/blob/9d901371e868dec17bcb1d5adc0fd46b6da18e42/src/sagemaker_training/environment.py#L208-L221
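The behavior described above can be sketched roughly as follows. This is a paraphrase for illustration, not the toolkit's actual code: the function name `deserialize_hyperparameters` and the logger name are hypothetical, and only `deserialized_hps` and the log message come from the thread.

```python
import json
import logging

logger = logging.getLogger("hyperparameter-sketch")  # hypothetical logger name


def deserialize_hyperparameters(raw):
    """Decode each value as JSON, keeping the raw value when decoding fails."""
    deserialized_hps = {}
    for key, value in raw.items():
        try:
            value = json.loads(value)
        except (ValueError, TypeError):
            # The value is not valid JSON (for example the bare text: some text),
            # so the message is logged and the original value is kept as-is.
            logger.info("Failed to parse hyperparameter %s value %s to Json.", key, value)
        deserialized_hps[key] = value
    return deserialized_hps


# Hyperparameters arrive in the container as strings.
parsed = deserialize_hyperparameters({"test": "10", "a": "50", "b": "some text"})
```

Under this sketch the string hyperparameter is not dropped: `parsed["b"]` is still `"some text"`, and the log line is informational only.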

That said, this message can definitely be removed or rewritten for clarity.

uwaisiqbal commented 4 years ago

The training job still proceeds, but the string-valued hyperparameter isn't passed to the entry point I've defined in my Dockerfile. Is there a difference in how hyperparameters are parsed and propagated for Python scripts vs. entry points?

metrizable commented 4 years ago

@uwaisiqbal The hyperparameters should be available. The SageMaker service provides them in a hyperparameters.json file, and if you use the sagemaker-training-toolkit, they are read in and made available to your script/entry point as environment variables. I've commented on a related issue, #65, with some additional details.
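A minimal sketch of reading the hyperparameters from inside a custom entry point is shown below. It assumes the file lives at `/opt/ml/input/config/hyperparameters.json` and that the toolkit exposes the whole set as JSON in an `SM_HPS` environment variable; neither detail is stated in this thread, so check both against the toolkit version you use. The helper name `load_hyperparameters` is hypothetical.

```python
import json
import os

# Assumed default location of the file inside the training container.
DEFAULT_PATH = "/opt/ml/input/config/hyperparameters.json"


def load_hyperparameters(path=DEFAULT_PATH, environ=None):
    """Return hyperparameters from SM_HPS if it is set, else from the JSON file."""
    environ = os.environ if environ is None else environ
    if "SM_HPS" in environ:
        # Assumption: the toolkit stores all hyperparameters here as one JSON object.
        return json.loads(environ["SM_HPS"])
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        # Values read directly from the file are strings and may need json.loads.
        return json.load(f)
```

Passing `environ` explicitly makes the helper easy to try outside a container, for example `load_hyperparameters(environ={"SM_HPS": '{"b": "some text"}'})`.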

wangrui6 commented 3 years ago

> The training job still proceeds, but the string-valued hyperparameter isn't passed to the entry point I've defined in my Dockerfile. Is there a difference in how hyperparameters are parsed and propagated for Python scripts vs. entry points?

Have you figured out how to parse the hyperparameters? I ran into the same problem and found no straightforward answers.

steelersd commented 3 years ago

@uwaisiqbal @wangrui6

See if this helps: json_encode_hyperparameters. I've used it successfully in my work.

import json

def json_encode_hyperparameters(hyperparameters):
    # JSON-encode every value so that it can be decoded again with json.loads
    return {str(k): json.dumps(v) for (k, v) in hyperparameters.items()}

hyperparameters = json_encode_hyperparameters({
    "hp1": "value1",
    "hp2": 300,
    "hp3": 0.001})

You can see an example at Script-mode Custom Training Container
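To see why JSON-encoding the values helps, here is a small round trip using only the standard library. It assumes, as discussed earlier in the thread, that the container side decodes each value with `json.loads`; the variable names are illustrative.

```python
import json

hyperparameters = {"hp1": "value1", "hp2": 300, "hp3": 0.001}

# Estimator side: every value becomes a JSON string, so the text value
# gains its own pair of double quotes.
encoded = {str(k): json.dumps(v) for k, v in hyperparameters.items()}

# Container side (assumed): json.loads restores the original Python types.
decoded = {k: json.loads(v) for k, v in encoded.items()}
```

After encoding, `encoded["hp1"]` is the quoted JSON string `"value1"` rather than bare text, so it decodes cleanly and the "Failed to parse" message should not be triggered for it.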

cmohamma commented 1 month ago

Has there been a solution to this? The suggested update no longer seems to work after the update to Estimator: TypeError: Estimator.set_hyperparameters() takes 1 positional argument but 2 were given
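The TypeError above is what Python raises when a dict is passed positionally to a method that accepts only keyword arguments. The sketch below reproduces that with a hypothetical stand-in class, `StandInEstimator`, which mimics only the signature and is not the SageMaker SDK; whether unpacking the dict with `**` is the right fix for the real Estimator is an assumption to verify against the SDK version in use.

```python
import json


class StandInEstimator:
    """Hypothetical stand-in that mimics only a keyword-only signature."""

    def __init__(self):
        self.hyperparam_dict = {}

    def set_hyperparameters(self, **kwargs):
        self.hyperparam_dict.update(kwargs)


encoded = {"hp1": json.dumps("value1"), "hp2": json.dumps(300)}
estimator = StandInEstimator()

try:
    estimator.set_hyperparameters(encoded)  # dict passed positionally
    error = None
except TypeError as exc:
    error = str(exc)  # "... takes 1 positional argument but 2 were given"

# Unpacking the dict turns each key into a keyword argument.
estimator.set_hyperparameters(**encoded)
```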