juliensimon closed this issue 6 years ago
I had the same problem. The code runs json.loads on each hyperparameter, so you have to set your region parameter to '"us-east-1"' or something like that (notice the double quotes inside the single quotes).
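To illustrate why the extra quotes matter, here is a minimal standalone sketch using only Python's standard `json` module. It does not touch the container code; the `try_decode` helper is hypothetical and only exists to show what `json.loads` does with a bare string versus a JSON-encoded one.

```python
import json


def try_decode(value):
    """Return the decoded JSON value, or None if the string is not valid JSON."""
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        return None


# A bare region name is not a valid JSON document, so decoding fails.
decoded_raw = try_decode("us-east-1")

# Wrapping the name in double quotes makes it a valid JSON string.
decoded_quoted = try_decode('"us-east-1"')

# json.dumps produces the quoted form, so there is no need to write it by hand.
encoded = json.dumps("us-east-1")

print(decoded_raw, decoded_quoted, encoded)
```

Passing the result of `json.dumps(...)` as the hyperparameter value is probably less error-prone than typing the nested quotes manually.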
Thanks. I ended up reading that code too :)
This should definitely be fixed:
Hi, thanks for the feedback!
There's a problem we've known about for a while with the Python SDK - the "official" MXNet/TensorFlow images were meant to be used with the MXNet/TensorFlow estimator classes, which automatically provide the region and hyperparameter serialization.
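As a rough sketch of what that serialization amounts to when you use the generic Estimator yourself, assuming the container simply JSON-decodes every hyperparameter value: the `serialize_hyperparameters` helper below is hypothetical and not part of the SDK, and the framework estimator classes may set additional parameters that are not shown here.

```python
import json


def serialize_hyperparameters(hyperparameters, region):
    """JSON-encode every value and add the region (hypothetical helper, not SDK code)."""
    merged = dict(hyperparameters)
    # 'sagemaker_region' is the parameter name mentioned in this issue.
    merged["sagemaker_region"] = region
    return {key: json.dumps(value) for key, value in merged.items()}


encoded = serialize_hyperparameters({"epochs": 10, "learning_rate": 0.1}, "us-east-1")

# Round trip: this mirrors what a container running json.loads on each value would see.
decoded = {key: json.loads(value) for key, value in encoded.items()}

print(encoded)
print(decoded)
```

The encoded dictionary is what you would hand to the generic Estimator as hyperparameters, so that each value survives the `json.loads` call on the container side.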
We need to add a constructor arg to those classes to allow the default images to be overridden; this would be the best way to have a clean experience. (There's some trickiness with attach that we have to work out in order to implement this.) Then you would use the MXNet class instead of the generic Estimator class with your image.
I'll +1 the priority of this issue in our backlog.
In the meantime, you can also do something like: https://github.com/aws/sagemaker-mxnet-containers/blob/master/test/functional/test_mnist_distributed.py#L22
Thanks Winston.
Closing, please reopen if you encounter more blockers or have more feedback.
Hi,
I built a custom MXNet container using https://github.com/aws/sagemaker-mxnet-containers and pushed it to ECR. The container is fine as far as I can tell (inspecting with 'docker run', etc.).
When I run this:
The training job fails with:
It looks like I'd need to set a 'sagemaker_region' parameter, which is weird because SageMaker should know what the region is.
Anyway, if I try to set it (in the Estimator or with set_hyperparameters):
Then the job fails because it can't deserialize hyperparameters:
Have I missed anything? Thanks for your help.