Closed by oelesinsc24 5 years ago
@JohnCalhoun, we actually resolved this. It turns out that the image 520713654638.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-tensorflow:1.0.0-cpu-py2
is no longer available because framework version 1.0.0 is no longer supported.
With 1.8.0 and 1.12.0, model training was successful. Image URIs:
520713654638.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-tensorflow:1.8.0-cpu-py2
520713654638.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-tensorflow:1.12.0-cpu-py2
When deploying a TensorFlow training job with SageMaker Build, we get the error:
The container image URI in ECR is:
520713654638.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-tensorflow:1.0.0-cpu-py2
We tried pulling the container with the Docker CLI, after a successful ECR login through the AWS CLI from the shell, and got the error:
Error response from daemon: manifest for 520713654638.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-tensorflow:1.0.0-cpu-py2 not found
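For anyone hitting the same "manifest not found" error, a tag check before deployment can be sketched as below. This is a minimal sketch, not part of SageMaker Build: the helper name image_tag_exists is hypothetical, it expects a boto3 ECR client to be passed in, and it assumes your credentials are allowed to call describe_images on the registry in question, which may not hold for AWS-owned accounts.

```python
def image_tag_exists(ecr_client, registry_id, repository, tag):
    """Return True if `tag` exists in `repository` of the given registry.

    `ecr_client` is assumed to be a boto3 ECR client, for example:
        import boto3
        ecr_client = boto3.client("ecr", region_name="eu-west-1")
    """
    try:
        ecr_client.describe_images(
            registryId=registry_id,          # AWS account ID that owns the registry
            repositoryName=repository,       # e.g. "sagemaker-tensorflow"
            imageIds=[{"imageTag": tag}],    # e.g. "1.0.0-cpu-py2"
        )
    except ecr_client.exceptions.ImageNotFoundException:
        # ECR reports a missing tag with this exception.
        return False
    return True
```

Other failures (missing repository, access denied) are deliberately not caught, so they surface as errors instead of being mistaken for a missing tag.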
Going through the documentation on pre-built containers, https://docs.aws.amazon.com/sagemaker/latest/dg/pre-built-containers-frameworks-deep-learning.html, it turns out that the account 520713654638 does not have container images in the region eu-west-1. However, SageMaker TensorFlow container images are available in the account 763104351884, and we were able to pull the containers successfully via docker pull.
The relevant code would probably be here: https://github.com/aws-samples/aws-sagemaker-build/blob/5a16995af6fcf8ac12caa56f55f287ba0b288754/lambda/util/nodejs/lib/CreateImageURI.js#L37
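To illustrate the kind of change meant here, this is a rough Python sketch of assembling the image URI with the account ID passed in as a parameter instead of being fixed. It is not the code from CreateImageURI.js; the function name build_image_uri and its parameters are hypothetical, and the URI layout is simply copied from the URIs quoted in this thread. Repository names and tag formats in other accounts may differ, so check the pre-built containers documentation before relying on it.

```python
def build_image_uri(account, region, repository, version,
                    device="cpu", py_version="py2"):
    """Assemble an ECR image URI in the form seen in this thread."""
    registry = f"{account}.dkr.ecr.{region}.amazonaws.com"
    tag = f"{version}-{device}-{py_version}"
    return f"{registry}/{repository}:{tag}"


# The account ID is an argument, so pointing at another registry
# needs no code change.
uri = build_image_uri("520713654638", "eu-west-1",
                      "sagemaker-tensorflow", "1.8.0")
print(uri)
```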
Thanks a lot for your help