Open athewsey opened 3 years ago
Thanks for the detailed write-up.
Agreed that the preferable approach would be to not require logging into each and every ECR registry. Your suggestion of adding a registry-id parameter works; however, we could make this seamless by auto-detecting the base ECR registry and region and logging into it. Here's how that could work
Yeah, auto-detection was my first thought & preference too, but then I wondered if there might still be some edge cases a naïve implementation could miss. E.g. there could be multi-stage builds with multiple FROM statements (which should be easy enough to handle), or maybe there are use cases where it's not obvious from the Dockerfile at all which registry or registries are needed? I don't know enough to rule it out.
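To make the auto-detection idea concrete, here is a minimal sketch of scanning a Dockerfile for ECR base images, including multi-stage builds with several FROM statements. The function name `find_ecr_registries` and the regular expression are illustrative assumptions, not part of this library.

```python
import re

# Matches a FROM line whose image lives in an ECR registry, e.g.
#   FROM 683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:1.2-1
# An optional --platform flag before the image name is tolerated.
ECR_FROM = re.compile(
    r"^[ \t]*FROM[ \t]+(?:--platform=\S+[ \t]+)?"
    r"(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com(?:\.cn)?/",
    re.IGNORECASE | re.MULTILINE,
)


def find_ecr_registries(dockerfile_text):
    """Return unique (account_id, region) pairs in order of first appearance.

    Hypothetical helper: it only sees registries written literally in a
    FROM line, so every stage of a multi-stage build is covered, but an
    image name supplied through ARG substitution is not.
    """
    found = []
    for match in ECR_FROM.finditer(dockerfile_text):
        pair = (match.group("account"), match.group("region"))
        if pair not in found:
            found.append(pair)
    return found
```

A Dockerfile that writes its base image as `FROM ${BASE_IMAGE}` would return nothing here, which is one case where the required registry is not obvious from the Dockerfile alone.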
The underlying tenet of this library is that it works out of the box without requiring any additional inputs over a regular docker build. This works by setting sensible defaults for underlying AWS resources like S3 and CodeBuild.
There may well be edge cases, but if we can handle 80% of the use cases with auto-detection, then that is the default behavior to go with, while allowing power users to specify additional, optional fields to override the defaults.
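The "defaults plus optional overrides" approach could look roughly like the sketch below: auto-detected registries come first, and any account IDs a power user supplies are appended. The function `resolve_registries` and its parameters are hypothetical names for illustration; the library does not necessarily expose anything like them.

```python
def resolve_registries(detected, extra_account_ids=(), region="us-east-1"):
    """Combine auto-detected (account_id, region) pairs with user-supplied IDs.

    Hypothetical helper. `detected` is whatever auto-detection produced;
    `extra_account_ids` stands in for an optional user-facing setting, and
    `region` is assumed to be the region the build runs in.
    """
    # dict.fromkeys drops duplicates while keeping first-seen order.
    resolved = list(dict.fromkeys(detected))
    for account_id in extra_account_ids:
        pair = (account_id, region)
        if pair not in resolved:
            resolved.append(pair)
    return resolved
```

With no extra IDs the result is just the detected list, so the default path needs no additional input.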
Is there any workaround for this problem?
I'm trying to run the following dockerfile https://github.com/aws/amazon-sagemaker-examples/blob/master/training/distributed_training/tensorflow/data_parallel/maskrcnn/Dockerfile
EDIT: I actually had an issue where ECR authentication was being done for us-east-1 when the Dockerfile contained an image from us-west-2. Changing the region in the Dockerfile to us-east-1 fixed the issue.
I get the same error when I try to build an image based on this FROM statement:
FROM 683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:1.2-1
I thought the account 683313688378 was logged in with the following command, but what could be the cause or workaround?
Running command $(aws ecr get-login --no-include-email --region $AWS_DEFAULT_REGION --registry-ids 683313688378)
I share the same concern as the author of this issue. It is not possible to share AUTH credentials with the library. As a result, it is impossible to build upon ECR registry containers.
Similar example:
"https://763104351884.dkr.ecr.us-east-1.amazonaws.com/v2/pytorch-training/manifests/1.11.0-gpu-py38-cu115-ubuntu20.04-e3": no basic auth credentials
Even if you run immediately before:
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-east-1.amazonaws.com
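Note that the command above names the region twice: once for the token and once in the registry hostname. A small sketch that builds the command from a single (account ID, region) pair keeps the two in sync. The helper name `ecr_login_command` is made up for illustration, and whether the login has to run inside the build job rather than on the calling machine is an assumption based on the job logs mentioned in this thread.

```python
def ecr_login_command(account_id, region):
    """Build the docker login pipeline for one ECR registry.

    Hypothetical helper: it only formats the shell command shown in this
    thread and does not run it or check credentials.
    """
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return (
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}"
    )
```

Calling `ecr_login_command("763104351884", "us-east-1")` reproduces the command quoted above.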
I'm trying to sm-docker build a container derived from the SageMaker Scikit-Learn framework container in ap-southeast-1, something like the following:
...so the Dockerfile is FROM 121021644041.dkr....etc
Seems like the CLI tool spins up successfully and logs in to a load of other ECR registries, but not 121021644041:
Then fails on step 1 with:
I've since tested and on a SageMaker Notebook Instance I can build the same Dockerfile fine, so long as I log in to the 121021644041 ECR first.
From a cursory look at the job logs and #12, it looks like the current strategy is to have the tool ecr login to every AWS account on which AWS DLCs are provided?
...So would the correct fix be to add every account ID listed here to support SKLearn?
I was thinking it might be preferable to also add a way for users to indicate extra required account IDs through the CLI, since: