aws-samples / sagemaker-run-notebook

Tools to run Jupyter notebooks as jobs in Amazon SageMaker - ad hoc, on a schedule, or in response to events
Apache License 2.0

Docker fails #33

Closed WZimmerman83 closed 3 years ago

WZimmerman83 commented 3 years ago

Hi,

After wading into the "customizing AWS SageMaker Studio" pond, I've run into Docker-related problems when running

run-notebook create-container

The process (the log below is truncated) logs in and completes the PRE_BUILD phase successfully, but then fails on

docker pull -q ${BASE_IMAGE}

The build is indeed unsuccessful: I tested the schedule, and it fails with:

Failed (ClientError: API error (404): manifest for XXXXXXX.dkr.ecr.us-west-2.amazonaws.com/notebook-runner:latest not found: manifest unknown: Requested image not found)

Do we need to pay for a Docker license to schedule a process to run in Studio?

Login Succeeded

[Container] 2021/10/26 07:22:31 Running command base_region=$(echo ${BASE_IMAGE} | sed -n 's%\([0-9]*\)\.dkr\.ecr\.\([^.]*\)\.amazonaws.com/.*%\2%p')

[Container] 2021/10/26 07:22:31 Running command base_account=$(echo ${BASE_IMAGE} | sed -n 's%\([0-9]*\)\.dkr\.ecr\.\([^.]*\)\.amazonaws.com/.*%\1%p')

[Container] 2021/10/26 07:22:31 Running command if [ "${base_account}" != "" ]; then aws ecr get-login-password --region ${base_region} | docker login --username AWS --password-stdin ${base_account}.dkr.ecr.${base_region}.amazonaws.com; fi

[Container] 2021/10/26 07:22:31 Phase complete: PRE_BUILD State: SUCCEEDED
[Container] 2021/10/26 07:22:31 Phase context status code:  Message:
[Container] 2021/10/26 07:22:32 Entering phase BUILD
[Container] 2021/10/26 07:22:32 Running command echo Build started on `date`
Build started on Tue Oct 26 07:22:32 UTC 2021

[Container] 2021/10/26 07:22:32 Running command echo Ensure docker repo exists and create it if necessary...
Ensure docker repo exists and create it if necessary...

[Container] 2021/10/26 07:22:32 Running command if ! aws ecr describe-repositories --repository-names "$IMAGE_REPO_NAME" >/dev/null 2>&1; then aws ecr create-repository --repository-name "$IMAGE_REPO_NAME"; echo Repository created; else echo Repository already exists, proceeding.; fi
Repository already exists, proceeding.

[Container] 2021/10/26 07:22:32 Running command echo Pulling the base image...
Pulling the base image...

[Container] 2021/10/26 07:22:32 Running command docker pull -q ${BASE_IMAGE}
toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit

[Container] 2021/10/26 07:22:34 Command did not exit successfully docker pull -q ${BASE_IMAGE} exit status 1
[Container] 2021/10/26 07:22:34 Phase complete: BUILD State: FAILED
[Container] 2021/10/26 07:22:34 Phase context status code: COMMAND_EXECUTION_ERROR Message: Error while executing command: docker pull -q ${BASE_IMAGE}. Reason: exit status 1
[Container] 2021/10/26 07:22:34 Entering phase POST_BUILD
[Container] 2021/10/26 07:22:34 Running command echo Build completed on `date`
Build completed on Tue Oct 26 07:22:34 UTC 2021

[Container] 2021/10/26 07:22:34 Running command echo Pushing the Docker image...
Pushing the Docker image...

[Container] 2021/10/26 07:22:34 Running command docker push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG
The push refers to repository [XXXXXXX.dkr.ecr.us-west-2.amazonaws.com/notebook-runner]
An image does not exist locally with the tag: XXXXXXX.dkr.ecr.us-west-2.amazonaws.com/notebook-runner

[Container] 2021/10/26 07:22:34 Command did not exit successfully docker push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG exit status 1
[Container] 2021/10/26 07:22:34 Phase complete: POST_BUILD State: FAILED
[Container] 2021/10/26 07:22:34 Phase context status code: COMMAND_EXECUTION_ERROR Message: Error while executing command: docker push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG. Reason: exit status 1
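A note for readers hitting the same error: the PRE_BUILD commands in the log above only run `docker login` for the base image when `${BASE_IMAGE}` parses as an ECR URI. If it does not, `base_account` is empty and the `if` is skipped, so the later `docker pull` would go out unauthenticated. The `toomanyrequests` message links to docker.com, which suggests (but the log does not confirm) that the base image here came from Docker Hub and hit its anonymous pull limit. The sketch below reuses the two `sed` expressions from the log to show the difference. The function names, the account ID `123456789012`, and the image name `python:3.10-slim-buster` are made up for illustration; the actual `BASE_IMAGE` value is not shown in the log.

```shell
#!/bin/sh
# The two sed expressions are copied from the PRE_BUILD log above.
# The wrapper function names are hypothetical, for illustration only.
ecr_region() {
  echo "$1" | sed -n 's%\([0-9]*\)\.dkr\.ecr\.\([^.]*\)\.amazonaws.com/.*%\2%p'
}

ecr_account() {
  echo "$1" | sed -n 's%\([0-9]*\)\.dkr\.ecr\.\([^.]*\)\.amazonaws.com/.*%\1%p'
}

# Both image names below are assumed examples, not values from the issue.
for image in \
  "123456789012.dkr.ecr.us-west-2.amazonaws.com/notebook-runner:latest" \
  "python:3.10-slim-buster"
do
  account=$(ecr_account "$image")
  region=$(ecr_region "$image")
  if [ "$account" != "" ]; then
    # Mirrors the branch in the log that runs "aws ecr get-login-password | docker login"
    echo "$image: ECR URI, would log in to account $account in $region"
  else
    # sed printed nothing, so the buildspec skips the login step
    echo "$image: not an ECR URI, login skipped, pull is unauthenticated"
  fi
done
```

If that reading is right, the failure is a registry rate limit rather than a licensing problem with SageMaker Studio itself. Retrying later, or pointing the build at a base image hosted in an ECR repository that the build role can read, are two things worth trying.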

fjpa121197 commented 2 years ago

Hi @Billpete002, how were you able to solve this issue?