aws-samples / easy-amazon-sagemaker-deployments

SageMaker custom deployments made easy
https://pypi.org/project/ezsmdeploy/

ClientError:An error occurred (ValidationException) when calling the CreateModel operation #4

Closed AtsunoriFujita closed 3 years ago

AtsunoriFujita commented 3 years ago

I ran this notebook and got an error. Is something missing? The image doesn't seem to have been pushed to ECR.

https://github.com/aws-samples/easy-amazon-sagemaker-deployments/blob/master/notebooks/Using%20ezsmdeploy%20for%20sklearn%20deployments.ipynb

ClientError:An error occurred (ValidationException) when calling the CreateModel operation: Requested image********.dkr.ecr.us-west-2.amazonaws.com/ezsmdeploy-image-randomname not found.

w601sxs commented 3 years ago

Is the IAM execution role set up correctly? Did the docker build step succeed? You can manually test building the Dockerfile with the assets that get pulled into the src folder.

AtsunoriFujita commented 3 years ago

I created an IAM role in SageMaker and am running the code with it. Do I need anything else?

I get the following error when running the test command !./src/build-docker.sh test:

Building container ezsmdeploy-image-test
Building **********.dkr.ecr.us-west-2.amazonaws.com/ezsmdeploy-image-test
Creating repo for **********.dkr.ecr.us-west-2.amazonaws.com/ezsmdeploy-image-test if it doesn't already exist
Getting login for **********.dkr.ecr.us-west-2.amazonaws.com/ezsmdeploy-image-test
WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
Building locally
Sending build context to Docker daemon  14.34kB
Step 1/17 : FROM ubuntu:18.04
 ---> c090eaba6b94
Step 2/17 : LABEL com.amazonaws.sagemaker.capabilities.multi-models=true
 ---> Using cache
 ---> 3be507c69122
Step 3/17 : LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true
 ---> Using cache
 ---> d1c3b23e1b5a
Step 4/17 : RUN apt-get update &&     apt-get -y install --no-install-recommends     build-essential     ca-certificates     openjdk-8-jdk-headless     python3-dev     python3-pip     python3-setuptools     nginx     ca-certificates     curl     wget     vim     && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> 5509d40c643b
Step 5/17 : RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 1
 ---> Using cache
 ---> 59afd5d438f3
Step 6/17 : RUN update-alternatives --install /usr/local/bin/pip pip /usr/local/bin/pip3 1
 ---> Running in 83d7553187e2
update-alternatives: error: alternative path /usr/local/bin/pip3 doesn't exist
The command '/bin/sh -c update-alternatives --install /usr/local/bin/pip pip /usr/local/bin/pip3 1' returned a non-zero code: 2
Error response from daemon: No such image: ezsmdeploy-image-test:latest
Pushing
The push refers to repository [**********.dkr.ecr.us-west-2.amazonaws.com/ezsmdeploy-image-test]
An image does not exist locally with the tag: **********.dkr.ecr.us-west-2.amazonaws.com/ezsmdeploy-image-test
**********.dkr.ecr.us-west-2.amazonaws.com/ezsmdeploy-image-test
SUCCESS
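
Note that the script still prints SUCCESS at the end even though the build stopped at Step 6/17, so no image was created or pushed. As a rough aid for spotting this, here is a minimal Python sketch that scans captured build output for the failure lines seen above. The function name and the pattern list are illustrative assumptions, not part of ezsmdeploy.

```python
import re

# Patterns copied from the failure lines in the build output above.
# The list is an assumption for illustration and is not exhaustive.
FAILURE_PATTERNS = [
    re.compile(r"returned a non-zero code: \d+"),
    re.compile(r"Error response from daemon: No such image"),
    re.compile(r"An image does not exist locally with the tag"),
]


def find_build_failures(log_text):
    """Return the log lines that suggest the image was not built or pushed."""
    failures = []
    for line in log_text.splitlines():
        if any(pattern.search(line) for pattern in FAILURE_PATTERNS):
            failures.append(line.strip())
    return failures


sample_log = """\
Step 6/17 : RUN update-alternatives --install /usr/local/bin/pip pip /usr/local/bin/pip3 1
update-alternatives: error: alternative path /usr/local/bin/pip3 doesn't exist
The command '/bin/sh -c update-alternatives --install /usr/local/bin/pip pip /usr/local/bin/pip3 1' returned a non-zero code: 2
Error response from daemon: No such image: ezsmdeploy-image-test:latest
Pushing
SUCCESS
"""

for problem in find_build_failures(sample_log):
    print("build problem:", problem)
```

An empty result does not prove the push worked; it only means none of these known failure lines appeared.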

I get the following error when deploying locally

RuntimeError: Giving up, endpoint didn't launch correctly

w601sxs commented 3 years ago

The role should be alright then.

Can you try removing the two update-alternatives lines from the Dockerfile in your local src folder?

It appears to be related to this Ubuntu 18.04 issue.

If you can test it out and let me know on this thread, I can fix it in the package.
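
For a manual test, the removal can be scripted. The following is a minimal sketch, assuming the Dockerfile is plain text and each update-alternatives command sits on a single RUN line, as in the build log above. The function name is hypothetical. For background, on Ubuntu 18.04 the apt python3-pip package normally installs pip3 under /usr/bin rather than /usr/local/bin, which would explain the "alternative path /usr/local/bin/pip3 doesn't exist" message.

```python
def strip_update_alternatives(dockerfile_text):
    """Drop every RUN line that invokes update-alternatives."""
    kept = [
        line
        for line in dockerfile_text.splitlines()
        if not line.strip().startswith("RUN update-alternatives")
    ]
    return "\n".join(kept) + "\n"


# Shortened example input; a real Dockerfile has more steps.
example = """\
FROM ubuntu:18.04
RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 1
RUN update-alternatives --install /usr/local/bin/pip pip /usr/local/bin/pip3 1
"""

print(strip_update_alternatives(example))
```

To apply it to a file, read the Dockerfile in the local src folder, pass its text through the function, and write the result back before running the build script by hand.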

AtsunoriFujita commented 3 years ago

Yep, I will test and report.

AtsunoriFujita commented 3 years ago

I deleted these two lines, but the error didn't go away:

RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 1
RUN update-alternatives --install /usr/local/bin/pip pip /usr/local/bin/pip3 1

I also noticed that I get an error with "Using ezsmdeploy for sklearn deployments.ipynb" but no error with "Using ezsmdeploy for sklearn ensemble deployments.ipynb".

What is the difference between these? As you say, whether or not the two lines are included seems to be one of the differences.

w601sxs commented 3 years ago

That should be the only difference, and both should use the same Docker container now.

w601sxs commented 3 years ago

Can you check whether you are using ezsmdeploy v1.0.8? That version includes the changes that help fix the ensemble example.
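
One way to check the installed version from a notebook cell is sketched below. It uses the standard-library importlib.metadata module (Python 3.8+). The helper names are illustrative, and the comparison assumes a plain dotted numeric version string such as 1.0.8.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn a simple dotted version such as '1.0.8' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_at_least(installed, required="1.0.8"):
    """Compare two simple dotted versions numerically, not as strings."""
    return parse_version(installed) >= parse_version(required)


try:
    installed = version("ezsmdeploy")
    status = "ok" if is_at_least(installed) else "older than 1.0.8"
    print(installed, status)
except PackageNotFoundError:
    print("ezsmdeploy is not installed in this environment")
except ValueError:
    # Raised by parse_version for suffixes such as 'rc1'; inspect manually.
    print("could not parse the installed version string")
```

Comparing tuples of integers avoids the string-comparison trap where "1.0.10" sorts before "1.0.8".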

Godseye14 commented 2 years ago

Hi, I'm getting the same error. Previously I deployed an sklearn model, which worked perfectly. But when I try to deploy a TensorFlow (h5) model, I get the error.

I tried removing these two lines:

RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 1
RUN update-alternatives --install /usr/local/bin/pip pip /usr/local/bin/pip3 1

But as soon as I try to deploy, those lines automatically reappear in the Dockerfile. Please let me know what needs to be done.

Error:

Sending build context to Docker daemon 13.82kB
Step 1/19 : FROM ubuntu:18.04
 ---> 71cb16d32be4
Step 2/19 : LABEL com.amazonaws.sagemaker.capabilities.multi-models=true
 ---> Using cache
 ---> 6570ac7fbd1f
Step 3/19 : LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true
 ---> Using cache
 ---> 58bb2e953b1f
Step 4/19 : RUN apt -y update && apt -y upgrade && apt-get -y install curl && curl -sL https://deb.nodesource.com/setup_12.x | bash - && apt install nodejs -y && npm install -g @bazel/bazelisk
 ---> Using cache
 ---> e4885bd53f42
Step 5/19 : RUN apt-get update && apt-get -y install --no-install-recommends build-essential ca-certificates openjdk-8-jdk-headless python3-dev python3-pip python3-setuptools nginx ca-certificates curl wget vim && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> 6994691f8bd0
Step 6/19 : RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 1
 ---> Using cache
 ---> 754a1e4ed117
Step 7/19 : RUN update-alternatives --install /usr/local/bin/pip pip /usr/local/bin/pip3 1
 ---> Running in dcea7a96fe17
update-alternatives: error: alternative path /usr/local/bin/pip3 doesn't exist
The command '/bin/sh -c update-alternatives --install /usr/local/bin/pip pip /usr/local/bin/pip3 1' returned a non-zero code: 2
Error response from daemon: No such image: ezsmdeploy-image-drdr6pp9cqxxc5mtsdd73d:latest
An image does not exist locally with the tag: ************.dkr.ecr.eu-central-1.amazonaws.com/ezsmdeploy-image-drdr6pp9cqxxc5mtsdd73d

ClientError: An error occurred (ValidationException) when calling the CreateModel operation: Requested image ************.dkr.ecr.eu-central-1.amazonaws.com/ezsmdeploy-image-drdr6pp9cqxxc5mtsdd73d not found.
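
Since CreateModel fails only because the image is missing, it can help to confirm whether the image actually reached ECR before deploying. The sketch below is one possible check using boto3's ECR client and its describe_images call. The helper names are hypothetical, the account ID in the usage note is a placeholder (the real one is masked in this thread), and it assumes AWS credentials with ECR read access are configured.

```python
import re

# Matches URIs of the form <account>.dkr.ecr.<region>.amazonaws.com/<repo>[:tag]
IMAGE_URI = re.compile(
    r"^(?P<account>[^.]+)\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[^:]+)(?::(?P<tag>.+))?$"
)


def parse_image_uri(uri):
    """Split an ECR image URI into account, region, repository and tag."""
    match = IMAGE_URI.match(uri)
    if match is None:
        raise ValueError(f"not an ECR image URI: {uri}")
    parts = match.groupdict()
    parts["tag"] = parts["tag"] or "latest"  # Docker's default tag
    return parts


def image_exists(uri):
    """Return True if the tagged image is present in ECR, else False."""
    import boto3  # imported lazily so parse_image_uri works without boto3

    parts = parse_image_uri(uri)
    ecr = boto3.client("ecr", region_name=parts["region"])
    try:
        ecr.describe_images(
            repositoryName=parts["repository"],
            imageIds=[{"imageTag": parts["tag"]}],
        )
        return True
    except (
        ecr.exceptions.ImageNotFoundException,
        ecr.exceptions.RepositoryNotFoundException,
    ):
        return False
```

For example, image_exists("123456789012.dkr.ecr.eu-central-1.amazonaws.com/ezsmdeploy-image-test") would return False when the Docker build failed as in the log above, because the repository exists but no image was pushed.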