GoogleCloudPlatform / mlops-on-gcp

Apache License 2.0
LAB-01 Unable to create model resource & model version #39

Open pbavinck opened 4 years ago

pbavinck commented 4 years ago

Create model resource

Section "Deploy the model to AI Platform Prediction", create model resource.

The following code always executes the ELSE part, instead of the IF, which means the resource does NOT get created.

model_name = 'forest_cover_classifier'
labels = "task=classifier,domain=forestry"
filter = 'name:{}'.format(model_name)
models = !(gcloud ai-platform models list --filter={filter} --format='value(name)')

if not models:
    !gcloud ai-platform models create  $model_name \
    --regions=$REGION \
    --labels=$labels
else:
    print("Model: {} already exists.".format(models[0]))

This fails because the following command

gcloud ai-platform models list

generates the following output:

...@cloudshell:~ (my-project)$ gcloud ai-platform models list
Using endpoint [https://ml.googleapis.com/]
Listed 0 items.

The "Using endpoint…" line is not removed by the filter and ends up in the captured output, so the models variable is not empty:

models == ['Using endpoint [https://ml.googleapis.com/]']

This causes the ELSE branch to be executed, so no resource is created.
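
The behaviour can be reproduced without gcloud. The following is a minimal sketch, assuming the "Using endpoint" line is written to stderr and that the notebook's `!` capture includes stderr in the returned list (both are inferred from the output shown above, not confirmed). `FAKE_LIST_CMD` and `capture_lines` are hypothetical names; the child process is a stand-in Python one-liner, not gcloud.

```python
import subprocess
import sys

# Hypothetical stand-in for `gcloud ai-platform models list` on an empty project:
# it writes a status line to stderr and prints nothing to stdout.
FAKE_LIST_CMD = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('Using endpoint [https://ml.googleapis.com/]\\n')",
]


def capture_lines(cmd, merge_stderr):
    """Run cmd and return its non-empty output lines.

    With merge_stderr=True, stderr is folded into the captured output
    (assumed to mirror the notebook's `!` capture). With False, stderr
    is discarded and only stdout is returned.
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT if merge_stderr else subprocess.DEVNULL,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


merged = capture_lines(FAKE_LIST_CMD, merge_stderr=True)
stdout_only = capture_lines(FAKE_LIST_CMD, merge_stderr=False)

print(merged)       # contains the status line, so `if not models:` is False
print(stdout_only)  # empty, so `if not models:` is True
```

With the merged capture the list is non-empty even though no model exists, which is exactly the case that sends the notebook cell into the ELSE branch.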

Create model version

Section "Deploy the model to AI Platform Prediction", create model version.

A similar thing happens for creating the version:

model_version = 'v01'
filter = 'name:{}'.format(model_version)
versions = !(gcloud ai-platform versions list --model={model_name} --format='value(name)' --filter={filter})

if not versions:
    !gcloud ai-platform versions create {model_version} \
    --model={model_name} \
    --origin=$JOB_DIR \
    --runtime-version=1.15 \
    --framework=scikit-learn \
    --python-version=3.7
else:
    print("Model version: {} already exists.".format(versions[0]))

The version is not created either.

kylgoog commented 4 years ago

Adding 2>/dev/null seems to help, but I would like to see if there are better workarounds.

### Under "Create a model resource"
models = !(gcloud ai-platform models list --filter={filter} --format='value(name)' 2>/dev/null )

### Under "Create a model version"
versions = !(gcloud ai-platform versions list --model={model_name} --format='value(name)' --filter={filter} 2>/dev/null )
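
One possible alternative to the shell redirect, sketched here under assumptions: call gcloud through Python's `subprocess` and read only stdout, so status messages on stderr never reach the existence check. `list_names` and `models_list_args` are hypothetical helper names; the gcloud flags are the ones already used in the notebook cell, and the `run` parameter exists only so the helper can be exercised without gcloud installed.

```python
import subprocess


def list_names(args, run=subprocess.run):
    """Return the names a gcloud list command prints on stdout.

    stderr is captured separately and ignored, so lines such as
    "Using endpoint [...]" cannot end up in the returned list.
    """
    result = run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def models_list_args(model_name):
    # Same flags as the notebook cell, passed as a list so no shell quoting is needed.
    return [
        "gcloud", "ai-platform", "models", "list",
        "--filter=name:{}".format(model_name),
        "--format=value(name)",
    ]
```

In the notebook, `models = list_names(models_list_args(model_name))` could then replace the `!(...)` line, leaving the `if not models:` check unchanged.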

makquel commented 3 years ago

@pbavinck did you find a way to work around this issue? I've been following the quick-start to make a deploy script:

set -v

# This has to be run after train-cloud.sh is successfully executed

export MODEL_VERSION=v1
export REGION=us-east1

FRAMEWORK=tensorflow
MODEL_NAME=SegModel
MODEL_DIR=gs://model_storage_test/keras-job-dir/keras_export

echo "First, creating the model resource..."
gcloud ai-platform models create ${MODEL_NAME} --regions=${REGION}

echo "Second, creating the model version..."
gcloud ai-platform versions create ${MODEL_VERSION} \
  --model ${MODEL_NAME} \
  --origin ${MODEL_DIR} \
  --framework ${FRAMEWORK} \
  --runtime-version=${RUNTIME_VERSION} \
  --python-version=${PYTHON_VERSION} \
  --region=${REGION}

set -

However, it keeps complaining that the model resource was not found.

vanAkim commented 2 years ago

For me, the problem was that the --region parameter wasn't properly "set" to the region I requested (or I don't know what is going on in the background).

The following command gcloud ai-platform models create ${MODEL_NAME} --region=${REGION} leads to the model creation with :

I'm not sure what's going on here or why, but using --region=global resolves my versions create issue (same as yours), even though my model is in the region I asked for (? maybe, since the Cloud console is confirming it).