GoogleCloudPlatform / cloudml-samples

Cloud ML Engine repo. Please visit the new Vertex AI samples repo at https://github.com/GoogleCloudPlatform/vertex-ai-samples
https://cloud.google.com/ai-platform/docs/
Apache License 2.0

Official AIP tutorial has a serious error "Could not find resource: localhost/dense/kernel error" #490

Closed myoshimu closed 2 years ago

myoshimu commented 3 years ago

**Describe the bug**
This tutorial does not work properly. https://cloud.google.com/ai-platform/docs/getting-started-keras

**Source code / logs**
https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/notebooks/tensorflow/getting-started-keras.ipynb

**To Reproduce**
Steps to reproduce the behavior:

  1. Open the tutorial in Colaboratory
  2. Run all the cells.
  3. You will see an error when creating the model version at "Create model and version resources in AI Platform". I fixed that with the code below:
    
    MODEL_VERSION = "v1"

    # Get a list of directories in the keras_export parent directory
    KERAS_EXPORT_DIRS = ! gsutil ls $JOB_DIR/keras_export/

    # Pick the directory with the latest timestamp, in case you've trained
    # multiple times
    SAVED_MODEL_PATH = KERAS_EXPORT_DIRS[-1]

    ! gsutil cp $JOB_DIR/keras_export/saved_model.pb $JOB_DIR/keras_export/variables/

    # Create model version based on that SavedModel directory
    ! gcloud ai-platform versions create $MODEL_VERSION \
      --model $MODEL_NAME \
      --runtime-version 1.15 \
      --python-version 3.7 \
      --framework tensorflow \
      --origin $SAVED_MODEL_PATH

  4. See error when trying "Submit the online prediction request"

    gcloud ai-platform predict \
      --model $MODEL_NAME \
      --version $MODEL_VERSION \
      --json-instances prediction_input.json

    Using endpoint [https://ml.googleapis.com/]
    {
      "error": "Prediction failed: Error during model execution: AbortionError(code=StatusCode.FAILED_PRECONDITION, details=\"Error while reading resource variable dense/kernel from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/dense/kernel)\n\t [[{{node dense/MatMul/ReadVariableOp}}]]\")"
    }
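One plausible reading of this error, offered as an assumption rather than a confirmed diagnosis: a SavedModel directory is expected to hold `saved_model.pb` next to a `variables/` subdirectory. If the version is created from a directory that contains only a copied `saved_model.pb`, the graph may load while the checkpoint files are not where the loader looks, which would leave `dense/kernel` uninitialized. The sketch below checks a recursive `gsutil ls`-style listing for the expected layout. The function name `has_saved_model_layout` and the bucket paths are hypothetical and used only for illustration.

```python
from typing import Iterable


def has_saved_model_layout(candidate_dir: str, listing: Iterable[str]) -> bool:
    """Return True if the listing shows saved_model.pb and a variables/
    subdirectory directly under candidate_dir."""
    base = candidate_dir.rstrip("/") + "/"
    entries = set(listing)
    # The serialized graph must sit directly in the candidate directory.
    has_graph = (base + "saved_model.pb") in entries
    # Checkpoint files must live under <candidate_dir>/variables/.
    has_variables = any(e.startswith(base + "variables/") for e in entries)
    return has_graph and has_variables


# Hypothetical listing after copying saved_model.pb into variables/.
listing = [
    "gs://my-bucket/job/keras_export/saved_model.pb",
    "gs://my-bucket/job/keras_export/variables/saved_model.pb",
    "gs://my-bucket/job/keras_export/variables/variables.index",
    "gs://my-bucket/job/keras_export/variables/variables.data-00000-of-00001",
]

parent_ok = has_saved_model_layout("gs://my-bucket/job/keras_export", listing)
child_ok = has_saved_model_layout("gs://my-bucket/job/keras_export/variables", listing)
print(parent_ok, child_ok)
```

Under these assumptions the parent `keras_export` directory passes the check, while `keras_export/variables` does not, because no `variables/variables/` prefix exists beneath it.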


**Expected behavior**
The predicted result is shown properly.

myoshimu commented 3 years ago

The issue should be fixed with: SAVED_MODEL_PATH=keras_export
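The fix above suggests that, in this run, the export landed directly in `keras_export` rather than in a timestamped subdirectory. In that case `KERAS_EXPORT_DIRS[-1]` would pick a child entry such as `variables/` instead of the SavedModel directory itself. Here is a minimal sketch of a selection step that handles both layouts, assuming the listing comes from `gsutil ls $JOB_DIR/keras_export/`. The helper `pick_saved_model_path` and the example bucket paths are hypothetical and not part of the tutorial.

```python
from typing import List


def pick_saved_model_path(export_dir: str, listing: List[str]) -> str:
    """Choose the SavedModel directory from a non-recursive listing of export_dir."""
    base = export_dir.rstrip("/") + "/"
    # Flat layout: saved_model.pb sits directly in export_dir, so use it as-is.
    if (base + "saved_model.pb") in listing:
        return base
    # Timestamped layout: pick the last subdirectory in sorted order.
    # Assumes subdirectory names are numeric timestamps of equal length.
    subdirs = sorted(
        e for e in listing if e.startswith(base) and e.endswith("/") and e != base
    )
    if not subdirs:
        raise ValueError("no SavedModel found under " + base)
    return subdirs[-1]


# Hypothetical flat listing, as the fix in this issue implies.
flat = [
    "gs://my-bucket/job/keras_export/saved_model.pb",
    "gs://my-bucket/job/keras_export/assets/",
    "gs://my-bucket/job/keras_export/variables/",
]
# Hypothetical timestamped listing, as the original notebook cell expects.
timestamped = [
    "gs://my-bucket/job/keras_export/1581000000/",
    "gs://my-bucket/job/keras_export/1582000000/",
]

flat_choice = pick_saved_model_path("gs://my-bucket/job/keras_export", flat)
stamped_choice = pick_saved_model_path("gs://my-bucket/job/keras_export", timestamped)
print(flat_choice)
print(stamped_choice)
```

In a notebook, the returned value could be assigned to `SAVED_MODEL_PATH` before running the `gcloud ai-platform versions create` cell.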