GoogleCloudPlatform / cloudml-samples

Cloud ML Engine repo. Please visit the new Vertex AI samples repo at https://github.com/GoogleCloudPlatform/vertex-ai-samples
https://cloud.google.com/ai-platform/docs/
Apache License 2.0

gcloud ai-platform fails with TFv2 Saved Model #420

Closed ehennis closed 4 years ago

ehennis commented 5 years ago

Describe the bug

I am following this tutorial: https://cloud.google.com/ml-engine/docs/tensorflow/deploying-models, using a model I created in TFv2. I can create the SavedModel in Colab with the following code:

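import time
import tensorflow as tf  # TF 2.x

# restored_model is the trained Keras model loaded earlier in the notebook
saved_model_path = "/content/gdrive/My Drive/Colab Notebooks/{}".format(int(time.time()))
tf.keras.experimental.export_saved_model(restored_model, saved_model_path)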

This created a folder containing the .pb file plus the assets and variables folders. I then uploaded that folder to my bucket in Google Cloud and ran the predict command:

gcloud ai-platform local predict --model-dir=$MODEL_DIR --text-instances ci.txt --framework TENSORFLOW

ci.txt contains a comma-separated list of my input numbers. I get the following error, which echoes my values:

Traceback (most recent call last):
  File "lib/googlecloudsdk/command_lib/ml_engine/local_predict.py", line 184, in <module>
    main()
  File "lib/googlecloudsdk/command_lib/ml_engine/local_predict.py", line 179, in main
    signature_name=args.signature_name)
  File "/google/google-cloud-sdk/lib/third_party/ml_sdk/cloud/ml/prediction/prediction_lib.py", line 102, in local_predict
    predictions = model.predict(instances, signature_name=signature_name)
  File "/google/google-cloud-sdk/lib/third_party/ml_sdk/cloud/ml/prediction/prediction_utils.py", line 268, in predict
    preprocessed, stats=stats, **kwargs)
  File "/google/google-cloud-sdk/lib/third_party/ml_sdk/cloud/ml/prediction/frameworks/tf_prediction_lib.py", line 363, in predict
    "Exception during running the graph: " + str(e))
cloud.ml.prediction.prediction_utils.PredictionError: Failed to run the provided model: Exception during running the graph: invalid literal for float(): 81,71,80,76,1,3 (Error code: 2)
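
The `invalid literal for float()` message suggests that the whole line of ci.txt reached the model as a single string rather than as six numbers, which is consistent with `--text-instances` reading newline-delimited text instances. Below is a minimal sketch of one possible workaround, assuming the model takes a single flat numeric input: convert each comma-separated line into a JSON list so the file can be passed with `--json-instances` instead. The function names are hypothetical, and this only addresses the input format, not whether the SavedModel itself loads correctly.

```python
import json


def csv_line_to_json_instance(line):
    """Turn one comma-separated line of numbers into a JSON list string."""
    values = [float(field) for field in line.strip().split(",")]
    return json.dumps(values)


def convert_instances(text):
    """Convert newline-delimited CSV text into newline-delimited JSON."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    return "\n".join(csv_line_to_json_instance(ln) for ln in lines)


# The whole line is not a valid float, which matches the reported error.
try:
    float("81,71,80,76,1,3")
    whole_line_parses = True
except ValueError:
    whole_line_parses = False

# Each field parses on its own once the line is split on commas.
converted = convert_instances("81,71,80,76,1,3\n")
print(converted)
```

The converted text could be saved to a file (for example a hypothetical ci.json) and supplied via `--json-instances ci.json` in place of `--text-instances ci.txt`.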

What sample is this bug related to? Not sure how to answer this question. I listed the tutorial I followed above.

Source code / logs

Here is the model: ver1.zip

To Reproduce Steps to reproduce the behavior:

  1. Train a model using TFv2 and Keras
  2. Export using tf.keras.experimental.export_saved_model(..)
  3. Upload to AI Platform
  4. See error

Expected behavior

I expect to be able to use the system and call the model for predictions.

System Information

To obtain the TensorFlow and TensorFlow Transform environment, run:

pip freeze | grep tensorflow
pip freeze | grep apache-beam

Additional context

Full Error

ERROR: (gcloud.ai-platform.local.predict) 2019-05-14 10:19:23.674836: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-05-14 10:19:23.687172: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2019-05-14 10:19:23.687418: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x556c01ae34a0 executing computations on platform Host. Devices:
2019-05-14 10:19:23.687444: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
WARNING:tensorflow:From /google/google-cloud-sdk/lib/third_party/ml_sdk/cloud/ml/prediction/frameworks/tf_prediction_lib.py:210: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.
WARNING:tensorflow:From /google/google-cloud-sdk/lib/third_party/ml_sdk/cloud/ml/prediction/frameworks/tf_prediction_lib.py:210: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
WARNING:root:Error updating signature __saved_model_init_op: The name 'init_1' refers to an Operation, not a Tensor. Tensor names must be of the form "<op_name>:<output_index>".
ERROR:root:Exception during running the graph: invalid literal for float(): 81,71,80,76,1,3
Traceback (most recent call last):
  File "lib/googlecloudsdk/command_lib/ml_engine/local_predict.py", line 184, in <module>
    main()
  File "lib/googlecloudsdk/command_lib/ml_engine/local_predict.py", line 179, in main
    signature_name=args.signature_name)
  File "/google/google-cloud-sdk/lib/third_party/ml_sdk/cloud/ml/prediction/prediction_lib.py", line 102, in local_predict
    predictions = model.predict(instances, signature_name=signature_name)
  File "/google/google-cloud-sdk/lib/third_party/ml_sdk/cloud/ml/prediction/prediction_utils.py", line 268, in predict
    preprocessed, stats=stats, **kwargs)
  File "/google/google-cloud-sdk/lib/third_party/ml_sdk/cloud/ml/prediction/frameworks/tf_prediction_lib.py", line 363, in predict
    "Exception during running the graph: " + str(e))
cloud.ml.prediction.prediction_utils.PredictionError: Failed to run the provided model: Exception during running the graph: invalid literal for float(): 81,71,80,76,1,3 (Error code: 2)

gogasca commented 5 years ago

We currently do not support TF 2.0 as part of predictions. These are the supported versions for AI Platform: https://cloud.google.com/ml-engine/docs/tensorflow/runtime-version-list

I would suggest you use a Deep Learning VM image with TF serving.
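
For readers taking the TF Serving route, here is a minimal sketch of what a prediction request could look like against TensorFlow Serving's REST API. The host, the model name `my_model`, and the six-feature input shape are assumptions for illustration only; 8501 is the port commonly used for TF Serving's REST endpoint. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_predict_request(host, model_name, instances, port=8501):
    """Build (without sending) a TF Serving REST predict request."""
    url = "http://{}:{}/v1/models/{}:predict".format(host, port, model_name)
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and model name; one instance with six numeric features.
request = build_predict_request("localhost", "my_model", [[81, 71, 80, 76, 1, 3]])
print(request.full_url)
print(request.data.decode("utf-8"))
```

With a server actually running, the request could be sent with `urllib.request.urlopen(request)` and the JSON response parsed for its predictions.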

gogasca commented 5 years ago

@ehennis I will keep this issue open so I can provide an update on when we will support this.

gogasca commented 5 years ago

Testing internally; will update soon.

helgaholmestad commented 5 years ago

Is there any update on when TF2 will be supported?

gogasca commented 5 years ago

Once it goes to General Availability. Closing for now.

andrewferlitsch commented 4 years ago

Gonzalos meant to close this issue (see prior comment), but overlooked it. Closing it now.