GoogleCloudPlatform / cloudml-samples

Cloud ML Engine repo. Please visit the new Vertex AI samples repo at https://github.com/GoogleCloudPlatform/vertex-ai-samples
https://cloud.google.com/ai-platform/docs/
Apache License 2.0

Problem with gcloud local prediction. #457

Closed Mustufain closed 2 years ago

Mustufain commented 4 years ago

Describe the bug I am using gcloud local prediction to test my exported model. The model is a TensorFlow object detection model trained on a custom dataset. I am using the following gcloud command:

gcloud ml-engine local predict --model-dir=/path/to/saved_model/ --json-instances=input.json --signature-name="serving_default" --verbosity debug

Source code / logs

DEBUG: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/google-cloud-sdk/lib/googlecloudsdk/calliope/cli.py", line 984, in Execute
    resources = calliope_command.Run(cli=self, args=args)
  File "/google-cloud-sdk/lib/googlecloudsdk/calliope/backend.py", line 784, in Run
    resources = command_instance.Run(args)
  File "/google-cloud-sdk/lib/surface/ai_platform/local/predict.py", line 83, in Run
    signature_name=args.signature_name)
  File "/google-cloud-sdk/lib/googlecloudsdk/command_lib/ml_engine/local_utils.py", line 103, in RunPredict
    proc.stdin.write((json.dumps(instance) + '\n').encode('utf-8'))
IOError: [Errno 32] Broken pipe

Details of my exported model :

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['inputs'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: encoded_image_string_tensor:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['detection_boxes'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 300, 4)
        name: detection_boxes:0
    outputs['detection_classes'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 300)
        name: detection_classes:0
    outputs['detection_features'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, -1, -1, -1, -1)
        name: detection_features:0
    outputs['detection_multiclass_scores'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 300, 2)
        name: detection_multiclass_scores:0
    outputs['detection_scores'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 300)
        name: detection_scores:0
    outputs['num_detections'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1)
        name: num_detections:0
    outputs['raw_detection_boxes'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 300, 4)
        name: raw_detection_boxes:0
    outputs['raw_detection_scores'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 300, 2)
        name: raw_detection_scores:0
  Method name is: tensorflow/serving/predict

Expected behavior The input file is in the following format: {"inputs": {"b64": "/9j/4AAQSkZJRgABAQAAAQABAAD/......"}} With the above input file I am not getting any predictions out of it.
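For reference, each line of a file like input.json can be produced by base64-encoding the raw image bytes under the signature's input name. A minimal sketch (the placeholder payload and any file names are hypothetical):

```python
import base64
import json

def encode_instance(image_bytes):
    """Build one line of input.json: raw bytes wrapped in the {"b64": ...}
    convention used for DT_STRING image inputs, keyed by the signature's
    input name ("inputs" in the SavedModel signature above)."""
    return json.dumps(
        {"inputs": {"b64": base64.b64encode(image_bytes).decode("ascii")}}
    )

# For a real request you would pass the bytes of an actual image file,
# e.g. encode_instance(open("image.jpg", "rb").read()); a placeholder
# payload keeps the sketch self-contained.
print(encode_instance(b"\xff\xd8\xff placeholder jpeg bytes"))
```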

System Information

andrewferlitsch commented 4 years ago

@dizcology please take a look

gogasca commented 4 years ago

@Mustufain Is this a model from our samples? Take a look at: https://stackoverflow.com/questions/49172710/what-does-google-cloud-ml-engine-do-when-a-json-request-contains-bytes-or-b6 If it is not one of our models, please open an issue on StackOverflow for visibility to the community.

Grandmother commented 4 years ago

I've run into the same problem and found that ml_engine/local_utils.py uses python to run ml_engine/local_predict.pyc, which is built for Python 2.7. My python is Python 3, so when ml_engine/local_utils.py tries to run ml_engine/local_predict.pyc with python (actually Python 3), it fails with:

RuntimeError: Bad magic number in .pyc file
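The mismatch is easy to confirm: a compiled .pyc file starts with a 4-byte magic number tied to the interpreter version that wrote it, and a different interpreter refuses to load it. A quick check (a sketch, assuming Python 3):

```python
import importlib.util

PY27_MAGIC = b"\x03\xf3\r\n"  # the magic header CPython 2.7 writes

def pyc_matches_interpreter(pyc_path):
    """Return True if the .pyc's 4-byte magic header matches the running
    interpreter; a mismatch is what raises
    'RuntimeError: Bad magic number in .pyc file'."""
    with open(pyc_path, "rb") as f:
        return f.read(4) == importlib.util.MAGIC_NUMBER
```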

Solution 1:

You can simply make python2 the default python on your system.
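Either way, it helps to verify which binary a bare PATH lookup for "python" actually resolves to; shutil.which is a rough stand-in here for the SDK's files.SearchForExecutableOnPath helper used in local_utils.py:

```python
import shutil

def interpreter_on_path(name="python"):
    """Resolve which executable a bare PATH lookup would launch,
    analogous to files.SearchForExecutableOnPath in local_utils.py."""
    return shutil.which(name)

# Shows which interpreters local prediction could pick up on this machine.
print("python  ->", interpreter_on_path("python"))
print("python2 ->", interpreter_on_path("python2"))
```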

Solution 2:

I patched ml_engine/local_utils.py as follows:

83c83
<   python_executables = files.SearchForExecutableOnPath("python")
---
>   python_executables = files.SearchForExecutableOnPath("python2")
114a115
>   log.debug(args)
124,126c125,130
<   for instance in instances:
<     proc.stdin.write((json.dumps(instance) + "\n").encode("utf-8"))
<   proc.stdin.flush()
---
>   try:
>     for instance in instances:
>       proc.stdin.write((json.dumps(instance) + "\n").encode("utf-8"))
>     proc.stdin.flush()
>   except:
>     pass

* log.debug(args) is needed to see what command we are actually running.
** The try/except is needed so the script can still read and print the error that occurred while running ml_engine/local_predict.pyc.
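The try/except matters because when local_predict.pyc dies on startup, writing the instances to its stdin raises a broken-pipe error before the caller ever reads stderr. A minimal sketch of that failure mode (the child command is a hypothetical stand-in for a crashing local_predict.pyc):

```python
import json
import subprocess
import sys

# Stand-in for local_predict.pyc crashing under the wrong interpreter:
# a child that writes an error to stderr and exits immediately.
child = "import sys; sys.stderr.write('bad magic number'); sys.exit(1)"
proc = subprocess.Popen(
    [sys.executable, "-c", child],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
proc.wait()  # make the race deterministic: the pipe's reader is already gone

try:
    for instance in [{"inputs": "x"}] * 10000:
        proc.stdin.write((json.dumps(instance) + "\n").encode("utf-8"))
    proc.stdin.flush()
except BrokenPipeError:
    # Swallow the write failure so communicate() below can still
    # collect and surface the child's real error message.
    pass

output, err = proc.communicate()
print(err.decode())
```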

Grandmother commented 4 years ago

Here is a patch in unified format:

--- /usr/lib/google-cloud-sdk/lib/googlecloudsdk/command_lib/ml_engine/local_utils_orig.py      2020-05-14 10:18:40.389442573 +0300
+++ /usr/lib/google-cloud-sdk/lib/googlecloudsdk/command_lib/ml_engine/local_utils.py   2020-05-14 12:28:41.053351849 +0300
@@ -80,7 +80,7 @@
     encoding.SetEncodedValue(env, "CLOUDSDK_ROOT", sdk_root)
     # We want to use whatever the user's Python was, before the Cloud SDK started
     # changing the PATH. That's where Tensorflow is installed.
-    python_executables = files.SearchForExecutableOnPath("python")
+    python_executables = files.SearchForExecutableOnPath("python2")
     # Need to ensure that ml_sdk is in PYTHONPATH for the import in
     # local_predict to succeed.

@@ -112,6 +112,7 @@
         for a in ([python_executable, local_predict.__file__] + predict_args)
     ]

+    log.debug(args)
     proc = subprocess.Popen(
         args,
         stdin=subprocess.PIPE,
@@ -121,9 +122,12 @@
     )

     # Pass the instances to the process that actually runs local prediction.
-    for instance in instances:
-        proc.stdin.write((json.dumps(instance) + "\n").encode("utf-8"))
-    proc.stdin.flush()
+    try:
+        for instance in instances:
+            proc.stdin.write((json.dumps(instance) + "\n").encode("utf-8"))
+        proc.stdin.flush()
+    except:
+        pass

     # Get the results for the local prediction.
     output, err = proc.communicate()

kweinmeister commented 2 years ago

It looks like this issue is about the AI Platform itself, not sample code in this repo. The best path forward is to contact Google Cloud Support. I will go ahead and close this issue. Please feel free to reopen if you have any questions about the sample code in this repo.