google-coral / tflite

Examples using TensorFlow Lite API to run inference on Coral devices
https://coral.withgoogle.com
Apache License 2.0

ValueError: Didn't find custom op for name 'edgetpu-custom-op' with version 1 #2

Closed travisariggs closed 4 years ago

travisariggs commented 5 years ago

I am trying to run my first test of the USB Accelerator using the classify_image.py script, and I'm getting an error when initializing the Interpreter:

$ python3 classify_image.py \                                                                      
--model models/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite \                                
--labels models/inat_bird_labels.txt \                                                              
--image images/parrot.jpg                                                                           
Initializing TF Lite interpreter...                                                                 
Traceback (most recent call last):                                                                  
  File "classify_image.py", line 118, in <module>                                                   
    main()                                                                                          
  File "classify_image.py", line 95, in main                                                        
    experimental_delegates=[load_delegate('libedgetpu.so.1.0')])                                    
  File "/usr/local/lib/python3.5/dist-packages/tflite_runtime/interpreter.py", line 206, in __init__
    model_path))                                                                                    
ValueError: Didn't find custom op for name 'edgetpu-custom-op' with version 1                       
Registration failed.                                                                                
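The error names edgetpu-custom-op, the custom operator that the Edge TPU compiler writes into *_edgetpu.tflite models and that normally has to be handed off to libedgetpu to run. To confirm which kind of model a file is, here is a quick sketch. It assumes the custom op name is stored as a literal string inside the .tflite file, and it does a byte search, not a real flatbuffer parse:

```python
EDGETPU_CUSTOM_OP = b"edgetpu-custom-op"

def uses_edgetpu_custom_op(model_path):
    """Return True if the .tflite file contains the Edge TPU custom op name.

    Plain byte search (assumption: custom op codes are stored as literal
    strings in the model file); it does not parse the flatbuffer.
    """
    with open(model_path, "rb") as f:
        return EDGETPU_CUSTOM_OP in f.read()
```

For example, uses_edgetpu_custom_op("models/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite") should be True for the compiled model and False for the plain quantized one.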

I believe I have installed everything following this tutorial: https://coral.withgoogle.com/docs/accelerator/get-started/ including the installation of the runtime from https://www.tensorflow.org/lite/guide/python. I chose the tflite_runtime-1.14.0-cp35-cp35m-linux_x86_64.whl for my system.

Here are some system details: Ubuntu 16.04.6 LTS 64-bit, Python 3.5.2
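The wheel has to match the interpreter. In a name like tflite_runtime-1.14.0-cp35-cp35m-linux_x86_64.whl, the fields after the version are the Python tag, the ABI tag and the platform tag (the PEP 427 wheel naming convention). The sketch below prints the tags to look for on the current machine. It assumes a default CPython build on Linux, where versions before 3.8 carry the "m" ABI suffix:

```python
import platform
import sys

def expected_wheel_tags(version_info=None, machine=None):
    """Return (python_tag, abi_tag, platform_tag) for a CPython-on-Linux wheel.

    Assumes a default CPython build: before 3.8 the ABI tag carried an "m"
    (pymalloc) suffix, e.g. cp35m; from 3.8 on it is just cp38, cp39, ...
    """
    major, minor = tuple(version_info or sys.version_info)[:2]
    python_tag = "cp%d%d" % (major, minor)
    abi_tag = python_tag + ("m" if (major, minor) < (3, 8) else "")
    platform_tag = "linux_" + (machine or platform.machine())
    return python_tag, abi_tag, platform_tag

if __name__ == "__main__":
    print("Look for a wheel ending in: %s-%s-%s.whl" % expected_wheel_tags())
```

With Python 3.5.2 on x86-64 this gives cp35, cp35m and linux_x86_64, which matches the wheel chosen above.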

mrharicot commented 5 years ago

I have the exact same issue with the same system: Ubuntu 16.04.6 LTS 64-bit, Python 3.5.2

Namburger commented 5 years ago

Hmm, I couldn't reproduce this with Python 3.6 and tflite_runtime-1.14.0-cp36-cp36m-linux_x86_64.whl. It looks like the problem occurs during load_delegate('libedgetpu.so.1.0'); could you confirm that it is installed? For reference:

$ ls -l /usr/lib/x86_64-linux-gnu/libedgetpu*
lrwxrwxrwx 1 root root   43 Oct  9 14:37 /usr/lib/x86_64-linux-gnu/libedgetpu.so.1 -> /usr/lib/x86_64-linux-gnu/libedgetpu.so.1.0
-rwxr-xr-x 1 root root 930K Oct  9 14:37 /usr/lib/x86_64-linux-gnu/libedgetpu.so.1.0
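The same check can be scripted. This sketch lists every libedgetpu* entry in a library directory and resolves symlinks, so you can see whether libedgetpu.so.1 points at a real libedgetpu.so.1.0. The default directory is the x86-64 Ubuntu path from the listing above. Other architectures use a different directory, so treat it as an assumption:

```python
import glob
import os

def list_edgetpu_libs(libdir="/usr/lib/x86_64-linux-gnu"):
    """Return (path, resolved_path, exists) for each libedgetpu* entry in libdir."""
    entries = []
    for path in sorted(glob.glob(os.path.join(libdir, "libedgetpu*"))):
        resolved = os.path.realpath(path)
        # os.path.exists follows symlinks, so a dangling link reports False.
        entries.append((path, resolved, os.path.exists(path)))
    return entries

if __name__ == "__main__":
    for path, resolved, ok in list_edgetpu_libs():
        print("%s -> %s%s" % (path, resolved, "" if ok else "  (MISSING)"))
```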
Namburger commented 5 years ago

Also, could you attach the output of strace?
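In such a trace, the interesting lines are the open()/openat() calls for libedgetpu and their return values: a file descriptor on success, or -1 with an errno such as ENOENT on failure. The sketch below parses them out. It assumes the trace was captured with something like strace -f -o strace.txt python3 classify_image.py ..., and the regex is written only for the line shapes quoted in this thread:

```python
import re

# Matches strace lines such as:
#   [pid 28150] open("/lib/x86_64-linux-gnu/libedgetpu.so.1.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (...)
#   openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libedgetpu.so.1", O_RDONLY|O_CLOEXEC) = 3
_OPEN_RE = re.compile(
    r'\bopen(?:at)?\((?:[^,"]*, )?"(?P<path>[^"]*libedgetpu[^"]*)"[^)]*\)'
    r'\s*=\s*(?P<ret>-?\d+)(?:\s+(?P<errno>E[A-Z]+))?'
)

def edgetpu_open_attempts(lines):
    """Return (path, succeeded, errno_name) for each libedgetpu open in an strace log."""
    attempts = []
    for line in lines:
        m = _OPEN_RE.search(line)
        if m:
            ok = int(m.group("ret")) >= 0  # a non-negative return is a file descriptor
            attempts.append((m.group("path"), ok, m.group("errno")))
    return attempts

# Usage:
#   with open("strace.txt") as f:
#       for path, ok, errno_name in edgetpu_open_attempts(f):
#           print(path, "OK" if ok else errno_name)
```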

mrharicot commented 5 years ago

I have verified that libedgetpu.so is installed; I followed the latest blog post on Coral. Here is the strace output: strace.txt

Thanks for your help!

Namburger commented 5 years ago

@mrharicot are you sure you are getting the same error? From your strace.txt, I'm seeing this:

  File "classify_image.py", line 118, in <module>
    main()
  File "classify_image.py", line 95, in main
    experimental_delegates=[load_delegate('libedgetpu.so.1.0')])
  File "/usr/local/lib/python3.5/dist-packages/tflite_runtime/interpreter.py", line 230, in __init__
    self._interpreter.ModifyGraphWithDelegate(
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/lite/python/interpreter_wrapper/tensorflow_wrap_interpreter_wrapper.py", line 97, in <lambda>
    __getattr__ = lambda self, name: _swig_getattr(self, InterpreterWrapper, name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/lite/python/interpreter_wrapper/tensorflow_wrap_interpreter_wrapper.py", line 74, in _swig_getattr
    return _swig_getattr_nondynamic(self, class_type, name, 0)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/lite/python/interpreter_wrapper/tensorflow_wrap_interpreter_wrapper.py", line 69, in _swig_getattr_nondynamic
    return object.__getattr__(self, name)
AttributeError: type object 'object' has no attribute '__getattr__'

That would actually be a different issue; in either case, though, I cannot recreate it even with Python 3.5 :(

mrharicot commented 5 years ago

My bad, that trace was from the non-Edge TPU model. Here is the strace for the Edge TPU model: strace.txt

Namburger commented 5 years ago

@mrharicot @travisariggs Could you try the edgetpu API instead of the tflite_runtime API? https://github.com/google-coral/edgetpu/blob/master/examples/classify_image.py For now, I'd like to pinpoint whether this is a tflite problem or a libedgetpu problem.

mrharicot commented 5 years ago

@Namburger Nice catch! The edgetpu API seems to work fine for both the tflite and the edgetpu models.


python3 classify_image.py --model ../test_data/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite --label ~/code/coral/tflite/python/examples/models/inat_bird_labels.txt --image ~/code/coral/tflite/python/examples/images/parrot.jpg

---------------------------
Ara macao (Scarlet Macaw)
Score :  0.6484375
---------------------------
Platycercus elegans (Crimson Rosella)
Score :  0.13671875
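Both scores are exact multiples of 1/256 (0.6484375 = 166/256, 0.13671875 = 35/256). That is what you would expect if the model's uint8 output is dequantized with scale 1/256 and zero point 0. Those two parameters are inferred from the printed values, not read from the model. The mapping is the usual TF Lite dequantization, real_value = scale * (q - zero_point):

```python
def dequantize(q, scale, zero_point):
    """Map a quantized uint8 value back to a real number."""
    return scale * (q - zero_point)

# Assumed output quantization for this model (inferred from the scores above).
SCALE, ZERO_POINT = 1.0 / 256, 0

for q in (166, 35):
    print(q, "->", dequantize(q, SCALE, ZERO_POINT))
# prints:
# 166 -> 0.6484375
# 35 -> 0.13671875
```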
Namburger commented 5 years ago

@mrharicot Nice! So the issue looks to be with tflite_runtime rather than libedgetpu. Weirdly, I cannot recreate the issue even with Python 3.5 or Python 3.6 (although I tested on 2 different host machines, which makes me wonder if Ubuntu 16.04 is the culprit). I'll file an internal bug for this one and will give you an update if we find something. Thanks for submitting the issue!

P.S. The difference between the two models, *.tflite and *edgetpu.tflite, is that the *edgetpu.tflite one has been compiled for the Edge TPU and runs faster, while the other one runs on the CPU and is slower. https://coral.withgoogle.com/docs/edgetpu/compiler/ For reference:

edgetpu_compiled
----INFERENCE TIME---- 
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
11.6ms
3.1ms
3.0ms
2.6ms
2.5ms
non-edgetpu compiled
----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
146.0ms
144.2ms
146.1ms
147.2ms
145.1ms
-------RESULTS--------
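The first run includes loading the model into Edge TPU memory, so a fairer comparison drops it and averages the rest. This quick sketch uses the numbers above, and the one-run warm-up is the only assumption:

```python
def steady_state_mean(timings_ms, warmup=1):
    """Average inference time, ignoring the first `warmup` runs."""
    steady = timings_ms[warmup:]
    return sum(steady) / len(steady)

edgetpu_ms = [11.6, 3.1, 3.0, 2.6, 2.5]
cpu_ms = [146.0, 144.2, 146.1, 147.2, 145.1]

edge = steady_state_mean(edgetpu_ms)  # about 2.8 ms
cpu = steady_state_mean(cpu_ms)       # about 145.65 ms
print("Edge TPU: %.2f ms, CPU: %.2f ms, speedup: ~%.0fx" % (edge, cpu, cpu / edge))
```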
dmitriykovalev commented 5 years ago

According to your second trace file:

[pid 28150] access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
[pid 28150] open("/lib/x86_64-linux-gnu/libedgetpu.so.1.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)

Can you please try the original example with the full path to libedgetpu.so (please double-check that it exists):

load_delegate('/usr/lib/x86_64-linux-gnu/libedgetpu.so.1')
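To rule out library-lookup problems quickly, you can try several candidate paths in order and keep the first one that loads. This is a sketch, assuming loader is tflite_runtime.interpreter.load_delegate (or TensorFlow's load_delegate). Both report a failed load as a ValueError, as the tracebacks in this thread show. The helper name is hypothetical:

```python
def load_first_available(candidates, loader):
    """Try each library path with `loader`; return (path, delegate) for the first success."""
    errors = []
    for path in candidates:
        try:
            return path, loader(path)
        except ValueError as exc:  # load_delegate reports load failures as ValueError
            errors.append("%s: %s" % (path, exc))
    raise ValueError("No Edge TPU delegate could be loaded:\n  " + "\n  ".join(errors))

# Example (requires tflite_runtime and an attached accelerator):
#   from tflite_runtime.interpreter import load_delegate
#   path, delegate = load_first_available(
#       ["/usr/lib/x86_64-linux-gnu/libedgetpu.so.1", "libedgetpu.so.1", "libedgetpu.so.1.0"],
#       load_delegate)
```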
mrharicot commented 5 years ago

@dmitriykovalev I get the same error

mrazekv commented 5 years ago

I got the same issue with Ubuntu 19.04, Python 3.7 and the wheel package tflite_runtime-1.14.0-cp37-cp37m-linux_x86_64.whl.

The library was loaded correctly: openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libedgetpu.so.1", O_RDONLY|O_CLOEXEC) = 3

The edgetpu API works correctly, but tflite_runtime does not.

Namburger commented 5 years ago

@mrazekv @mrharicot @travisariggs I wonder if this is also a problem with the tflite API. Could you try this script? (It is just a modified version of the original script, using tflite instead of tflite_runtime.)

import argparse
import time
import numpy as np
from PIL import Image
import tensorflow as tf
print("TF VERSION: ", tf.__version__)

def load_labels(filename):
  with open(filename, 'r') as f:
    return [line.strip() for line in f.readlines()]

def set_input_tensor(interpreter, image):
  tensor_index = interpreter.get_input_details()[0]['index']
  input_tensor = interpreter.tensor(tensor_index)()[0]
  input_tensor[:, :] = image

def classify_image(interpreter, image, top_k):
  set_input_tensor(interpreter, image)
  interpreter.invoke()
  output_details = interpreter.get_output_details()[0]
  output = np.squeeze(interpreter.get_tensor(output_details['index']))
  # If the model is quantized (uint8 data), then dequantize the results
  if output_details['dtype'] == np.uint8:
    scale, zero_point = output_details['quantization']
    output = scale * (output - zero_point)
  ordered_indices = output.argsort()[-top_k:][::-1]
  return [(i, output[i]) for i in ordered_indices]

def main():
  parser = argparse.ArgumentParser(
      formatter_class=argparse.ArgumentDefaultsHelpFormatter)
  parser.add_argument(
      '--model', help='File path of .tflite file.', required=True)
  parser.add_argument(
      '--labels', help='File path of labels file.', required=True)
  parser.add_argument('--image', help='Image to be classified.', required=True)
  parser.add_argument(
      '--top_k', help='Number of classifications to list', type=int, default=1)
  parser.add_argument(
      '--count', help='Number of times to run inference', type=int, default=5)
  args = parser.parse_args()
  print('Initializing TF Lite interpreter...')
  interpreter = tf.compat.v2.lite.Interpreter(
      model_path=args.model,
      experimental_delegates=[tf.compat.v2.lite.experimental.load_delegate('libedgetpu.so.1.0')])
  interpreter.allocate_tensors()
  _, height, width, _ = interpreter.get_input_details()[0]['shape']
  image = Image.open(args.image).resize((width, height), Image.LANCZOS)  # same filter as the old ANTIALIAS alias, which Pillow 10 removed
  print('----INFERENCE TIME----')
  print('Note: The first inference on Edge TPU is slow because it includes',
        'loading the model into Edge TPU memory.')
  for _ in range(args.count):
    start_time = time.monotonic()
    results = classify_image(interpreter, image, args.top_k)
    elapsed_ms = (time.monotonic() - start_time) * 1000
    print('%.1fms' % elapsed_ms)
  labels = load_labels(args.labels)
  print('-------RESULTS--------')
  for label_id, prob in results:
    print('%s: %.5f' % (labels[label_id], prob))

if __name__ == '__main__':
  main()
mrazekv commented 5 years ago

@Namburger Thanks for the quick reply. I am using prebuilt TF 1.14 (downloaded from the official release channel). However, the load_delegate function was not found:

python3 test.py --model models/mobilenet_v2_1.0_224_inat_bird_quant.tflite --labels models/inat_bird_labels.txt --image images/parrot.jpg
TF VERSION:  1.14.0
Initializing TF Lite interpreter...
Traceback (most recent call last):
  File "test.py", line 63, in <module>
    main()
  File "test.py", line 45, in main
    experimental_delegates=[tf.compat.v2.lite.experimental.load_delegate('libedgetpu.so.1.0')])
AttributeError: module 'tensorflow._api.v1.compat.v2.lite' has no attribute 'experimental'
Namburger commented 5 years ago

@mrazekv TensorFlow and tflite_runtime are versioned a little differently. I believe load_delegate requires at least tf-nightly 1.15 (tf-nightly is the nightly build of TensorFlow), which is most likely why you don't have it with TF 1.14. I'm actually using TF 2.0 for this script; maybe try upgrading TF?
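Rather than comparing version strings, a script can probe for the attribute path it actually uses: tf.compat.v2.lite.experimental.load_delegate, as in the script above. This sketch takes any module object, so after import tensorflow as tf you would call find_load_delegate(tf). The helper name is hypothetical:

```python
def find_load_delegate(tf_module):
    """Return tf.compat.v2.lite.experimental.load_delegate if present, else None."""
    obj = tf_module
    for name in ("compat", "v2", "lite", "experimental", "load_delegate"):
        obj = getattr(obj, name, None)
        if obj is None:  # e.g. TF 1.14 stops at compat.v2.lite (no 'experimental')
            return None
    return obj

# Usage:
#   import tensorflow as tf
#   if find_load_delegate(tf) is None:
#       print("TF %s has no load_delegate; try a newer TensorFlow" % tf.__version__)
```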

mrazekv commented 5 years ago

@Namburger I installed the tf-nightly package with pip3 in a virtual environment, and your script runs fine.

TF VERSION:  2.1.0-dev20191024
Initializing TF Lite interpreter...
----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
12.9ms
4.6ms
4.3ms
4.4ms
4.8ms
-------RESULTS--------
923 Ara macao (Scarlet Macaw): 0.76562

Thank you very much for your help.

Namburger commented 5 years ago

@mrazekv No problem, thanks for helping me diagnose the issue. The only difference here is that we're using TensorFlow's:

tf.compat.v2.lite.Interpreter
tf.compat.v2.lite.experimental.load_delegate

instead of

from tflite_runtime.interpreter import Interpreter
from tflite_runtime.interpreter import load_delegate

I believe tflite_runtime is the issue, but we're unable to reproduce this on our end o_0
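If you want one script that works with either package, one option is to try the backends in a configurable order. This is a sketch. The backend table and function name are illustrative and not part of either package, and the TensorFlow attribute paths are the tf.compat.v2 ones from the script above. Given the trouble with tflite_runtime in this thread, the default order puts TensorFlow first:

```python
import importlib

# backend name -> (module to import, path to Interpreter, path to load_delegate)
_BACKENDS = {
    "tflite_runtime": ("tflite_runtime.interpreter", "Interpreter", "load_delegate"),
    "tensorflow": ("tensorflow",
                   "compat.v2.lite.Interpreter",
                   "compat.v2.lite.experimental.load_delegate"),
}

def _resolve(obj, dotted):
    """Follow a dotted attribute path such as 'compat.v2.lite.Interpreter'."""
    for part in dotted.split("."):
        obj = getattr(obj, part)
    return obj

def load_tflite_backend(order=("tensorflow", "tflite_runtime")):
    """Return (name, Interpreter, load_delegate) from the first backend that imports."""
    failures = {}
    for name in order:
        module_name, interpreter_path, delegate_path = _BACKENDS[name]
        try:
            module = importlib.import_module(module_name)
            return name, _resolve(module, interpreter_path), _resolve(module, delegate_path)
        except (ImportError, AttributeError) as exc:
            failures[name] = repr(exc)
    raise ImportError("No usable TF Lite backend: %r" % (failures,))
```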

wb666greene commented 4 years ago

Quoting @Namburger: "@mrazekv tensor flow and tflite_runtime versioning are a little different. I believe it required tfnightly v1.15 (tfnightly is the nightly build version of tensor flow) at minimum to get load_delegates. This is most likely why you don't have load_delegates with tf1.14. I'm actually using tf2.0 for this script, maybe try upgrading tf?"

Any chance of getting the instructions at https://www.tensorflow.org/lite/guide/python fixed? I just followed them and hit this issue when setting up the Coral TPU using these instructions: https://coral.withgoogle.com/docs/accelerator/get-started/

Very frustrating!

How exactly do I install tf-nightly or 1.15, whatever it takes to fix it?

I tried sudo -H pip3 install tf-nightly (after the install without sudo -H failed) and now get different errors :(

tflite/python/examples/classification$ python3 classify_image.py --model models/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite --labels models/inat_bird_labels.txt --input images/parrot.jpg
2019-11-21 12:43:46.933849: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2019-11-21 12:43:46.933866: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
  File "classify_image.py", line 118, in <module>
    main()
  File "classify_image.py", line 95, in main
    interpreter = make_interpreter(args.model)
  File "classify_image.py", line 69, in make_interpreter
    {'device': device[0]} if device else {})
  File "/usr/local/lib/python3.5/dist-packages/tflite_runtime/interpreter.py", line 206, in __init__
    model_path))
NotImplementedError: Wrong number or type of arguments for overloaded function 'InterpreterWrapper_CreateWrapperCPPFromFile'.
  Possible C/C++ prototypes are:
    tflite::interpreter_wrapper::InterpreterWrapper::CreateWrapperCPPFromFile(char const *,std::vector< std::string > const &,std::string *)
    tflite::interpreter_wrapper::InterpreterWrapper::tflite_interpreter_wrapper_InterpreterWrapper_CreateWrapperCPPFromFile__SWIG_1(char const *,PyObject *)

Namburger commented 4 years ago

@wb666greene Can you share the output of:

python3 -c 'print(__import__("tensorflow").__version__)'
wb666greene commented 4 years ago

I just uninstalled tf-nightly, but I get this:

$ python3 -c 'print(__import__("tensorflow").__version__)'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: module 'tensorflow' has no attribute '__version__'

The instructions say to install: pip3 install tflite_runtime-1.14.0-cp35-cp35m-linux_x86_64.whl

If I open an interactive python3 session, both import tflite_runtime and import tensorflow succeed, but neither seems to have a version:

tflite_runtime.__version__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'tflite_runtime' has no attribute '__version__'
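Even when a module has no __version__ attribute, pip's installed-package metadata still records a version. The sketch below reads it with importlib.metadata on Python 3.8+ and falls back to pkg_resources from setuptools on older interpreters such as the Python 3.5 in this thread. The distribution names are assumptions about how the packages were installed, so you may need to spell them as pip list shows them:

```python
def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if not found."""
    try:
        from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
    except ImportError:
        import pkg_resources  # older Pythons, provided by setuptools
        try:
            return pkg_resources.get_distribution(dist_name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

if __name__ == "__main__":
    for name in ("tflite_runtime", "tensorflow", "tf-nightly"):
        print(name, installed_version(name))
```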

I'm not planning to use the tflite runtime anytime soon, but I hate to leave an installation uncompleted.

A week ago I installed edgetpu_api_2.11.1.tar.gz on a different machine, and everything worked except one sample that used run_inference (which seems to be a 2.11.2 thing), so I had to change it to RunInference to get it to work.

Starting fresh on a different machine with the https://coral.withgoogle.com/docs/accelerator/get-started/ instructions got me an apt repo and an apt-get installation of 2.11.2, plus this mess when trying to run the example at the bottom of the page.

My other sample (edgetpu_api) code runs fine on the new machine, except that it throws tons of "RunInference is deprecated" warnings that ruin the console output; if I switch back to run_inference, all seems well.

What is the point of trivial name changes like this?

Words to live by that the computer industry seems hell-bent to ignore: "Different is not better, better is better. How is this change better?"
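For code that has to run against both releases, one option is to prefer the new snake_case name and fall back to the old one. This is a sketch. It assumes only what is described above: that the engine object exposes run_inference (2.11.2) and/or RunInference (earlier releases), and that each takes the input tensor. The wrapper name is hypothetical:

```python
def run_inference_compat(engine, input_tensor):
    """Call engine.run_inference if it exists, else the older engine.RunInference."""
    method = getattr(engine, "run_inference", None)
    if method is None:
        method = engine.RunInference  # pre-2.11.2 name
    return method(input_tensor)
```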

Namburger commented 4 years ago

@wb666greene We've updated our runtime library, which brings a ton of improvements; this is just a normal software upgrade. However, it looks like tflite_runtime has not caught up with the new runtime, and that's what is causing this issue. The weird thing is that I'm not able to reproduce it on my side. That's why I suggested above to either:

a) use the edgetpu API instead; the repo is here: https://github.com/google-coral/edgetpu
b) use the full TensorFlow Lite API, which is documented here: https://coral.withgoogle.com/docs/edgetpu/tflite-python/

With this option, you'll have to install TensorFlow 1.15 or later. The normal way to do this is just:

pip3 install tensorflow==1.15 --user

It looks to me like you're having installation issues with TensorFlow.

wb666greene commented 4 years ago

Now I get the 1.15 version of TensorFlow, but the sample code fails:

tflite/python/examples/classification$ python3 classify_image.py --model models/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite --labels models/inat_bird_labels.txt --input images/parrot.jpg
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tflite_runtime/interpreter.py", line 165, in load_delegate
    delegate = Delegate(library, options)
  File "/usr/local/lib/python3.5/dist-packages/tflite_runtime/interpreter.py", line 119, in __init__
    raise ValueError(capture.message)
ValueError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "classify_image.py", line 118, in <module>
    main()
  File "classify_image.py", line 95, in main
    interpreter = make_interpreter(args.model)
  File "classify_image.py", line 69, in make_interpreter
    {'device': device[0]} if device else {})
  File "/usr/local/lib/python3.5/dist-packages/tflite_runtime/interpreter.py", line 168, in load_delegate
    library, str(e)))
ValueError: Failed to load delegate from libedgetpu.so.1

Namburger commented 4 years ago

@wb666greene To use the TensorFlow Lite API instead of tflite_runtime, you'll have to change the script to what I posted above. As I mentioned, this API is documented here: https://coral.withgoogle.com/docs/edgetpu/tflite-python/

P.S.

python3 test.py --model models/mobilenet_v2_1.0_224_inat_bird_quant.tflite --labels models/inat_bird_labels.txt --image images/parrot.jpg
TF VERSION:  1.15.0
Initializing TF Lite interpreter...
----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
44.2ms
44.6ms
43.5ms
43.8ms
43.3ms
-------RESULTS--------
923 Ara macao (Scarlet Macaw): 0.77344
wb666greene commented 4 years ago

I cut and pasted your script and named it x.py, when I run it it errors:

tflite/python/examples/classification$ python3 x.py --model models/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite --labels models/inat_bird_labels.txt --image images/parrot.jpg
TF VERSION: 1.15.0
Initializing TF Lite interpreter...
Traceback (most recent call last):
  File "/home/wally/.local/lib/python3.5/site-packages/tensorflow_core/lite/python/interpreter.py", line 165, in load_delegate
    delegate = Delegate(library, options)
  File "/home/wally/.local/lib/python3.5/site-packages/tensorflow_core/lite/python/interpreter.py", line 119, in __init__
    raise ValueError(capture.message)
ValueError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "x.py", line 63, in <module>
    main()
  File "x.py", line 45, in main
    experimental_delegates=[tf.compat.v2.lite.experimental.load_delegate('libedgetpu.so.1.0')])
  File "/home/wally/.local/lib/python3.5/site-packages/tensorflow_core/lite/python/interpreter.py", line 168, in load_delegate
    library, str(e)))
ValueError: Failed to load delegate from libedgetpu.so.1.0

I think part of the problem is that there are multiple versions of these example scripts floating around with the same name. For this one, I had to change --input (as in the command in figure 1 on the web page) to --image on the command line.

If the problem really is something about libedgetpu.so.1.0, then it may be "upstream", in the previous step, sudo apt-get install libedgetpu1-max.

My /usr/lib/x86_64-linux-gnu/libedgetpu.so.1.0 has modified date: Monday 16 Sep 2019 03:27:18 PM CDT

Namburger commented 4 years ago

@wb666greene lol, I had this issue before and thought it was a TensorFlow issue (it turned out I didn't have my accelerator plugged in ¯\_(ツ)_/¯). Other possible causes are that your user is not in the plugdev group or that the USB device is not detected on your system. Try this:

$ sudo usermod -aG plugdev [your username]

and reboot the system.
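Both of those causes can be checked from Python. This is a sketch with two assumptions: that the accelerator enumerates with the USB IDs commonly reported for it (1a6e:089a before the runtime loads its firmware, 18d1:9302 afterwards), and that sysfs is mounted at the usual place. Group membership is read from the current process, which only picks up a new group after you log in again. That is why the usermod step is followed by a reboot:

```python
import glob
import grp
import os

# USB IDs commonly reported for the Coral USB Accelerator (assumption).
CORAL_USB_IDS = {("1a6e", "089a"), ("18d1", "9302")}

def _read(path):
    with open(path) as f:
        return f.read().strip()

def find_coral_usb(sysfs_root="/sys/bus/usb/devices"):
    """Return (device, vendor_id, product_id) for each matching USB device in sysfs."""
    found = []
    for dev in sorted(glob.glob(os.path.join(sysfs_root, "*"))):
        try:
            ids = (_read(os.path.join(dev, "idVendor")), _read(os.path.join(dev, "idProduct")))
        except OSError:  # entries such as interface directories lack these files
            continue
        if ids in CORAL_USB_IDS:
            found.append((os.path.basename(dev),) + ids)
    return found

def process_in_group(group_name="plugdev"):
    """True if the current process (this login session) has group_name among its groups."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:  # no such group on this system
        return False
    return gid == os.getgid() or gid in os.getgroups()

if __name__ == "__main__":
    print("Coral USB devices:", find_coral_usb() or "none found")
    print("plugdev in this session:", process_in_group())
```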

wb666greene commented 4 years ago

Other edgetpu_api TPU code is running fine on this system, so it's not that. But I've certainly made this mistake multiple times in the past!

I managed to get the broken tflite_runtime removed and now your cut and pasted script runs:

tflite/python/examples/classification$ python3 x.py --model models/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite --labels models/inat_bird_labels.txt --image images/parrot.jpg
TF VERSION: 1.15.0
Initializing TF Lite interpreter...
----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
12.9ms
3.6ms
3.4ms
3.5ms
3.4ms
-------RESULTS--------
923 Ara macao (Scarlet Macaw): 0.78516

But I can't figure out how to change the downloaded tflite/python/examples/classification program to use tensorflow instead of tflite_runtime (the one that needs --input instead of --image).

Thanks for your help.

Namburger commented 4 years ago

@wb666greene woot woot! try replacing the classify_image.py from this repo with this: https://gist.github.com/Namburger/20788172fccf1ca0c9e13b7b14d1b70a

wb666greene commented 4 years ago

@Namburger Thanks, I merged the changes in your gist with the example downloaded from the web-page instructions, and it runs fine now.

I never would have figured out replacing tflite.load_delegate() with tf.compat.v2.lite.experimental.load_delegate().
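For reference, here is a sketch of what that swap can look like in a make_interpreter helper. It keeps the {'device': ...} options expression seen in the traceback earlier in this thread. The "model@device" split is an assumption about how the example script obtains device. The lite parameter exists only so the function can be exercised without TensorFlow; by default it uses tf.compat.v2.lite:

```python
EDGETPU_SHARED_LIB = "libedgetpu.so.1"  # Linux library name

def make_interpreter(model_file, lite=None):
    """Create a TF Lite interpreter with the Edge TPU delegate via full TensorFlow.

    model_file may carry an optional "@device" suffix (assumed convention).
    `lite` defaults to tf.compat.v2.lite.
    """
    if lite is None:
        import tensorflow as tf
        lite = tf.compat.v2.lite
    model_path, *device = model_file.split("@")
    options = {"device": device[0]} if device else {}
    delegate = lite.experimental.load_delegate(EDGETPU_SHARED_LIB, options)
    return lite.Interpreter(model_path=model_path, experimental_delegates=[delegate])

# Usage (requires TensorFlow >= 1.15 and an attached accelerator):
#   interpreter = make_interpreter("models/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite")
#   interpreter.allocate_tensors()
```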

How will I know if/when tflite_runtime gets fixed for 2.11.2? Or will it be when 2.11.3 comes out?

I have an "if it ain't broke, don't fix it!" mentality, so I'd stayed with 1.92.2 until I tried Posenet, which required 2.11.1, and then the problems started. I got Posenet working fine and then discovered that all my other TPU code was now broken.

Everything is back to working fine now.

Namburger commented 4 years ago

@wb666greene My apologies. I understand that it's really hard to keep up with older references to the same library when the documentation keeps changing. Our library is still evolving, and we're trying to expand the scope of operations we can support, as mentioned here, which is one of the main reasons we keep putting out new releases. We have an internal bug open to fix tflite_runtime, but I cannot commit to a time frame for when that will be done. As for tf.compat, it is a TensorFlow module that lets you get TensorFlow behavior compatible with different versions (for example, tf.compat.v1 and tf.compat.v2). It is worth checking out, or at least being aware of, when working with TensorFlow; the docs are here: https://www.tensorflow.org/api_docs/python/tf/compat

wb666greene commented 4 years ago

Nothing to apologize for. You are trying to support two extremes, experts that have used tensorflow before the "edge" AI co-processors were available and newbies like myself who are jumping in to see what value existing "public" models can add to systems.

My specific interest at the moment is to use "person detection" to push the false notification (alarm) rate towards zero for existing video security camera systems.

Using the TPU and MobilenetSSD-v2 with a detect, zoom-in and re-detect algorithm on an i7-4500U "Mini PC" (<60 W), I'm getting about one false positive every 10 million frames. With 15 outdoor cameras and the AI processing ~40 fps (a bit under 3 fps per camera), this is one false notification every two or three days. The main issue is that they tend to come in bursts, which makes the false notifications even more annoying.

My idea was to use Posenet as the verification step. It worked great, rejecting every single false-positive image I'd collected. The problem is that it greatly increases the false-negative rate by rejecting 30-90% of the valid detections, depending on camera angle; high, downward-looking angles reject the most.
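The false-alarm figures above are consistent with each other. At ~40 frames per second in aggregate, the system sees about 3.5 million frames a day, so one false positive per 10 million frames works out to roughly one every 2.9 days. Here is a quick check using only the numbers from the post:

```python
SECONDS_PER_DAY = 24 * 60 * 60

def days_between_false_alarms(frames_per_false_alarm, total_fps):
    """Expected days between false positives at a given aggregate frame rate."""
    frames_per_day = total_fps * SECONDS_PER_DAY  # 40 fps -> 3,456,000 frames/day
    return frames_per_false_alarm / frames_per_day

cameras, total_fps = 15, 40
print("per-camera fps: %.2f" % (total_fps / cameras))  # prints: per-camera fps: 2.67
print("days between false alarms: %.1f" % days_between_false_alarms(10e6, total_fps))  # prints: ... 2.9
```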

mazzingkaizer commented 4 years ago

In my case, the Docker container hit the ValueError: Didn't find custom op for name 'edgetpu-custom-op' with version 1 error.

  1. Make a Docker container with the following options: sudo docker run -d -it --privileged -v /dev:/dev -v /etc/udev:/etc/udev

  2. In the container, install tensorflow==2.0.0 because of the "Didn't find custom op for name 'edgetpu-custom-op' with version 1" error:

     pip install tensorflow==2.0.0

  3. Enjoy ^^

     python3 classify_image.py --model models/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite --labels models/inat_bird_labels.txt --input images/parrot.jpg
     INFO: Initialized TensorFlow Lite runtime.
     ----INFERENCE TIME----
     Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
     11.5ms
     2.6ms
     2.5ms
     2.4ms
     2.6ms
     -------RESULTS--------
     Ara macao (Scarlet Macaw): 0.76172

Finish !!! (^____^)