There is a pybind layer between pycoral and libcoral. For example, InvokeWithMemBuffer is defined here: https://github.com/google-coral/pycoral/blob/276d0d693f752635c2042ecd5d6a4e348fa03b3f/src/coral_wrapper.cc#L211
The first argument must be a tflite_runtime.Interpreter instance, the second is a raw buffer pointer (integer), and the third is the buffer size. It looks like there is a way to get a raw buffer pointer from a numpy array: https://stackoverflow.com/questions/11264838/how-to-get-the-memory-address-of-a-numpy-array-for-c
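To make "raw buffer pointer plus size" concrete, here is a minimal sketch using only the standard library (array.array and ctypes) as a stand-in. For a numpy array the analogous values would be arr.ctypes.data and arr.nbytes, as the linked Stack Overflow answer describes. The helper name buffer_pointer_and_size is made up for illustration and is not part of PyCoral.

```python
import array
import ctypes


def buffer_pointer_and_size(buf):
    """Return (address, size_in_bytes) for an array.array buffer.

    Hypothetical helper for illustration; not a PyCoral function.
    """
    address, n_items = buf.buffer_info()
    return address, n_items * buf.itemsize


pixels = array.array('B', [10, 20, 30, 40])  # uint8 data, like a flattened image
address, size = buffer_pointer_and_size(pixels)

# Reading `size` bytes back from the raw address yields the same data,
# which is what a C++ function receiving (pointer, size) would see.
assert ctypes.string_at(address, size) == pixels.tobytes()
print(hex(address), size)
```

The array must stay alive for as long as the address is in use; the integer alone does not keep the memory valid.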
Thanks for your response. I installed the frogfish pip wheel of PyCoral from here:
Is it possible that the pybind layer is not installed properly when installing the wheel? Based on the error messages in my original post, it seems that the pybind thinks that arg0 is supposed to be of type 'object' and not a tflite interpreter pointer (the first argument according to the c++ reference).
Do you suggest I modify the code in pycoral.utils.edgetpu.run_inference() to replace the line interpreter_handle = interpreter._native_handle() with something else?
EDIT: It looks like coral_wrapper.cc requires the first argument to be of type object, in line with the error message I'm getting. Still, passing interpreter._native_handle() causes this issue. Is there a way to type cast the integer value to a type 'object' in Python (i.e., create an 'object' that points to the same memory address as the interpreter)? To my knowledge, this isn't possible in Python. If that's the case, then it seems that the pybind wrapper definition would need to be updated to replace py::object interpreter_handle with intptr_t interpreter_handle.
EDIT2: For reference, here's the ._native_handle() function in tf.lite:
  # Experimental and subject to change.
  def _native_handle(self):
    """Returns a pointer to the underlying tflite::Interpreter instance.

    This allows extending tflite.Interpreter's functionality in a custom C++
    function. Consider how that may work in a custom pybind wrapper:

      m.def("SomeNewFeature", ([](py::object handle) {
        auto* interpreter =
          reinterpret_cast<tflite::Interpreter*>(handle.cast<intptr_t>());
        ...
      }))

    and corresponding Python call:

      SomeNewFeature(interpreter.native_handle())

    Note: This approach is fragile. Users must guarantee the C++ extension build
    is consistent with the tflite.Interpreter's underlying C++ build.
    """
    return self._interpreter.interpreter()
It looks like the PyCoral pybind wrapper is consistent with this comment. Nevertheless, the type of interpreter._native_handle() is an int, which leads to the pybind error that arg0 is supposed to be of type object.
Sorry for the confusion, the first argument should be interpreter._native_handle(), as it's defined in run_inference(). Can you please provide a Python code snippet which generates the error?
Sure. Hopefully this will be enough to diagnose the problem:
from pycoral.utils import edgetpu, dataset
from pycoral.adapters import common, detect
...
class Detector(object):
    def __init__(self, model_path):
        self.interpreter = edgetpu.make_interpreter(model_path)
        ...

    # this is the function that I run in a separate thread
    def execute_inference(self):
        input_data = self.cam.frame_flat.copy()  # np array pre-formatted to model input_size
        edgetpu.run_inference(self.interpreter, input_data)  # <- this causes the TypeError: incompatible function arguments
EDIT: I decided to perform run_inference in the main thread, rather than in a separate thread, and it appears to fix the problem. Perhaps the separate thread is unable to access the memory address of an interpreter that was created in the main thread?
EDIT2: I decided to call interpreter._native_handle() in the main thread and in a separate thread, and both returned the same integer (pointer) value. So it seems that the interpreter object is passed to the child thread "by reference." Based on this, I can conclude one of two things:
1. interpreter._native_handle() is not the physical memory address, but rather a virtual memory address. If the child thread uses a different virtual memory bank, referring to the _native_handle() value will point to some other place in memory that does not have a tflite interpreter stored there.
2. interpreter._native_handle() is locked and therefore not accessible by the child thread, which prevents pybind from completing the cast of the py::object pointer to a tflite interpreter and causes the error.
I will continue to investigate and hopefully determine how to make this work. The whole reason I want to use run_inference is to have it execute the I/O-bound operation in a separate thread and perform other tasks while waiting for the TPU inference to complete. It sort of defeats the purpose if I can't really use run_inference() in any thread other than the main thread.
The only other solution I could explore is instantiating the tflite interpreter in the child thread, and keeping that thread alive. If that works, it's almost certainly a memory lock on the tflite interpreter that's causing the issue.
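For reference, the "create it in the child thread and keep that thread alive" idea can be sketched as a long-lived worker that builds its own resource and services a queue. This is only a sketch of the pattern using the standard library: InferenceWorker, factory, and infer are hypothetical names, and the stand-in functions below take the place of interpreter creation and the inference call. It says nothing about whether the pattern avoids the TypeError.

```python
import queue
import threading


class InferenceWorker:
    """Owns a resource that is created and used only on one worker thread.

    `factory` and `infer` are placeholders for interpreter creation and the
    inference call; both are assumptions made for this illustration.
    """

    def __init__(self, factory, infer):
        self._factory = factory
        self._infer = infer
        self._jobs = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        resource = self._factory()  # created inside the worker thread
        while True:
            job = self._jobs.get()
            if job is None:  # sentinel: shut down
                break
            data, reply = job
            reply.put(self._infer(resource, data))

    def submit(self, data):
        """Queue one input; returns a Queue that will receive the result."""
        reply = queue.Queue(maxsize=1)
        self._jobs.put((data, reply))
        return reply

    def close(self):
        self._jobs.put(None)
        self._thread.join()


# Stand-in "model": remembers which thread created it and doubles its input.
worker = InferenceWorker(
    factory=threading.get_ident,
    infer=lambda creator_thread, x: (creator_thread, x * 2),
)
pending = worker.submit(21)
# ... the main thread is free to do other work here ...
creator_thread, result = pending.get()
worker.close()
print(result, creator_thread != threading.get_ident())
```

The sketch does not handle exceptions raised inside infer; a real worker would need to pass them back to the caller.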
EDIT3: I tried instantiating everything (the Detector object, tflite interpreter, etc.) in its own thread and executing run_inference on that thread. To my surprise, I am getting the TypeError again during InvokeWithMemBuffer()! Notably, interpreter._native_handle() returns a negative integer value, which seems problematic as a memory address (unless it's just a signed/unsigned conversion error). Regardless, it appears that instantiating the tflite interpreter and then executing edgetpu.run_inference() all in a separate thread still causes the issue. Clearly, there's some issue with Python's threading module that causes interpreter._native_handle(), when executed in that thread, to produce an incorrect memory address.
Here is a small test program which uses the run_inference function and the threading module. Could you try to run it locally and see how that goes? It would be nice to have an easy way to reproduce your problem.
You'll need to download test data first:
wget https://github.com/google-coral/edgetpu/raw/master/test_data/parrot.jpg
wget https://github.com/google-coral/edgetpu/raw/master/test_data/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite
and then run:
import numpy as np
import platform
import tflite_runtime.interpreter as tflite
import time
from threading import Thread
from PIL import Image
from pycoral.utils import edgetpu
EDGETPU_SHARED_LIB = {
  'Linux': 'libedgetpu.so.1',
  'Darwin': 'libedgetpu.1.dylib',
  'Windows': 'edgetpu.dll'
}[platform.system()]

def make_interpreter(model_file):
  model_file, *device = model_file.split('@')
  return tflite.Interpreter(
      model_path=model_file,
      experimental_delegates=[
          tflite.load_delegate(EDGETPU_SHARED_LIB,
                               {'device': device[0]} if device else {})
      ])

def run():
  interpreter = make_interpreter('mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite')
  print('native_handle =', interpreter._native_handle())
  interpreter.allocate_tensors()
  image = Image.open("parrot.jpg").convert('RGB').resize((224, 224), Image.ANTIALIAS)
  arr = np.array(image).flatten()
  for _ in range(5):
    start = time.perf_counter()
    edgetpu.run_inference(interpreter, arr)
    inference_time = time.perf_counter() - start
    output_details = interpreter.get_output_details()[0]
    klass = np.argmax(np.squeeze(interpreter.tensor(output_details['index'])()))
    print('class %s, time: %.2fms' % (klass, inference_time * 1000))

def main():
  print('=> run')
  run()
  print('=> thread run')
  t = Thread(target=run)
  t.start()
  t.join()

if __name__ == '__main__':
  main()
It works fine on my Mac machine:
$ python3 classify_image.py
=> run
native_handle = 140685132204752
class 923, time: 13.48ms
class 923, time: 2.93ms
class 923, time: 3.20ms
class 923, time: 3.09ms
class 923, time: 2.99ms
=> thread run
native_handle = 140684060308544
class 923, time: 12.79ms
class 923, time: 2.66ms
class 923, time: 2.75ms
class 923, time: 2.76ms
class 923, time: 2.80ms
Thanks. I ran your code. It fails on my Raspberry Pi:
=> run
native_handle = 23458760
class 923, time: 19.74ms
class 923, time: 4.99ms
class 923, time: 5.04ms
class 923, time: 4.92ms
class 923, time: 4.89ms
=> thread run
native_handle = -1302319328
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python3.7/threading.py", line 917, in _bootstrap_inner
self.run()
File "/usr/lib/python3.7/threading.py", line 865, in run
self._target(*self._args, **self._kwargs)
File "test.py", line 35, in run
edgetpu.run_inference(interpreter, arr)
File "/usr/local/lib/python3.7/dist-packages/pycoral/utils/edgetpu.py", line 193, in run_inference
expected_input_size)
TypeError: InvokeWithMemBuffer(): incompatible function arguments. The following argument types are supported:
1. (arg0: object, arg1: int, arg2: int) -> None
Invoked with: -1302319328, 2993337976, 150528
EDIT: On a separate note, I was able to get run_inference working in a thread by calling the library directly:
t = Thread(target=edgetpu.run_inference, args=(self.detector, input_data), daemon=True)
before = time.perf_counter()
t.start()
after = time.perf_counter() - before  # ~50 ms, so t.start() must be blocking
join_start = time.perf_counter()
t.join()
done = time.perf_counter() - join_start  # < 0.1 ms, so the thread must already be complete after t.start()
However, t.start() is a blocking call. With the pretrained MobileNetV2 SSD, after equals about 50 ms, and done is under 0.1 ms. Very strange -- it doesn't appear that edgetpu.run_inference is treated as an I/O operation.
EDIT2: For more context, here are the following packages I have installed:
EDIT3: I modified your code so that it technically works, but as I mentioned before, the thread executing edgetpu.run_inference() is blocking.
import numpy as np
import platform
import tflite_runtime.interpreter as tflite
import time
from threading import Thread
from PIL import Image
from pycoral.utils import edgetpu
EDGETPU_SHARED_LIB = {
  'Linux': 'libedgetpu.so.1',
  'Darwin': 'libedgetpu.1.dylib',
  'Windows': 'edgetpu.dll'
}[platform.system()]

def make_interpreter(model_file):
  model_file, *device = model_file.split('@')
  return tflite.Interpreter(
      model_path=model_file,
      experimental_delegates=[
          tflite.load_delegate(EDGETPU_SHARED_LIB,
                               {'device': device[0]} if device else {})
      ])

def run(interpreter, interpreter_handle):
  for _ in range(5):
    start = time.perf_counter()
    edgetpu.run_inference(interpreter, arr, interpreter_handle)
    inference_time = time.perf_counter() - start
    output_details = interpreter.get_output_details()[0]
    klass = np.argmax(np.squeeze(interpreter.tensor(output_details['index'])()))
    print('class %s, time: %.2fms' % (klass, inference_time * 1000))

def main():
  interpreter = make_interpreter('mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite')
  interpreter_handle = interpreter._native_handle()
  print('native_handle in main() =', interpreter._native_handle())
  interpreter.allocate_tensors()
  image = Image.open("parrot.jpg").convert('RGB').resize((224, 224), Image.ANTIALIAS)
  arr = np.array(image).flatten()
  print('=> run')
  for _ in range(5):
    start = time.perf_counter()
    edgetpu.run_inference(interpreter, arr, interpreter_handle)
    inference_time = time.perf_counter() - start
    output_details = interpreter.get_output_details()[0]
    klass = np.argmax(np.squeeze(interpreter.tensor(output_details['index'])()))
    print('class %s, time: %.2fms' % (klass, inference_time * 1000))
  print('\n=> thread run')
  for _ in range(5):
    start = time.perf_counter()
    t = Thread(target=edgetpu.run_inference, args=(interpreter, arr), daemon=True)
    t.start()
    thread_start_time = time.perf_counter() - start
    print(f"t.start() took {thread_start_time*1000:.2f}")
    t.join()
    thread_join_time = time.perf_counter() - start
    output_details = interpreter.get_output_details()[0]
    klass = np.argmax(np.squeeze(interpreter.tensor(output_details['index'])()))
    print('class %s, time: %.2fms' % (klass, thread_join_time * 1000))

if __name__ == '__main__':
  main()
Output:
native_handle in main() = 10590288
=> run
class 923, time: 19.78ms
class 923, time: 4.77ms
class 923, time: 4.71ms
class 923, time: 4.81ms
class 923, time: 4.72ms
=> thread run
t.start() took 5.45
class 923, time: 5.54ms
t.start() took 5.07
class 923, time: 5.16ms
t.start() took 5.06
class 923, time: 5.14ms
t.start() took 5.12
class 923, time: 5.20ms
t.start() took 5.02
class 923, time: 5.10ms
@dmitriykovalev do you have any other recommendations? Given that it works on your MacBook (presumably x86) but not on the Raspberry Pi 4 (armv7), I'm wondering if the issue has something to do with the tflite_runtime library for armv7.
Sorry for the delay. The culprit of TypeError: InvokeWithMemBuffer(): incompatible function arguments is a bug inside coral_wrapper.cc: uintptr_t should be used instead of intptr_t in InvokeWithMemBuffer().
We already have the fix locally but have not updated GitHub yet. The fixed code looks like:
m.def("InvokeWithMemBuffer",
      [](py::object interpreter_handle, uintptr_t buffer, size_t size) {  // uintptr_t instead of intptr_t
        ...
      }
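To see why the signedness matters on the 32-bit Pi, the values from the "Invoked with:" line of the traceback can be checked against the 32-bit integer ranges. This is only range arithmetic in Python, not pybind11's actual conversion code, and it assumes 32-bit pointers (armv7, as reported in this thread). My understanding is that pybind11 refuses to convert a Python int that does not fit the declared C++ type, which would surface as the "incompatible function arguments" TypeError. The helper names are made up for illustration.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
UINT32_MAX = 2**32 - 1


def fits_intptr32(value):
    """True if value is representable as a 32-bit signed intptr_t."""
    return INT32_MIN <= value <= INT32_MAX


def fits_uintptr32(value):
    """True if value is representable as a 32-bit unsigned uintptr_t."""
    return 0 <= value <= UINT32_MAX


def as_unsigned32(value):
    """Reinterpret a 32-bit two's-complement value as unsigned."""
    return value & 0xFFFFFFFF


# Values copied from the traceback: "Invoked with: -1302319328, 2993337976, 150528"
handle, buffer, size = -1302319328, 2993337976, 150528

print(fits_intptr32(buffer))   # False: the buffer address is above INT32_MAX
print(fits_uintptr32(buffer))  # True: it fits once the parameter is unsigned
print(as_unsigned32(handle))   # 2992647968: the negative handle viewed as unsigned
```

The negative handle is consistent with a high address being read through a signed 32-bit type. The addresses printed on the 64-bit Mac are far below the signed 64-bit limit, which would explain why the same code ran there.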
Attaching the updated _pywrap_coral.cpython-37m-arm-linux-gnueabihf.so as a zip archive, so you can try it locally on the Pi. To find its location:
$ python3
Python 3.7.3 (default, Jul 25 2020, 13:03:44)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pycoral.pybind import _pywrap_coral
>>> _pywrap_coral.__file__
'/home/pi/.local/lib/python3.7/site-packages/pycoral/pybind/_pywrap_coral.cpython-37m-arm-linux-gnueabihf.so'
Please decompress the attached archive and then replace the .so file on the Pi.
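If importing the module is not possible (for example, because the binary is broken), the file location can usually still be found without loading the module itself. A minimal sketch using the standard library; module_file is a hypothetical helper name, and json is used only as a module that is known to exist.

```python
import importlib.util


def module_file(name):
    """Return the file a module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


print(module_file("json"))
```

For a dotted name such as pycoral.pybind._pywrap_coral, find_spec imports the parent packages first, so it raises ModuleNotFoundError if those are missing.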
No problem. Thanks for uploading the fix. I replaced the _pywrap_coral.cpython-37m-arm-linux-gnueabihf.so library per your instructions and executed the sample program you provided again. Unfortunately, I'm getting the following ImportError:
Traceback (most recent call last):
File "test.py", line 8, in <module>
from pycoral.utils import edgetpu
File "/usr/lib/python3/dist-packages/pycoral/utils/edgetpu.py", line 24, in <module>
from pycoral.pybind._pywrap_coral import GetRuntimeVersion as get_runtime_version
ImportError: /usr/lib/python3/dist-packages/pycoral/pybind/_pywrap_coral.cpython-37m-arm-linux-gnueabihf.so: invalid ELF header
I suspect this is because the .so binary was built in Docker for aarch64/arm64 or x86_64, and not for armv7a/armhf, but I'm not entirely sure.
Interesting, I've taken it directly from my Raspberry Pi board. Can you please verify that you have the same md5sum on the file:
$ md5sum /home/pi/.local/lib/python3.7/site-packages/pycoral/pybind/_pywrap_coral.cpython-37m-arm-linux-gnueabihf.so
00326d471b5c00cf2135be9c50678ad2 /home/pi/.local/lib/python3.7/site-packages/pycoral/pybind/_pywrap_coral.cpython-37m-arm-linux-gnueabihf.so
Just in case, make sure to uncompress the attached .zip archive first (the .so is inside it).
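The same checksum comparison can be done from Python where md5sum is not at hand. A minimal sketch with hashlib; md5_of_file is a hypothetical helper, and the temporary file stands in for the .so path.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a throwaway file; replace `path` with the real .so location.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bin")
    with open(path, "wb") as f:
        f.write(b"hello")
    print(md5_of_file(path))  # 5d41402abc4b2a76b9719d911017c592
```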
Ah, I read your comment too quickly: I downloaded the zip file using wget -O and renamed it to the .so file without unzipping it first. Please disregard my previous message. I just ran your test again and it works now! Thank you so much for the fix!
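An "invalid ELF header" caused by a zip archive saved under a .so name can be spotted by looking at the first four bytes of the file. A minimal sketch, assuming only the well-known magic numbers for ELF files and non-empty zip archives; sniff_file_type is a hypothetical helper name.

```python
import os
import tempfile
import zipfile

MAGIC = {
    b"\x7fELF": "elf",      # ELF shared objects and executables
    b"PK\x03\x04": "zip",   # regular (non-empty) zip archives
}


def sniff_file_type(path):
    """Classify a file as 'elf', 'zip', or 'unknown' from its first 4 bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
    return MAGIC.get(head, "unknown")


# Reproduce the mix-up: a zip archive stored under a .so file name.
with tempfile.TemporaryDirectory() as tmp:
    fake_so = os.path.join(tmp, "module.so")
    with zipfile.ZipFile(fake_so, "w") as zf:
        zf.writestr("module.so", b"\x7fELF placeholder")
    print(sniff_file_type(fake_so))  # zip
```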
You are welcome! Thank you very much for discovering this problem. We'll close this issue right after the new wheels become published.
EDIT: While the new _pywrap_coral appears to have fixed the error, edgetpu.run_inference still seems to be blocking when executed in a separate thread. Here's an example program and the output I'm getting:
import numpy as np
import platform
import tflite_runtime.interpreter as tflite
import time
from threading import Thread
from PIL import Image
from pycoral.utils import edgetpu
EDGETPU_SHARED_LIB = {
  'Linux': 'libedgetpu.so.1',
  'Darwin': 'libedgetpu.1.dylib',
  'Windows': 'edgetpu.dll'
}[platform.system()]

def make_interpreter(model_file):
  model_file, *device = model_file.split('@')
  return tflite.Interpreter(
      model_path=model_file,
      experimental_delegates=[
          tflite.load_delegate(EDGETPU_SHARED_LIB,
                               {'device': device[0]} if device else {})
      ])

def run(interpreter, interpreter_handle):
  for _ in range(5):
    start = time.perf_counter()
    edgetpu.run_inference(interpreter, arr)
    inference_time = time.perf_counter() - start
    output_details = interpreter.get_output_details()[0]
    klass = np.argmax(np.squeeze(interpreter.tensor(output_details['index'])()))
    print('class %s, time: %.2fms' % (klass, inference_time * 1000))

def main():
  interpreter = edgetpu.make_interpreter('mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite', device="usb:0")
  interpreter_handle = interpreter._native_handle()
  print('native_handle in main() =', interpreter._native_handle())
  interpreter.allocate_tensors()
  image = Image.open("parrot.jpg").convert('RGB').resize((224, 224), Image.ANTIALIAS)
  arr = np.array(image).flatten()
  print('=> run')
  for _ in range(5):
    start = time.perf_counter()
    edgetpu.run_inference(interpreter, arr)
    inference_time = time.perf_counter() - start
    output_details = interpreter.get_output_details()[0]
    klass = np.argmax(np.squeeze(interpreter.tensor(output_details['index'])()))
    print('class %s, time: %.2fms' % (klass, inference_time * 1000))
  print('\n=> thread run')
  for _ in range(5):
    start = time.perf_counter()
    t = Thread(target=edgetpu.run_inference, args=(interpreter, arr), daemon=True)
    t.start()  # <== this is blocking!
    thread_start_time = time.perf_counter() - start
    print(f"t.start() took {thread_start_time*1000:.2f}ms")
    t.join()
    thread_end_time = time.perf_counter() - start
    print(f"t.join() took {thread_end_time*1000:.2f}ms")
    thread_join_time = time.perf_counter() - start
    output_details = interpreter.get_output_details()[0]
    klass = np.argmax(np.squeeze(interpreter.tensor(output_details['index'])()))
    print('class %s, time: %.2fms' % (klass, thread_join_time * 1000))

if __name__ == '__main__':
  main()
When executed on the Raspberry Pi 4B, I get the following output:
native_handle in main() = 22730320
=> run
class 923, time: 18.25ms
class 923, time: 3.27ms
class 923, time: 3.16ms
class 923, time: 3.17ms
class 923, time: 3.18ms
=> thread run
t.start() took 3.90ms
t.join() took 3.99ms
class 923, time: 4.02ms
t.start() took 4.06ms
t.join() took 4.14ms
class 923, time: 4.16ms
t.start() took 3.71ms
t.join() took 3.81ms
class 923, time: 3.84ms
t.start() took 3.70ms
t.join() took 3.78ms
class 923, time: 3.80ms
t.start() took 3.62ms
t.join() took 3.70ms
class 923, time: 3.72ms
Notice how the call t.start() takes roughly the same amount of time (actually more!) to execute as the non-threaded inference, and t.join() is very fast. Clearly Python is still treating edgetpu.run_inference as a CPU-bound operation and therefore the GIL causes the thread to be blocking until it completes.
I'm not sure if this warrants keeping the issue open or not.
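For comparison, here is what the same start/join timing looks like when the thread's target releases the GIL while it waits. This is a sketch with a stand-in: time.sleep releases the GIL, so it plays the role of a native call that drops the GIL during device I/O. time_thread is a hypothetical helper, and, as in the program above, the second number is measured from before t.start().

```python
import threading
import time


def time_thread(target, *args):
    """Return (start_ms, total_ms): time in t.start(), and until t.join() returns."""
    t0 = time.perf_counter()
    t = threading.Thread(target=target, args=args, daemon=True)
    t.start()
    start_ms = (time.perf_counter() - t0) * 1000
    t.join()
    total_ms = (time.perf_counter() - t0) * 1000
    return start_ms, total_ms


# time.sleep releases the GIL while waiting, standing in for a native call
# that releases the GIL during I/O.
start_ms, total_ms = time_thread(time.sleep, 0.05)
print(f"t.start() took {start_ms:.2f}ms, start+join took {total_ms:.2f}ms")
```

With a GIL-releasing target, t.start() should return after a small fraction of the total and nearly all of the time is spent waiting in t.join(). A native call that holds the GIL for its whole duration keeps the main thread from running until it finishes, which would match the numbers reported above.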
That's a good point. I've added special handling for the GIL during the InvokeWithMemBuffer call. Please try the updated _pywrap_coral.cpython-37m-arm-linux-gnueabihf.so binary. On my Pi board your code example prints:
$ python3 test.py
native_handle in main() = 36249632
=> run
class 923, time: 16.05ms
class 923, time: 2.94ms
class 923, time: 2.89ms
class 923, time: 2.89ms
class 923, time: 2.87ms
=> thread run
t.start() took 0.75ms
t.join() took 3.67ms
class 923, time: 3.72ms
t.start() took 0.44ms
t.join() took 3.43ms
class 923, time: 3.48ms
t.start() took 0.41ms
t.join() took 3.26ms
class 923, time: 3.31ms
t.start() took 0.43ms
t.join() took 3.38ms
class 923, time: 3.44ms
t.start() took 0.41ms
t.join() took 3.26ms
class 923, time: 3.30ms
The latest version appears to work! I'm getting the same result as you with the new _pywrap_coral binary. Many thanks for the fix!
Closing this issue, we've updated all wheels: https://github.com/google-coral/pycoral/releases/tag/v1.0.1
According to the documentation, PyCoral improves upon the Edge TPU Python API because it treats the Edge TPU operations as I/O-bound:
However, upon inspection of the code, calling interpreter.invoke() simply uses the existing tflite_runtime invoke() function. The only function that appears to be consistent with the documentation is pycoral.utils.edgetpu.run_inference(), which calls I/O-bound functions (I presume) such as invoke_with_membuffer rather than the typical interpreter.invoke(). These are C++ functions from libcoral which are exposed to the PyCoral API using pybind (pycoral.pybind._pywrap_coral).
There appears to be an error where the object types in Python are misinterpreted by the C++ module. When I call run_inference() with the interpreter and a flattened numpy array as the input, I get the following error:
Note that the traceback shows that I'm calling run_inference in a separate thread. The reason for this is because I want to run the TPU inference in a separate thread, and do other processing while waiting for the output from the TPU.
As you can see from the output, InvokeWithMemBuffer() expects arg0 to be of type "object", although it is supposed to be the memory address of the interpreter (see the LibCoral C++ source).
I believe the int for arg0 (14635576) is the memory address of the interpreter because of the following line of code in run_inference():
So, it appears that there is a missing declaration in the wrapper that is causing InvokeWithMemBuffer() to think that the interpreter address memory int value is not the correct parameter type, even though it is.
I also attempted to see whether this problem persisted with the unit tests in edgetpu_utils_test.py, which also tests invoke_with_membuffer. The only error I received when running the error tests was the following:
I suspect that this error is caused by the same issue (a type-mismatch for arg0 of InvokeWithMemBuffer())
I'm running the code on a Raspberry Pi 4B (Buster, 32-bit, armv7a) and Coral USB accelerator. Any help would be appreciated!