google-coral / project-posenet

Human Pose Detection on EdgeTPU

Is it possible to run the Custom OP on CPU with TFLite? #36

Closed ivelin closed 3 years ago

ivelin commented 4 years ago

Coral team, thank you for the great example.

We are working on an open source project that allows users to take advantage of EdgeTPU when available. Otherwise inference falls back to the CPU.
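Roughly, the fallback we have in mind looks like this (a minimal sketch; the model path is a placeholder, and in practice the CPU path would load a CPU-compatible model):

import tflite_runtime.interpreter as tflite

MODEL_PATH = 'posenet_model.tflite'  # placeholder path

try:
    # Prefer the EdgeTPU when its runtime library is present.
    delegates = [tflite.load_delegate('libedgetpu.so.1')]
except (ValueError, OSError):
    # No EdgeTPU runtime available: fall back to plain CPU inference.
    delegates = []

interpreter = tflite.Interpreter(model_path=MODEL_PATH,
                                 experimental_delegates=delegates)
interpreter.allocate_tensors()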

The README for this PoseNet example states that the CustomOp is embedded in the graph itself. Does that mean the EdgeTPU knows how to resolve the CustomOp reference in the graph and execute it?

Trying to run the graph with TFLite on the CPU (without the EdgeTPU) produces the following error, which is a known limitation of TFLite:

def AllocateTensors(self):
>       return _interpreter_wrapper.InterpreterWrapper_AllocateTensors(self)
E       RuntimeError: Encountered unresolved custom op: PosenetDecoderOp.Node number 32 (PosenetDecoderOp) failed to prepare.

Is there a way to help TFLite resolve the CustomOp reference in the graph, or is that an EdgeTPU feature only?

It looks like one way to inform TFLite of custom ops is to rebuild it from source. However, that requires the CustomOp implementation to be available at build time.

Any guidance would be appreciated.

Thank you,

Ivelin

jwoolston commented 4 years ago

@ivelin I went through precisely this a while back and am currently trying to update the implementation. I will do my best to describe what I did.

Does that mean the EdgeTPU knows how to resolve the CustomOp reference in the graph and execute it?

Not quite, at least not as far as I can tell. It appears to be provided by the EdgeTPU library.

Is there a way to help TFLite resolve the CustomOp reference in the graph, or is that an EdgeTPU feature only?

I was previously unable to find a way.

It looks like one way to inform TFLite of custom ops is to rebuild it from source. However, that requires the CustomOp implementation to be available at build time.

Correct. Fortunately, it is. As I mentioned above, it appears to be part of the Edge TPU library. I simply build this code into my binary. You can find the op here: https://github.com/google-coral/edgetpu/blob/master/src/cpp/posenet/posenet_decoder_op.cc

Related code that is likely to be needed is available there as well. I have successfully used this CPU-only on Linux, and I am working on a build for Windows. I am not fond of the Bazel build system, particularly because for whatever reason TensorFlow does not use up-to-date versions of Bazel, and building on Windows is not easy. Frankly, neither is building on Linux, but at least there is a Docker image available for that.

ivelin commented 4 years ago

@jwoolston thank you for sharing your experience so far. It will help with our fall detection PR.

And thanks for the pointer to the CustomOp source. I was hoping we could avoid building TFLite from scratch and just reference the CustomOp binary. Like you said, building with Bazel is an adventure. I contributed to TFIO some time ago and went through the whole immersive build experience :)

Maybe we can get some guidance from the TFLite team on a way to implement runtime CustomOp resolution.

In the meantime I will keep an eye out for your updates. You may also have already noticed these [1, 2] alternative implementations that don't use the CustomOp but are probably not as computationally efficient.

jwoolston commented 4 years ago

I had not seen those. In my case I specifically need access via the Java API (either full TF or Lite). Since I find the Bazel build of TensorFlow basically impossible outside their Docker image, I am working on a minimal CMake project that can build it as part of my JNI library. I am, however, extremely interested in option 1 you presented, as I spent a lot of time the first time around trying to find a way to use the JS models, since most of the published hub models are for JS. My understanding was that there was no tool to do that, but based on that repo it seems that assessment was either incorrect or outdated. Have you had any success there?

jwoolston commented 4 years ago

@ivelin I should add, I don't believe computational efficiency has anything to do with it. The custom op in question here is decoding the output tensor of the network into the "pose" data. For whatever reason the EdgeTPU version defines an operation for it, while the Android TFLite, Python, and JS versions of the models I found all seem to output the raw data directly to be interpreted by the consumer. I presume you are familiar with https://medium.com/tensorflow/real-time-human-pose-estimation-in-the-browser-with-tensorflow-js-7dd0bc881cd5. The custom op is essentially responsible for parsing the network output and doing the heatmap/offset/vector calculations to find the keypoints. Note that some of the terminology varies between implementations, but the previously linked C++ implementation appears to follow the paper's nomenclature.
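For anyone following along, a toy single-pose version of that post-processing looks roughly like this (a simplified sketch of the idea, not the actual op; the real decoder also groups keypoints into multiple poses using the displacement vectors):

import numpy as np

def decode_single_pose(heatmaps, offsets, output_stride=16):
    # heatmaps: (H, W, K) keypoint score maps from the network
    # offsets:  (H, W, 2*K) offset vectors (first K channels are y, last K are x)
    h, w, num_keypoints = heatmaps.shape
    keypoints = []
    for k in range(num_keypoints):
        # Grid cell with the highest score for keypoint k.
        y, x = np.unravel_index(np.argmax(heatmaps[:, :, k]), (h, w))
        # Map the cell back to image coordinates and refine with the offset.
        off_y = offsets[y, x, k]
        off_x = offsets[y, x, k + num_keypoints]
        keypoints.append((y * output_stride + off_y,
                          x * output_stride + off_x,
                          float(heatmaps[y, x, k])))
    return keypoints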

ivelin commented 4 years ago

@jwoolston

Have you had any success there?

Not yet. Still working on adapting the code to the tflite runtime.

computational efficiency

You may be correct. I was referring to @rwightman's comments in his README that decoding speed significantly impacts (roughly 10x) overall performance. He says:

I first adapted the JS code more or less verbatim and found the performance was low so made some vectorized numpy/scipy version of a few key functions (named _fast).

The base MobileNet models have a throughput of 200-300 fps on a GTX 1080 Ti (or better). The multi-pose post processing code brings this rate down significantly. With a fast CPU and a GTX 1080+: a literal translation of the JS post processing code dropped performance to approx 30 fps, while my 'fast' post processing results in 90-110 fps.

jwoolston commented 4 years ago

@ivelin Thanks for the info. Unfortunately for me, the Java API's method of resolving a custom op requires it to be implemented in native code; the interface literally defines a method to get a native object. I suppose that since TF is C++ at the end of the day, regardless of the API used, this is unavoidable. If you are able to adapt that to tflite, it might provide a way to avoid the C++, because it would be invoking the built-in ops directly while also using the JS model, which brings some other benefits as well. For that matter, the whole process opens up a new avenue that I had presumed was closed. Please let me know how the porting goes.

lupitia1 commented 3 years ago

(quoting @jwoolston's earlier reply above about building the PosenetDecoderOp from the Edge TPU library into the binary)

Hello jwoolston! I'm trying to do the same right now, but I'm facing the following problem:

(decoder) [ec2-user@ip-mobilenet]$ vim run_model.py
(decoder) [ec2-user@ip-mobilenet]$ python run_model.py
Traceback (most recent call last):
  File "run_model.py", line 3, in <module>
    tpu= tflite.load_delegate('libedgetpu.so.1')
  File "/home/ec2-user/decoder/lib/python3.7/site-packages/tflite_runtime/interpreter.py", line 166, in load_delegate
    delegate = Delegate(library, options)
  File "/home/ec2-user/decoder/lib/python3.7/site-packages/tflite_runtime/interpreter.py", line 90, in __init__
    self._library = ctypes.pydll.LoadLibrary(library)
  File "/usr/local/lib/python3.7/ctypes/__init__.py", line 442, in LoadLibrary
    return self._dlltype(name)
  File "/usr/local/lib/python3.7/ctypes/__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libedgetpu.so.1: cannot open shared object file: No such file or directory
Exception ignored in: <function Delegate.__del__ at 0x7fa6194d70e0>
Traceback (most recent call last):
  File "/home/ec2-user/decoder/lib/python3.7/site-packages/tflite_runtime/interpreter.py", line 125, in __del__
    if self._library is not None:
AttributeError: 'Delegate' object has no attribute '_library'

When executing this code:

import tflite_runtime.interpreter as tflite

tpu = tflite.load_delegate('../posenet-tflite-convert/data/edgetpu/libedgetpu/direct/aarch64/libedgetpu.so.1')
posenet = tflite.load_delegate('posenet_decoder.so')

interpreter = tflite.Interpreter(model_path='posenet_mobilenet_v1_075_481_641_quant_decoder.tflite')
interpreter.allocate_tensors()

I used this Docker image to generate the required .so libraries: https://github.com/muka/posenet-tflite-convert

But I'm not sure how to connect the libraries with my code or the tflite_runtime.

Could you please help me get a better understanding of this problem, or tell me more about the solution you used?

I'm working on Amazon Linux 2.

Greetings!

hjonnala commented 3 years ago

Please check this snippet to run PosenetDecoderOp on CPU with TFLite.
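In short, the CPU-only pattern is roughly the following (a sketch, assuming a posenet_decoder.so built for your platform, for example via the Docker image mentioned above):

import tflite_runtime.interpreter as tflite

# CPU-only: skip libedgetpu.so.1 entirely and load only the decoder library,
# which makes the PosenetDecoderOp custom op available to the interpreter.
posenet_decoder = tflite.load_delegate('posenet_decoder.so')  # path is an assumption

interpreter = tflite.Interpreter(
    model_path='posenet_mobilenet_v1_075_481_641_quant_decoder.tflite',
    experimental_delegates=[posenet_decoder])
interpreter.allocate_tensors()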

Thanks
