Efinix-Inc / tinyml

This repo is for the Efinix TinyML platform, which offers an end-to-end flow that facilitates TinyML solution deployment on Efinix FPGAs.

TFLite micro library conflict #6

Open Kyle32028 opened 8 months ago

Kyle32028 commented 8 months ago

I am currently using the Titanium Ti180 M484 FPGA Development Kit and have programmed the hardware bitstream from tinyml_hello_world onto the FPGA. I am trying to perform static-input inference on the TensorFlow Lite MoveNet model using the RISC-V IDE; the model has been quantized to INT8. During the model's initialization phase, the console outputs the following error messages:

Didn't find op for builtin opcode 'CAST' version '1'. An older version of this builtin might be supported. Are you using an old TFLite binary with a newer model? Failed to get registration from op code CAST. AllocateTensors() failed.

Based on the error messages, it seems that I need to update the tflite-micro library. The version I'm currently using is the one from tinyml_hello_world. Upon comparison, I found that there are many conflicts between that version and the latest version of tflite-micro. Do you have any suggestions on how to resolve this issue?
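For reference, the failure surfaces at the AllocateTensors() call in the usual tflite-micro setup sequence, roughly like this (a generic sketch of the older-style API used by the hello-world examples; the model symbol and arena size are placeholders, not the project's exact code):

```cpp
#include <cstdint>

#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"

// Placeholders: the quantized MoveNet flatbuffer and an illustrative arena size.
extern const unsigned char movenet_model_data[];
constexpr int kTensorArenaSize = 512 * 1024;
static uint8_t tensor_arena[kTensorArenaSize];

void model_init() {
  static tflite::MicroErrorReporter error_reporter;
  const tflite::Model* model = tflite::GetModel(movenet_model_data);

  // The resolver must provide a registration for every builtin in the model,
  // including CAST, or allocation fails.
  static tflite::AllOpsResolver resolver;
  static tflite::MicroInterpreter interpreter(model, resolver, tensor_arena,
                                              kTensorArenaSize, &error_reporter);

  // This is where "Failed to get registration from op code CAST" is raised
  // when the resolver does not know the op.
  if (interpreter.AllocateTensors() != kTfLiteOk) {
    // Handle the error / abort initialization.
  }
}
```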

efxmfaiz commented 7 months ago

Hi @Kyle32028 ,

The cast layer is already there in our version of the tflite-micro library; it is just not registered by default. You can modify all_ops_resolver.cc in the tensorflow/lite/micro/kernel/ folder to add the cast layer (AddCast()).
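For anyone following the same steps, the change is roughly this (a sketch of the typical AllOpsResolver constructor; the surrounding registrations are abbreviated and the file path is as given above):

```cpp
// all_ops_resolver.cc -- register the CAST builtin alongside the existing ops.
#include "tensorflow/lite/micro/all_ops_resolver.h"

namespace tflite {

AllOpsResolver::AllOpsResolver() {
  // Existing registrations (abbreviated)...
  AddAbs();
  AddAdd();
  AddCast();   // <-- add this line so the CAST builtin gets a registration
  AddConv2D();
  // ...remaining Add* calls unchanged.
}

}  // namespace tflite
```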

Kyle32028 commented 7 months ago

Thank you very much! Yes, I have already enabled Cast and GatherNd, and ported the Div operation using the method you provided. After these steps, the model was successfully initialized. However, when invoking the model, I encountered some operations that do not support UINT8 or INT32. By going through the tflite-micro commit history, I have also added UINT8 and INT32 support to these operations. I do have a question, though: is there a way to update tflite-micro directly to the latest version? Otherwise, each time I change the model, it takes a lot of time to resolve library compatibility issues. Or does supporting the current latest version of the library require changes to the RTL design? Thank you!
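As a side note on the registration step: one way to avoid editing all_ops_resolver.cc every time the model changes is to register only the ops the model needs with a MicroMutableOpResolver in the application code. A sketch, assuming the resolver in this tree exposes the corresponding Add* methods (the op list and template size below are illustrative and must match the actual model):

```cpp
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"

// Size the template parameter to the number of Add* calls below.
static tflite::MicroMutableOpResolver<8> resolver;

static void RegisterMoveNetOps() {
  resolver.AddCast();
  resolver.AddGatherNd();
  resolver.AddDiv();
  resolver.AddConv2D();
  resolver.AddDepthwiseConv2D();
  resolver.AddFullyConnected();
  resolver.AddReshape();
  resolver.AddSoftmax();
}
// Pass `resolver` to MicroInterpreter in place of AllOpsResolver.
```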

efxmfaiz commented 7 months ago

Hi @Kyle32028 ,

To support the latest version of the tflite-micro library, no changes are required in the RTL design. However, some changes need to be made to the tflite-micro library itself for compatibility with RISC-V and the specific board (e.g., the printing mechanism). We have noticed that the latest tflite-micro version has a longer compilation time, and it also slows down the overall inference time.
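For example, the "printing mechanism" change typically amounts to retargeting tflite-micro's DebugLog hook to the board's console. A minimal sketch, assuming the older single-string DebugLog interface (newer releases use a printf-style signature), with the UART call left as a placeholder since the exact BSP function depends on the SDK:

```cpp
// debug_log.cc -- board-specific implementation of tflite-micro's logging hook.
#include "tensorflow/lite/micro/debug_log.h"

// Placeholder console write; on the Ti180 this would forward each character
// to the BSP's UART print routine (function name depends on the SDK).
static void board_console_write(const char* s) {
  while (*s) {
    // e.g. uart_write(UART0, *s);  // hypothetical BSP call -- substitute the real one
    ++s;
  }
}

extern "C" void DebugLog(const char* s) {
  // Forward every log string produced by tflite-micro to the board console.
  board_console_write(s);
}
```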

We do plan to progressively port the supported layers from the latest tflite-micro to our version, but the timeline is yet to be determined.

Regards, Faiz.

Kyle32028 commented 6 months ago

Hello @efxmfaiz, since MoveNet had inference errors on the current version of the TFLite Micro library, I attempted to update the TFLite Micro library to the latest version. Indeed, I encountered the issues you mentioned about longer compilation times and slower inference speeds. I would like to ask:

  1. When using the latest version of the tflite-micro library for inference, I found that turning on the TinyML accelerator does not speed up the inference process (I have generated define.v through tinyml_generator and recompiled, as well as copied define.cc and define.h to the src/model folder). However, the current version of the library does accelerate the process. Are any changes needed after updating the tflite-micro library to properly use the accelerator?

  2. I noticed that in your provided example models, the largest model is the MediaPipe Face Landmark at 751 KB. The TFLite model I am currently attempting to use is 2827 KB. Is it difficult to perform real-time inference on the Ti180 M484 with a model of this size? (I aim to reduce the static inference time to around 300 ms.) With the current library version, I have tried to maximize the use of TinyML accelerator resources for computation acceleration and have almost exhausted the XLR resources on the FPGA. However, the fastest static inference I can achieve is around 3569 ms. Are there any other optimizations I can perform, or do I need to try a smaller model?

  3. When I increase the resources of the TinyML accelerator, the design sometimes still compiles normally, but the worst slack becomes negative (timing is not met). Does this imply that such a circuit might produce unexpected results under certain conditions?

Thank you!