chienhuikuo opened 2 weeks ago
Hi @chienhuikuo thanks for reporting this issue. You would need to build custom operators: https://ai.google.dev/edge/litert/models/ops_custom#register_the_operator_with_the_kernel_library ... this is quite involved, and this might be a common enough task/workflow that maybe we can look into baking it directly into LiteRT. For future reference, https://github.com/google-ai-edge/LiteRT is probably a better fit for this issue. Thanks.
Hi @pkgoogle, thank you for your reply.
I would like to know if this issue will be resolved and updated in ai-edge-torch. Otherwise, we won't be able to get the correct `.tflite` from `convert_phi3_to_tflite.py`.
Alternatively, would it be possible to provide the `.task` file for download directly on Kaggle, similar to gemma2-2b?
Hi @chienhuikuo, it will probably be resolved eventually, but I can't make any promises. For the `.task` problem, can you ask in the MediaPipe repo? I think it'll be a better fit there. Thanks.
Hi @pkgoogle, I am working with @chienhuikuo. Thanks for your advice; we really appreciate it. We now know that it's required to build a custom operator. From this error log, however, it's hard to tell which custom operator needs to be built:
```
2024-10-11 09:25:05.516 24523-24599 tflite com...diapipe.examples.llminference E Encountered unresolved custom op: odml.update_external_kv_cache.
See instructions: https://www.tensorflow.org/lite/guide/ops_custom
2024-10-11 09:25:05.516 24523-24599 tflite com...diapipe.examples.llminference E Node number 40 (odml.update_external_kv_cache) failed to prepare.
```
Is there a way to find out which operator it is?
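One way to check, assuming a reasonably recent TensorFlow build, is the TFLite model analyzer, which prints every op in the flatbuffer, including custom ops (a minimal sketch; the file name is the one from this thread):

```python
# A minimal sketch: dump the op list of the converted model.
# tf.lite.experimental.Analyzer is available in recent TF releases.
import tensorflow as tf

tf.lite.experimental.Analyzer.analyze(
    model_path="phi3.5_q8_seq1024_ekv1280.tflite"
)
# Custom ops such as odml.update_external_kv_cache show up in the printed report.
```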
Also, according to https://ai.google.dev/edge/litert/models/ops_custom#convert_to_a_litert_model, the flag `converter.allow_custom_ops = True` needs to be set on the `TFLiteConverter`. I just wonder if we can do the same thing in `convert_phi3_to_tflite.py`? Thanks.
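For context, this is what that flag looks like in a plain `TFLiteConverter` flow (a minimal sketch with placeholder paths; note that `convert_phi3_to_tflite.py` drives conversion through ai_edge_torch, so the flag may not be exposed there in the same way):

```python
# A minimal sketch of the allow_custom_ops flag from the LiteRT docs.
# The SavedModel path and output file name are placeholders.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")
# Emit ops the converter cannot resolve (e.g. odml.update_external_kv_cache)
# as custom ops in the .tflite instead of failing the conversion.
converter.allow_custom_ops = True
tflite_model = converter.convert()

with open("model_with_custom_ops.tflite", "wb") as f:
    f.write(tflite_model)
```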
According to the error message it is this one: `odml.update_external_kv_cache`.
I believe you can do the same thing there; I would edit as needed to get this through, and I would follow the custom op directions as closely as you can.
Hi,
I think the problem is that the most recent dependent packages for `ai_edge_torch` convert models into a format that is ahead of what is in the latest MediaPipe release (0.10.16). So can you do as follows: in your `ai_edge_torch` checkout, run
```
pip install -e .
pip install tf-nightly==2.19.0.dev20241001
```
then convert the model, make the TaskBundle (sketched below), etc., and try the sample app again. Thanks!
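For reference, the TaskBundle step from the MediaPipe LLM Inference docs looks roughly like this (a sketch; the file names and token strings are placeholders to adjust for your model):

```python
# A minimal sketch of bundling the converted .tflite into a .task file,
# following the MediaPipe LLM Inference docs. All values are placeholders.
from mediapipe.tasks.python.genai import bundler

config = bundler.BundleConfig(
    tflite_model="phi3.5_q8_seq1024_ekv1280.tflite",  # output of convert_phi3_to_tflite.py
    tokenizer_model="tokenizer.model",                # SentencePiece tokenizer for the model
    start_token="<s>",                                # adjust to the model's chat format
    stop_tokens=["<|end|>"],                          # adjust to the model's chat format
    output_filename="phi-3.5-mini.task",
    enable_bytes_to_unicode_mapping=False,
)
bundler.create_bundle(config)
```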
Hi @chienhuikuo & @beefho67,
I am facing problems in creating a `.task` file using the MediaPipe LLM Inference documentation. Can you share your code snippet? That would be of great help.
Also, I wanted to know if you were able to solve your issue of building custom operators. If yes, then please share the steps, as I am unable to understand the related documentation shared earlier.
Thank you
Hi @chienhuikuo, let us know if the above works for you.
Hi @atultiwari, we haven't tried building custom operators yet. Instead, we are following this comment and will update the results here.
Hi @talumbau @pkgoogle,
After installing `tf-nightly==2.19.0.dev20241001`, the `phi-3.5-mini.task` started working.
However, the model response is extremely slow, taking about three minutes to get the first token. Is there any solution to improve this performance?
Additionally, the app sometimes crashes with an error, as shown in the crash_log.txt, but I'm not sure what is causing it. Any suggestions for resolving this?
By the way, does the ai_edge_torch example `convert_phi3_to_tflite.py` support converting phi-3-mini-4k? If so, could I just change the model file to phi-3-mini-4k in order to convert it?
Hi @atultiwari, we followed the MediaPipe LLM Inference documentation, just like you did, by using the sample code and adjusting the parameter values.
The main challenge is with the `tokenizer_model`. Some models provide the `tokenizer.model` on Hugging Face, while others don't. I tried using Llama 3.2 and Phi-2, but they didn't include the `tokenizer.model`, so I couldn't convert them from `.tflite` to `.task`. Fortunately, I had better luck with phi-3.5-mini.
I think the missing `tokenizer_model` might be created with SentencePiece, but I'm not sure how to do it. It seems like a difficult task.
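For models that do publish it, the SentencePiece file can be pulled straight from the Hub (a sketch, assuming the `huggingface_hub` package; as noted above, Llama 3.2 and Phi-2 don't ship a `tokenizer.model`, so this only works where the repo includes one):

```python
# A minimal sketch: fetch the SentencePiece tokenizer.model for a repo that ships one.
from huggingface_hub import hf_hub_download

tokenizer_path = hf_hub_download(
    repo_id="microsoft/Phi-3.5-mini-instruct",
    filename="tokenizer.model",
)
# Pass this local path as tokenizer_model when creating the .task bundle.
print(tokenizer_path)
```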
Maybe you can follow this sample to try it: https://github.com/kinfey/MTKPhi3Samples, and download the tflite model from https://huggingface.co/lokinfey/Phi-3.5-instruct-tflite
Hi @kinfey,
Thanks for your support!
I'd like to confirm whether this `.task` file is Phi-3 or Phi-3.5, as the model name in your repo is listed as `lokinfey/Phi-3.5-instruct-tflite`, but here it is `phi3.task`.
Description of the bug:
I downloaded microsoft/Phi-3.5-mini-instruct from Hugging Face and ran the convert_phi3_to_tflite.py example. The conversion to TFLite format was successful.
Next, I created a `.task` file for the output TFLite model following the MediaPipe LLM Inference documentation.
After that, I cloned the MediaPipe LLM Inference Android Example and replaced `model.bin` in InferenceModel.kt with the `phi-3.5-mini.task`, as shown below.
When I built the app and entered a prompt on my Android phone, I encountered the following error:
It seems there are some issues with the converted `phi3.5_q8_seq1024_ekv1280.tflite`, but I'm unsure how to resolve them. Any insights or suggestions would be greatly appreciated.
Actual vs expected behavior:
Expected
The Phi-3.5-mini will respond to my questions without any errors.
Actual
The Phi-3.5-mini didn't respond, and the UI remained in a loading status.
Any other information you'd like to share?
Additionally, I tested replacing the model with `gemma2-2b-it-cpu-int8.task` (downloaded from Kaggle), and it worked fine, so the error is unrelated to the app's code.