google-ai-edge / ai-edge-torch

Supporting PyTorch models with the Google AI Edge TFLite runtime.

Error Using Converted Phi-3.5-mini TFLite in Android App #293

Open chienhuikuo opened 2 weeks ago

chienhuikuo commented 2 weeks ago

Description of the bug:

I downloaded the microsoft/Phi-3.5-mini-instruct from Hugging Face and ran the convert_phi3_to_tflite.py example. The conversion to TFLite format was successful.

Next, I created a .task file for the output TFLite model, following the MediaPipe LLM Inference documentation.
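
The bundling step follows the sample code in that documentation. A rough sketch is below; the start/stop token strings are illustrative and must match the model's actual chat template:

    from mediapipe.tasks.python.genai import bundler

    # Package the converted TFLite model and its SentencePiece tokenizer
    # into a single .task bundle for the MediaPipe LLM Inference API.
    config = bundler.BundleConfig(
        tflite_model="phi3.5_q8_seq1024_ekv1280.tflite",
        tokenizer_model="tokenizer.model",  # SentencePiece file from the HF repo
        start_token="<s>",                  # illustrative; verify for Phi-3.5
        stop_tokens=["<|end|>", "<|endoftext|>"],
        output_filename="phi-3.5-mini.task",
        enable_bytes_to_unicode_mapping=False,
    )
    bundler.create_bundle(config)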

After that, I cloned the MediaPipe LLM Inference Android example and replaced the model.bin path in InferenceModel.kt with phi-3.5-mini.task, as shown below:

companion object {
        // NB: Make sure the filename is *unique* per model you use!
        // Weight caching is currently based on filename alone.
        private const val MODEL_PATH = "/data/local/tmp/llm/phi-3.5-mini.task"
        private var instance: InferenceModel? = null

        fun getInstance(context: Context): InferenceModel {
            return if (instance != null) {
                instance!!
            } else {
                InferenceModel(context).also { instance = it }
            }
        }
    }

When I built the app and entered a prompt on my Android phone, I encountered the following error:

2024-10-11 09:25:05.516 24523-24599 tflite     com...diapipe.examples.llminference  E  Encountered unresolved custom op: odml.update_external_kv_cache.
                                                                                       See instructions: https://www.tensorflow.org/lite/guide/ops_custom 
2024-10-11 09:25:05.516 24523-24599 tflite     com...diapipe.examples.llminference  E  Node number 40 (odml.update_external_kv_cache) failed to prepare.

It seems there are some issues with the converted phi3.5_q8_seq1024_ekv1280.tflite, but I'm unsure how to resolve them.
Any insights or suggestions would be greatly appreciated.

Actual vs expected behavior:

Expected

The Phi-3.5-mini model responds to my questions without any errors.

Actual

The Phi-3.5-mini didn't respond, and the UI remained stuck in a loading state.

(Screenshot: the app UI stuck in the loading state.)

Any other information you'd like to share?

Additionally, I tested replacing the model with gemma2-2b-it-cpu-int8.task (downloaded from Kaggle), and it worked fine, so the error is unrelated to the app's code.

pkgoogle commented 2 weeks ago

Hi @chienhuikuo, thanks for reporting this issue. You would need to build custom operators: https://ai.google.dev/edge/litert/models/ops_custom#register_the_operator_with_the_kernel_library. This is quite involved, and it might be a common enough task/workflow that we can look into baking it directly into LiteRT. For future reference, https://github.com/google-ai-edge/LiteRT is probably a better fit for this issue. Thanks.

chienhuikuo commented 2 weeks ago

Hi @pkgoogle, thank you for your reply. I would like to know whether this issue will be resolved and fixed in ai-edge-torch. Otherwise, we won't be able to get a working .tflite from convert_phi3_to_tflite.py.
Alternatively, would it be possible to provide the .task for download directly on Kaggle, similar to gemma2-2b?

pkgoogle commented 2 weeks ago

Hi @chienhuikuo, it will probably be resolved eventually, but I can't make any promises. For the .task problem, can you ask in the MediaPipe repo? I think it'll be a better fit there. Thanks.

beefho67 commented 2 weeks ago


Hi @pkgoogle, I am working with @chienhuikuo. Thanks for your advice; we really appreciate it. We now know that a custom operator needs to be built. From this error log, however, it's hard to tell which custom operator that is.

2024-10-11 09:25:05.516 24523-24599 tflite     com...diapipe.examples.llminference  E  Encountered unresolved custom op: odml.update_external_kv_cache.
                                                                                        See instructions: https://www.tensorflow.org/lite/guide/ops_custom 
2024-10-11 09:25:05.516 24523-24599 tflite     com...diapipe.examples.llminference  E  Node number 40 (odml.update_external_kv_cache) failed to prepare.

Is there a way to find out which operator?
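
(One way to enumerate the operators in a .tflite file, including any custom ops, is TensorFlow's model analyzer; a minimal sketch, assuming a recent TensorFlow install:)

    import tensorflow as tf

    # Prints the model graph operator by operator; custom ops such as
    # odml.update_external_kv_cache appear by name in the output.
    tf.lite.experimental.Analyzer.analyze(
        model_path="phi3.5_q8_seq1024_ekv1280.tflite")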

Also, https://ai.google.dev/edge/litert/models/ops_custom#convert_to_a_litert_model says that the flag converter.allow_custom_ops = True needs to be set on the TFLiteConverter. I wonder if we can do the same thing in convert_phi3_to_tflite.py? Thanks.

pkgoogle commented 1 week ago

According to the error message, it is this one: odml.update_external_kv_cache. I believe you can do the same thing there -- edit as you need to get it through, and follow the custom op directions as closely as you can.
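
For reference, the flag from the linked guide is set on a plain TFLiteConverter as sketched below. This is generic: convert_phi3_to_tflite.py drives conversion through ai_edge_torch's own converter, which may expose the option differently, if at all.

    import tensorflow as tf

    # Generic LiteRT conversion that keeps unknown ops as custom ops
    # in the flatbuffer instead of failing the conversion.
    converter = tf.lite.TFLiteConverter.from_saved_model("/path/to/saved_model")
    converter.allow_custom_ops = True
    tflite_model = converter.convert()

    with open("model_with_custom_ops.tflite", "wb") as f:
        f.write(tflite_model)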

talumbau commented 1 week ago

Hi,

I think the problem is that the most recent dependencies of ai_edge_torch convert models into a format that is ahead of what the latest MediaPipe release (0.10.16) supports. Can you try the following:

  1. Check out the latest main of the repo and change to the top-level directory.
  2. Pip-install the local repo as your version of ai_edge_torch:
    pip install -e .
  3. Install a version of tf-nightly that works with MediaPipe 0.10.16:
    pip install tf-nightly==2.19.0.dev20241001

Then convert the model, create the Task Bundle, etc., and try the sample app again. Thanks!
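
A quick sanity check that the editable install and the pinned nightly are the ones actually being imported (checking __file__ as well, since an editable install should resolve to your local checkout):

    import ai_edge_torch
    import tensorflow as tf

    # After `pip install -e .`, this should point into your local clone,
    # and the TF version should match the pinned tf-nightly build.
    print(ai_edge_torch.__file__)
    print(tf.__version__)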

atultiwari commented 1 week ago

Hi @chienhuikuo and @beefho67, I am facing problems creating the .task file using the MediaPipe LLM Inference documentation. Could you share your code snippet? That would be of great help.

Also, I wanted to know whether you were able to solve your issue by building custom operators. If yes, please share the steps, as I am unable to follow the related documentation shared earlier.

Thank you

pkgoogle commented 1 week ago

Hi @chienhuikuo, let us know if the above works for you.

beefho67 commented 1 week ago


Hi @atultiwari, we haven't tried building custom operators yet. Instead, we are following this comment and will post the results here.

chienhuikuo commented 1 week ago

Hi @talumbau @pkgoogle, after installing tf-nightly==2.19.0.dev20241001, the phi-3.5-mini.task started working. However, the model's responses are extremely slow, taking about three minutes to produce the first token. Is there any way to improve this performance?

Additionally, the app sometimes crashes with an error, as shown in the crash_log.txt, but I'm not sure what is causing it. Any suggestions for resolving this?

By the way, does the ai_edge_torch example convert_phi3_to_tflite.py support converting phi-3-mini-4k? If so, could I just change the model file to phi-3-mini-4k in order to convert it?

chienhuikuo commented 1 week ago


Hi @atultiwari, we followed the MediaPipe LLM Inference documentation, just like you did, using its sample code and adjusting the parameter values (see the bundling sketch earlier in this thread).

The main challenge is the tokenizer_model. Some models provide tokenizer.model on Hugging Face, while others don't. I tried Llama 3.2 and Phi-2, but they didn't include tokenizer.model, so I couldn't bundle their .tflite files into .task files. Fortunately, I had better luck with phi-3.5-mini.
I think a missing tokenizer.model might be recreated with SentencePiece, but I'm not sure how to do it. It seems like a difficult task.
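
(A quick way to check whether a Hugging Face repo ships a SentencePiece tokenizer.model at all is to try fetching just that file with huggingface_hub; the repo ID below is one of the models discussed in this thread:)

    from huggingface_hub import hf_hub_download
    from huggingface_hub.utils import EntryNotFoundError

    # Downloads only tokenizer.model; raises EntryNotFoundError if the
    # repo does not include a SentencePiece tokenizer file.
    try:
        path = hf_hub_download(
            repo_id="microsoft/Phi-3.5-mini-instruct",
            filename="tokenizer.model",
        )
        print("tokenizer.model found at", path)
    except EntryNotFoundError:
        print("no SentencePiece tokenizer.model in this repo")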

kinfey commented 3 days ago

Maybe you can follow this sample: https://github.com/kinfey/MTKPhi3Samples, and download the TFLite model from https://huggingface.co/lokinfey/Phi-3.5-instruct-tflite.

chienhuikuo commented 2 days ago


Hi @kinfey, thanks for your support! I'd like to confirm whether this .task file is Phi-3 or Phi-3.5: the model in your repo is listed as lokinfey/Phi-3.5-instruct-tflite, but the bundled file here is named phi3.task. (Screenshot showing the file phi3.task.)