google-ai-edge / ai-edge-torch

Supporting PyTorch models with the Google AI Edge TFLite runtime.
Apache License 2.0

Tiny-llama Encountered unresolved custom op: odml.update_kv_cache #175

Open vignesh-spericorn opened 2 weeks ago

vignesh-spericorn commented 2 weeks ago

Description of the bug:

I converted the TinyLlama model using convert_to_tflite.py. The name of the converted model is tiny_llama_seq512_kv1024.tflite.

I tried to run inference using the following code:

import tflite_runtime.interpreter as tflite
from transformers import AutoTokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("tiny-llama")

# Input text
input_text = "write a poem about sun in 4 lines"

# Tokenize the input text and convert it to tensor format
input_tokens = tokenizer.encode(input_text, return_tensors='np')  # Returns numpy array

# Load the TFLite model
model_path = "output/tiny_llama_seq512_kv1024.tflite"
interpreter = tflite.InterpreterWithCustomOps(model_path=model_path)
interpreter.allocate_tensors()

I got the following error:

RuntimeError: Encountered unresolved custom op: odml.update_kv_cache.
See instructions: https://www.tensorflow.org/lite/guide/ops_custom Node number 49 (odml.update_kv_cache) failed to prepare.

Versions:
Python 3.11.9
tf_nightly==2.18.0.dev20240826
tflite-runtime==2.14.0
tflite-runtime-nightly==2.18.0.dev20240826
tokenizers==0.19.1
torch==2.4.0
torch-xla==2.4.0
transformers==4.44.2

Actual vs expected behavior:

No response

Any other information you'd like to share?

No response

haozha111 commented 2 weeks ago

Hi,

Can you use our C++ example or the LLM Inference API to run model inference? The error indicates that a custom op (the KV cache update) is missing, so the interpreter fails to prepare. Currently we can't link those custom ops in Python yet, but you can refer to this for how to do the inference: https://github.com/google-ai-edge/ai-edge-torch/tree/main/ai_edge_torch/generative#end-to-end-inference-pipeline
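For context, this is roughly what the C++ path does that the Python tflite_runtime wheel currently cannot: the custom ops are registered on the op resolver before the interpreter is built. The sketch below uses the standard TFLite C++ registration pattern; Register_ODML_UPDATE_KV_CACHE is a hypothetical placeholder name, the actual registration functions ship with the C++ example sources linked above, and the snippet only compiles when linked against the TFLite library.

```cpp
#include <memory>

#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/interpreter_builder.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model_builder.h"

int main() {
  // Load the converted model.
  auto model = tflite::FlatBufferModel::BuildFromFile(
      "output/tiny_llama_seq512_kv1024.tflite");

  // Start from the builtin ops, then register the generative custom ops
  // by name. Register_ODML_UPDATE_KV_CACHE is a placeholder: the real
  // registration functions live in the ai-edge-torch C++ example sources.
  tflite::ops::builtin::BuiltinOpResolver resolver;
  resolver.AddCustom("odml.update_kv_cache", Register_ODML_UPDATE_KV_CACHE());

  // Because the resolver now knows the custom op, AllocateTensors()
  // no longer fails with "unresolved custom op".
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(*model, resolver)(&interpreter);
  interpreter->AllocateTensors();
  return 0;
}
```

The Python error in this issue arises exactly because tflite_runtime's Interpreter builds with only the builtin resolver and there is no way yet to link these custom kernels into the Python wheel.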

vignesh-spericorn commented 2 weeks ago

Thanks, I'll try this. But can we expect a Python implementation of the custom ops soon?

haozha111 commented 2 weeks ago

> Thanks, I'll try this. But can we expect a Python implementation of the custom ops soon?

Yes, we are working on it. @majiddadashi FYI