vignesh-spericorn opened this issue 2 weeks ago
Hi,
Can you use our C++ example or the LLM Inference API to run model inference? The error indicates a missing custom op (KV cache), which is why it fails. We can't link those custom ops in Python yet, but you can refer to this guide for how to run inference: https://github.com/google-ai-edge/ai-edge-torch/tree/main/ai_edge_torch/generative#end-to-end-inference-pipeline
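For reference, the linked end-to-end pipeline builds and runs a C++ text-generation example. A minimal sketch of that flow is below; the Bazel target and flag names are taken from the repo's examples but may differ between versions, so treat them as assumptions and check the example's own README and `--help` output.

```shell
# Clone the repo that contains the C++ generative example (assumed layout).
git clone https://github.com/google-ai-edge/ai-edge-torch.git
cd ai-edge-torch

# Build the example binary with Bazel (target path is an assumption;
# verify it against ai_edge_torch/generative/examples/cpp).
bazel build -c opt //ai_edge_torch/generative/examples/cpp:text_generator_main

# Run inference with the converted model; flag names are assumptions.
bazel-bin/ai_edge_torch/generative/examples/cpp/text_generator_main \
  --tflite_model=/path/to/tiny_llama_seq512_kv1024.tflite \
  --sentencepiece_model=/path/to/tokenizer.model \
  --prompt="Write an email"
```

The key difference from the Python attempt is that this binary registers the generative custom ops (including the KV-cache op) with the interpreter before invoking the model.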
Thanks, I'll try this. But can we expect a Python implementation of the custom ops soon?
Yes, we are working on it. @majiddadashi FYI.
Description of the bug:
I converted the tiny-llama model using convert_to_tflite.py. The converted model is named tiny_llama_seq512_kv1024.tflite.
I tried to run inference using the following code:
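For context, the conversion step above is driven by the example script in the repo. A minimal sketch of that invocation is below; the script path matches the tiny-llama example in ai-edge-torch, but the checkpoint location and any flags are assumptions, so check the script's source for the expected arguments.

```shell
# From a checkout of https://github.com/google-ai-edge/ai-edge-torch
# (script path per the tiny_llama example; verify against your version).
python ai_edge_torch/generative/examples/tiny_llama/convert_to_tflite.py
```

The script re-authors the PyTorch model with the generative layers (including the KV-cache custom op) and exports a .tflite file; the resulting model therefore cannot be run with a stock Python tflite interpreter that has no registration for those ops.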
I got the following error:
Versions:
- Python 3.11.9
- tf_nightly==2.18.0.dev20240826
- tflite-runtime==2.14.0
- tflite-runtime-nightly==2.18.0.dev20240826
- tokenizers==0.19.1
- torch==2.4.0
- torch-xla==2.4.0
- transformers==4.44.2
Actual vs expected behavior:
No response
Any other information you'd like to share?
No response