google-ai-edge / ai-edge-torch

Supporting PyTorch models with the Google AI Edge TFLite runtime.
Apache License 2.0

Not Able to run converted gemma model in tflite because of custom op odml.update_kv_cache #79

Open akshatshah17 opened 2 days ago

akshatshah17 commented 2 days ago

Description of the bug:

I was able to convert the Gemma model from safetensors to TFLite through the converter, but I wasn't able to run the result because of this error:

```
INFO: Loaded model tiny_llama_seq512_kv1024_float32.tflite
INFO: Initialized TensorFlow Lite runtime.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
ERROR: Encountered unresolved custom op: odml.update_kv_cache. See instructions: https://www.tensorflow.org/lite/guide/ops_custom
ERROR: Node number 44 (odml.update_kv_cache) failed to prepare.
ERROR: Encountered unresolved custom op: odml.update_kv_cache. See instructions: https://www.tensorflow.org/lite/guide/ops_custom
ERROR: Node number 44 (odml.update_kv_cache) failed to prepare.
ERROR: Failed to allocate tensors!
ERROR: Benchmarking failed.
```

Actual vs expected behavior:

No response

Any other information you'd like to share?

No response

haozha111 commented 2 days ago

Quick question: are you running the model via MediaPipe or the example C++ inference? We documented how to run the pipeline here: https://github.com/google-ai-edge/ai-edge-torch/tree/main/ai_edge_torch/generative#end-to-end-inference-pipeline

The issue you encountered is due to missing registration for the custom op, which the above pipeline should automatically include. @hheydary.
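As a side note, before wiring up custom-op registration it can be useful to confirm that the converted flatbuffer really does reference the op. A minimal sketch (not part of the ai-edge-torch API; `tflite_references_custom_op` is a hypothetical helper): custom op names are stored as plain UTF-8 strings inside the .tflite flatbuffer, so a simple byte search over the file is enough for a quick check.

```python
def tflite_references_custom_op(
    model_bytes: bytes,
    op_name: str = "odml.update_kv_cache",
) -> bool:
    """Heuristic check: .tflite files are flatbuffers, and custom op
    names appear in them as plain UTF-8 strings, so a byte search
    finds any reference to the op."""
    return op_name.encode("utf-8") in model_bytes


# Usage: scan a converted model on disk.
# with open("tiny_llama_seq512_kv1024_float32.tflite", "rb") as f:
#     print(tflite_references_custom_op(f.read()))
```

If this returns True, the runtime you use must have a resolver that registers `odml.update_kv_cache`; the stock benchmark_model binary does not include it.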

akshatshah17 commented 2 days ago

I was trying to run it with TFLite's (version 2.16.1) benchmark_model tool, and that is where I got this error.