Arya-Hari opened this issue 3 days ago
Hi @Arya-Hari, can you try with the actual Llama conversion script? https://github.com/google-ai-edge/ai-edge-torch/blob/main/ai_edge_torch/generative/examples/llama/convert_to_tflite.py. It uses this function, which adds the required signatures: https://github.com/google-ai-edge/ai-edge-torch/blob/main/ai_edge_torch/generative/utilities/converter.py#L27
Please review the generative API conversion examples to make sure nothing else is missed: https://github.com/google-ai-edge/ai-edge-torch/tree/main/ai_edge_torch/generative/examples#model-conversion Thanks.
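One quick way to check whether a converted file actually carries the signatures is to list them with the TFLite interpreter. The sketch below assumes TensorFlow is installed and that the signature names start with "prefill" and "decode"; that naming is an assumption (it may differ between ai-edge-torch versions, e.g. with a sequence-length suffix), and both helper names are hypothetical.

```python
from typing import Dict, Iterable, List


def missing_signatures(
    signature_names: Iterable[str],
    required_prefixes: Iterable[str] = ("prefill", "decode"),
) -> List[str]:
    """Return each required prefix that no signature name starts with.

    The default prefixes are an assumption about how the converter names
    its signatures; adjust them to match your ai-edge-torch version.
    """
    names = list(signature_names)
    return [p for p in required_prefixes if not any(n.startswith(p) for n in names)]


def load_signature_names(tflite_path: str) -> List[str]:
    """List the signature names stored in a .tflite file (requires TensorFlow)."""
    import tensorflow as tf  # imported lazily so missing_signatures() has no TF dependency

    interpreter = tf.lite.Interpreter(model_path=tflite_path)
    # get_signature_list() maps each signature name to its input/output names.
    signatures: Dict[str, dict] = interpreter.get_signature_list()
    return list(signatures)


# Usage (hypothetical file name):
#   names = load_signature_names("llama_q8.tflite")
#   print(missing_signatures(names))
```

If this reports both "prefill" and "decode" as missing, the file was most likely exported without the generative converter's signatures, which would be consistent with the MediaPipe initialisation failure described in the issue.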
Hello. The actual Llama conversion script produces the required result, but the file it produces is around 2 GB. Is 8-bit quantization already applied when running the script?
Hi @Arya-Hari, I believe so: https://github.com/google-ai-edge/ai-edge-torch/blob/main/ai_edge_torch/generative/examples/llama/convert_to_tflite.py#L54
Is that for the 1B or 3B model? Quantized models use 1 byte (8 bits) per parameter, so counting the parameters alone, excluding all overhead, the file should be around 1 GB or 3 GB respectively. So perhaps the extra GB is all overhead.
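The back-of-the-envelope estimate above (1 byte per parameter at 8 bits) can be written out as a small helper. This is a weights-only lower bound that assumes every parameter is stored at the same bit width and uses the nominal 1B/3B parameter counts; it ignores metadata, unquantized tensors and any other overhead. The function name is hypothetical.

```python
def weight_bytes(num_params: float, bits_per_param: int = 8) -> float:
    """Weights-only size in bytes: parameters x bits per parameter / 8 bits per byte."""
    return num_params * bits_per_param / 8


GB = 1e9  # decimal gigabytes, matching the rough "1 GB or 3 GB" figures

# Nominal parameter counts, not exact counts for any specific checkpoint.
for label, params in (("1B", 1e9), ("3B", 3e9)):
    print(f"{label}: {weight_bytes(params) / GB:.1f} GB")
```

Comparing this lower bound with the actual file size gives a rough idea of how much of the file is overhead rather than quantized weights.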
Okay, I understand now. Will using any of the quantization recipes provided in the repositories make a difference?
Hi @Arya-Hari, it can definitely make a difference if you quantize to different precisions, such as mixed activations (where some activations are 16-bit), or if you don't use full int8 quantization. However, the size will most likely stay roughly the same, unless the model has a lot of ops that can't be quantized or something similar.
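To see why the size mostly stays the same unless a large share of the model can't be quantized, here is a sketch that splits the weights between an int8 portion and a portion left in float32. It covers weights only (activation precision is not modelled), and the 8-bit/32-bit split and the fractions used are illustrative assumptions, not measurements of this model. The function name is hypothetical.

```python
def mixed_weight_bytes(num_params: float, float_fraction: float,
                       quant_bits: int = 8, float_bits: int = 32) -> float:
    """Weights-only size in bytes when `float_fraction` of parameters stay unquantized."""
    if not 0.0 <= float_fraction <= 1.0:
        raise ValueError("float_fraction must be between 0 and 1")
    quantized = num_params * (1.0 - float_fraction) * quant_bits / 8
    unquantized = num_params * float_fraction * float_bits / 8
    return quantized + unquantized


# Illustrative only: a nominal 1B-parameter model with growing unquantized shares.
for frac in (0.0, 0.05, 0.25):
    size_gb = mixed_weight_bytes(1e9, frac) / 1e9
    print(f"{frac:.0%} left in float32 -> {size_gb:.2f} GB")
```

Because each unquantized parameter costs 4 bytes instead of 1, the total only grows noticeably when a sizeable share of the weights is left unquantized.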
Description of the bug:
I tried running the example.py script from the quantization example, adapted for Llama: wherever Gemma was referenced, I substituted the corresponding Llama reference. The modified code looks like this:
Actual vs expected behavior:
I expected a proper TFLite model to be produced. However, the generated .tflite file does not have the required prefill and decode signatures. As a result, after bundling it with the tokenizer and trying to run it on-device with MediaPipe, I get a "Failed to initialise" error.
Any other information you'd like to share?
No response