Open Arya-Hari opened 6 days ago
The error indicates that your TF Lite model does not have the two required signatures: "prefill" and "decode". Thus, I think something went wrong in the step where you "created .tflite file using ai-edge-torch for Llama 3.2 1B". Our typical conversion scripts enforce the creation of those two signatures for a converted language model. Can you double check that the TF Lite file has those signatures and also post the conversion code you used?
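One quick way to double check is to list the signatures with the TFLite interpreter. A minimal sketch, assuming TensorFlow is installed (the model path is a placeholder):

import tensorflow as tf

# Placeholder path; point this at your converted file.
interpreter = tf.lite.Interpreter(model_path="your_model.tflite")
# get_signature_list() maps each signature name to its input/output tensor
# names; for the LLM Inference API it should contain "prefill" and "decode".
print(interpreter.get_signature_list())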
Hi @talumbau,
Could you please review the information above, verify it against the suggestions, and update us on the status?
Thank you!!
Hi @kuaashish I think you meant to tag @Arya-Hari in your comment, since you are providing a link to my comment above where I identify the reason behind the error message. Please confirm if you meant your message to be for @Arya-Hari. Thanks!
Hi. So it seems the signatures "prefill" and "decode" are indeed missing from the file. The code I used for conversion was the example.py script from the AI Edge Torch repository, which I changed slightly for Llama. The changed script is below:
# Copyright 2024 The AI Edge Torch Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
import ai_edge_torch
from ai_edge_torch.generative.examples.llama import llama
from ai_edge_torch.generative.layers import kv_cache as kv_utils
from ai_edge_torch.generative.quantize import quant_recipes
from ai_edge_torch.generative.utilities import model_builder
import numpy as np
import torch
def main():
  # Build a PyTorch model as usual
  config = llama.get_fake_model_config()
  model = model_builder.DecoderOnlyModel(config).eval()
  idx = torch.from_numpy(np.array([[1, 2, 3, 4]]))
  tokens = torch.full((1, 10), 0, dtype=torch.int, device="cpu")
  tokens[0, :4] = idx
  input_pos = torch.arange(0, 10, dtype=torch.int)
  kv = kv_utils.KVCache.from_model_config(config)

  # Create a quantization recipe to be applied to the model
  quant_config = quant_recipes.full_int8_dynamic_recipe()
  print(quant_config)

  # Convert with quantization
  edge_model = ai_edge_torch.convert(
      model, (tokens, input_pos, kv), quant_config=quant_config
  )
  edge_model.export("/tmp/llama3_1b_quantized.tflite")


if __name__ == "__main__":
  main()
Ah, I see. That is our fault for not updating that example file. Sorry about that. In order to use the LLM Inference API, please convert using the convert_*.py files in generative/examples/*/. For example, see the convert_gemma2_to_tflite.py file:
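Roughly, those convert scripts build the PyTorch model from a checkpoint and hand it to the shared generative converter, which exports both signatures for you. A sketch along those lines; the builder function, argument names, and paths below are illustrative and may differ slightly between versions:

from ai_edge_torch.generative.examples.gemma import gemma2
from ai_edge_torch.generative.utilities import converter

# Build the PyTorch model from a local checkpoint (placeholder path).
pytorch_model = gemma2.build_2b_model(
    "/path/to/gemma2_checkpoint", kv_cache_max_len=1024
)

# convert_to_tflite exports a single .tflite containing both the "prefill"
# and "decode" signatures expected by the LLM Inference API.
converter.convert_to_tflite(
    pytorch_model,
    tflite_path="/tmp/gemma2_q8_seq512.tflite",
    prefill_seq_len=512,
    quantize=True,
)

The llama example module under generative/examples/llama/ should work the same way for Llama 3.2.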
We will update the example.py file to be consistent with the multi-signature usage. It should be pretty easy to update your code to use that method. The multi-signature method for conversion looks like this:
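Here is a minimal sketch of that, adapted from your script above; the prefill/decode sample shapes, the sequence length, and the output path are illustrative, so double-check them against the convert_*.py files:

import ai_edge_torch
from ai_edge_torch.generative.examples.llama import llama
from ai_edge_torch.generative.layers import kv_cache as kv_utils
from ai_edge_torch.generative.quantize import quant_recipes
from ai_edge_torch.generative.utilities import model_builder
import torch


def main():
  config = llama.get_fake_model_config()
  model = model_builder.DecoderOnlyModel(config).eval()

  # Sample inputs for the "prefill" signature: a full prompt window.
  prefill_seq_len = 10
  prefill_tokens = torch.zeros((1, prefill_seq_len), dtype=torch.int)
  prefill_input_pos = torch.arange(0, prefill_seq_len, dtype=torch.int)

  # Sample inputs for the "decode" signature: one token at a time.
  decode_token = torch.zeros((1, 1), dtype=torch.int)
  decode_input_pos = torch.tensor([0], dtype=torch.int)

  kv = kv_utils.KVCache.from_model_config(config)
  quant_config = quant_recipes.full_int8_dynamic_recipe()

  # Register both signatures on one converter, then convert and export a
  # single .tflite containing "prefill" and "decode".
  edge_model = (
      ai_edge_torch.signature(
          "prefill", model, (prefill_tokens, prefill_input_pos, kv)
      )
      .signature("decode", model, (decode_token, decode_input_pos, kv))
      .convert(quant_config=quant_config)
  )
  edge_model.export("/tmp/llama3_1b_quantized.tflite")


if __name__ == "__main__":
  main()

After converting this way, get_signature_list() on the resulting .tflite should report both "prefill" and "decode", and the file can then be packed into a .task bundle.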
Okay thank you!
Have I written custom code (as opposed to using a stock example script provided in MediaPipe)
No
OS Platform and Distribution
Linux Ubuntu 16.04
Mobile device if the issue happens on mobile device
Pixel 7a
Browser and version if the issue happens on browser
No response
Programming Language and version
Python
MediaPipe version
No response
Bazel version
No response
Solution
LLM Inference
Android Studio, NDK, SDK versions (if issue is related to building in Android environment)
No response
Xcode & Tulsi version (if issue is related to building for iOS)
No response
Describe the actual behavior
I created a .tflite file using ai-edge-torch for the Llama 3.2 1B model and created the Task Bundle as instructed in the documentation. After pushing the .task file to the device, I modified the MediaPipe example (which was for Gemma) for Llama. When I run it, I get an error.
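For reference, the Task Bundle step from the LLM Inference documentation looks roughly like this; the paths and token strings below are placeholders rather than the exact values used here:

from mediapipe.tasks.python.genai import bundler

# Pack the converted .tflite and the tokenizer model into a .task bundle
# (placeholder paths and tokens; the real values depend on the model).
config = bundler.BundleConfig(
    tflite_model="/tmp/llama3_1b_quantized.tflite",
    tokenizer_model="/path/to/tokenizer.model",
    start_token="<s>",
    stop_tokens=["</s>"],
    output_filename="/tmp/llama3_1b.task",
    enable_bytes_to_unicode_mapping=False,
)
bundler.create_bundle(config)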
Describe the expected behaviour
I should be able to run inference without any issue.
Standalone code/steps you may have used to try to get what you need
Other info / Complete Logs