Open · Arya-Hari opened this issue 2 weeks ago
Hi @Arya-Hari,
Could you please share the complete example you are using from our documentation? Additionally, if you have any error logs, sharing them would help us better understand the issue.
Thank you!!
Hello @kuaashish
So I converted the tokenizer to the SentencePiece-compatible format using the code provided in the ai-edge-torch repository, which generated a llama3.spm.model file.
Then I ran this script:
import sentencepiece as spm

# Load the SentencePiece model
sp = spm.SentencePieceProcessor()
sp.load("/content/llama3.spm.model")

# Check special tokens or tokens that might indicate sequence ends
print("End token ID:", sp.eos_id())    # Check if the model has a predefined EOS token ID
print("Start token ID:", sp.bos_id())  # BOS may also indicate a start-of-sequence token

# Dump the full vocabulary with IDs
vocab_size = sp.get_piece_size()
for i in range(vocab_size):
    print(f"ID {i}: {sp.id_to_piece(i)}")
This printed 128255 tokens along with their IDs. The token with ID 128001 was <|end_of_text|>. According to the config files in the official Llama 3.2 1B documentation, this is the stop token, and <|begin_of_text|> is the start token.
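For reference, the special-token IDs can also be looked up directly instead of scanning the whole vocabulary. A minimal sketch using the same sentencepiece API (the model path is the one from the script above):

import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.load("/content/llama3.spm.model")

# piece_to_id returns the unknown-token ID for pieces missing from the vocab,
# so compare against unk_id() to tell a real hit apart from a miss.
for piece in ("<|begin_of_text|>", "<|end_of_text|>"):
    token_id = sp.piece_to_id(piece)
    print(f"{piece}: id={token_id}, in_vocab={token_id != sp.unk_id()}")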
When running this piece of code, as given in llm_bundling.ipynb:

from mediapipe.tasks.python.genai import bundler

tflite_model="/content/gemma_2b_quantized.tflite" # @param {type:"string"}
tokenizer_model="/content/llama3.spm.model" # @param {type:"string"}
start_token="<|begin_of_text|>" # @param {type:"string"}
stop_token="<|end_of_text|>" # @param {type:"string"}
output_filename="/content/llama.task" # @param {type:"string"}
enable_bytes_to_unicode_mapping=False # @param ["False", "True"] {type:"raw"}

config = bundler.BundleConfig(
    tflite_model=tflite_model,
    tokenizer_model=tokenizer_model,
    start_token=start_token,
    stop_tokens=[stop_token],
    output_filename=output_filename,
    enable_bytes_to_unicode_mapping=enable_bytes_to_unicode_mapping,
)
bundler.create_bundle(config)
I get this error: ValueError: Failed to encode stop token <|end_of_text|> with tokenizer. When I try any other valid token from the list of 128255 tokens, the code executes properly and generates a .task file. This is the first issue.
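To dig into this, I compared what a plain vocabulary lookup and the actual encoder return for the stop token. My understanding (which may be wrong) is that SentencePiece never emits control symbols when encoding raw text, only user-defined symbols, and the bundler presumably calls encode. A sketch of the check:

import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.load("/content/llama3.spm.model")

stop = "<|end_of_text|>"
print("piece_to_id:", sp.piece_to_id(stop))   # vocabulary lookup: 128001 here
print("encode:", sp.encode_as_ids(stop))      # what the encoder actually emits

# If the token was exported as a control symbol, encode_as_ids() returns
# several IDs for the literal characters instead of the single ID 128001,
# which would explain "Failed to encode stop token".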
Secondly, when pushing the model onto the device, the documentation requires a .bin file. I did not understand how to generate the .bin file after generating the task bundle.
Your help is much appreciated. Thank you!
@kuaashish Hello... is there a way to resolve this?
Thanks for all of the detail provided. Two quick items:
1. .task vs. .bin: Yes, you can use the .task file wherever you would use a .bin file. The .task extension indicates that the file is a "converted TF Lite model + metadata/tokenizer".
2. Regarding this line:
tflite_model="/content/gemma_2b_quantized.tflite" # @param {type:"string"}
Is this just a copy/paste error? I assumed you would have something like llama3_1_1b_quantized.tflite, not a Gemma model.
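If you want to sanity-check what ended up inside the bundle: as far as I know, a .task file is a plain zip archive, so you can list its entries directly. A minimal sketch (the filename is the one from the notebook above):

import zipfile

# A MediaPipe .task bundle is a zip archive; list what got packed into it.
with zipfile.ZipFile("/content/llama.task") as bundle:
    for info in bundle.infolist():
        print(f"{info.filename}: {info.file_size} bytes")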
Hi @talumbau. To clarify, I used the quantization script provided in the ai-edge-torch repository to quantize the model and convert it to the TFLite format. By default, that script saves the output file under the name gemma_2b_quantized.tflite, and I forgot to change the name before running it. I did change everything else in the script to work for Llama 3.2 1B instead. Sorry for the confusion.
Have I written custom code (as opposed to using a stock example script provided in MediaPipe)
No
OS Platform and Distribution
Linux Ubuntu 16.04
Mobile device if the issue happens on mobile device
No response
Browser and version if the issue happens on browser
No response
Programming Language and version
Python
MediaPipe version
No response
Bazel version
No response
Solution
LLM Inference
Android Studio, NDK, SDK versions (if issue is related to building in Android environment)
No response
Xcode & Tulsi version (if issue is related to building for iOS)
No response
Describe the actual behavior
I created a .tflite file using ai-edge-torch for the Llama 3.2 1B model and am now trying to deploy it for inference on the edge. Creating the task bundle requires a stop token; when I provide "<|end_of_text|>", the bundler is unable to resolve it. I had previously converted the tokenizer to the SentencePiece format using the code provided in the ai-edge-torch repository.
Describe the expected behaviour
The task bundle should be created without errors.
Standalone code/steps you may have used to try to get what you need
I manually checked the tokens the model can identify using its vocabulary, and "<|end_of_text|>" is a token in its vocab.
I also tried changing the stop token, and the task bundle was created. However, when using that bundle for deployment, I got "Failed to initialize engine : modelError building tflite model". Also, as a side question: can the .task file that is created be used interchangeably with the .bin file given in the model path in the repository examples?
Other info / Complete Logs
No response