google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://ai.google.dev/edge/mediapipe
Apache License 2.0

When trying to create a Task Bundle using a TFLite file, I'm not allowed to enter the stop token of the model #5715

Open Arya-Hari opened 2 weeks ago

Arya-Hari commented 2 weeks ago

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

No

OS Platform and Distribution

Linux Ubuntu 16.04

Mobile device if the issue happens on mobile device

No response

Browser and version if the issue happens on browser

No response

Programming Language and version

Python

MediaPipe version

No response

Bazel version

No response

Solution

LLM Inference

Android Studio, NDK, SDK versions (if issue is related to building in Android environment)

No response

Xcode & Tulsi version (if issue is related to building for iOS)

No response

Describe the actual behavior

I created a .tflite file for the Llama 3.2 1B model using ai-edge-torch and am now trying to deploy it for inference on an edge device. When creating the task bundle, I am asked for the model's stop token. When I provide "<|end_of_text|>", the bundler is unable to resolve it. I had previously converted the tokenizer to the SentencePiece format using the code given in the ai-edge-torch repository.

Describe the expected behaviour

The task bundle should be created without errors.

Standalone code/steps you may have used to try to get what you need

I manually checked the tokens the model can identify using its vocab, and "<|end_of_text|>" is present in the vocab.

I also tried changing the stop token, and the task bundle was created. However, when using that bundle for deployment, I got the error `Failed to initialize engine : modelError building tflite model`. Also, as a side question: can the .task file that is created be used interchangeably with the .bin file given in the model path in the repository examples?

Other info / Complete Logs

No response

kuaashish commented 2 weeks ago

Hi @Arya-Hari,

Could you please share the complete example you are using from our documentation? Additionally, if you have any error logs, sharing them would help us better understand the issue.

Thank you!!

Arya-Hari commented 1 week ago

Hello @kuaashish

So I converted the tokenizer to the SentencePiece-compatible format using the code given in the ai-edge-torch repository. It generated a llama3.spm.model file.

Then I ran this script:

import sentencepiece as spm

# Load the SentencePiece model
sp = spm.SentencePieceProcessor()
sp.load("/content/llama3.spm.model")

# Check special tokens or tokens that might indicate sequence ends
print("End token ID:", sp.eos_id())  # Check if the model has a predefined EOS token ID
print("Start token ID:", sp.bos_id())  # BOS may also indicate a start-of-sequence token

vocab_size = sp.get_piece_size()
for i in range(vocab_size):
    print(f"ID {i}: {sp.id_to_piece(i)}")

This printed 128255 tokens along with their IDs. The token with ID 128001 was <|end_of_text|>. According to the official Llama 3.2 1B config files, this is the stop token, and <|begin_of_text|> is the start token.

When running this piece of code, as given in llm_bundling.ipynb:

tflite_model="/content/gemma_2b_quantized.tflite" # @param {type:"string"}
tokenizer_model="/content/llama3.spm.model" # @param {type:"string"}
start_token="<|begin_of_text|>" # @param {type:"string"}
stop_token="<|end_of_text|>" # @param {type:"string"}
output_filename="/content/llama.task" # @param {type:"string"}
enable_bytes_to_unicode_mapping=False # @param ["False", "True"] {type:"raw"}

config = bundler.BundleConfig(
    tflite_model=tflite_model,
    tokenizer_model=tokenizer_model,
    start_token=start_token,
    stop_tokens=[stop_token],
    output_filename=output_filename,
    enable_bytes_to_unicode_mapping=enable_bytes_to_unicode_mapping,
)
bundler.create_bundle(config)

I get this error: `ValueError: Failed to encode stop token <|end_of_text|> with tokenizer.` When I try any other valid token from the list of 128255 tokens, the code executes properly and generates a .task file. This is the first issue.
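The error wording ("Failed to encode") suggests the bundler runs the stop token through the tokenizer's encode path rather than a raw vocab lookup. One possible explanation, which is an assumption on my part and not confirmed against the bundler's source, is that a piece can be present in the vocabulary (so `piece_to_id` and the vocab dump above find it) while `encode()` still splits the raw string into sub-pieces, unless the piece was exported as a special/user-defined symbol. The following toy tokenizer is hypothetical, not the SentencePiece API, and only sketches that distinction:

```python
# Toy stand-in for a tokenizer (NOT SentencePiece itself) illustrating how
# a piece can exist in the vocab (piece_to_id succeeds) while encode() still
# fails to emit it as a single token: encode() only matches "special" pieces
# atomically, and falls back to character-level pieces otherwise.
class ToyTokenizer:
    def __init__(self, vocab, specials):
        self.vocab = vocab          # piece -> id
        self.specials = specials    # pieces encode() matches atomically

    def piece_to_id(self, piece):
        # Plain vocab lookup: succeeds for any piece in the vocab.
        return self.vocab.get(piece, -1)

    def encode(self, text):
        # Greedy longest-match over special pieces plus single characters.
        candidates = sorted(
            self.specials | {p for p in self.vocab if len(p) == 1},
            key=len, reverse=True,
        )
        ids, i = [], 0
        while i < len(text):
            for piece in candidates:
                if text.startswith(piece, i):
                    ids.append(self.vocab[piece])
                    i += len(piece)
                    break
            else:
                i += 1  # unknown character: skip it
        return ids

token = "<|end_of_text|>"
vocab = {c: i for i, c in enumerate(sorted(set(token)))}  # char pieces
vocab[token] = 128001                                     # full piece in vocab

plain = ToyTokenizer(vocab, specials=set())       # token NOT special
special = ToyTokenizer(vocab, specials={token})   # token registered special

print(plain.piece_to_id(token))   # 128001: the piece IS in the vocab
print(plain.encode(token))        # split character by character, many ids
print(special.encode(token))     # [128001]: encoded as one token
```

If this is the cause, re-exporting the SentencePiece model with <|end_of_text|> registered as a user-defined or control symbol would be the thing to check; that is a guess, not a verified fix.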

Secondly, when pushing the model onto the device, the documentation requires that a .bin file be pushed. I do not understand how to generate the .bin file after generating the Task Bundle.

Your help is much appreciated. Thank you!

Arya-Hari commented 1 week ago

@kuaashish Hello... is there a way to resolve this?

talumbau commented 3 days ago

Thanks for all of the detail provided. Two quick items:

  1. re: .task vs. .bin: Yes, you can use the .task wherever you would use a .bin file. The .task extension indicates that the file is a "converted TF Lite model + metadata/tokenizer"
  2. I noticed in your provided script that you have this line:
tflite_model="/content/gemma_2b_quantized.tflite" # @param {type:"string"}

Is this just a copy/paste error? I assumed you would have something like llama3_1_1b_quantized.tflite, not a Gemma model.

Arya-Hari commented 2 days ago

Hi @talumbau. To clarify, I used the quantization script provided in the AI Edge Torch repository for quantizing the model and converting it to the TFLite format. The script there saves the output file, by default, under the name gemma_2b_quantized.tflite, and I forgot to change it before using it. But I changed everything else in the script to work for Llama 3.2 1B instead. Sorry for the confusion.