shubham0204 opened this issue 3 months ago
Hi @shubham0204,
Could you please confirm that you are using the example Colab provided here for model conversion and for learning about the converter's required arguments?
Thank you!!
Yes @kuaashish, I am using the same notebook. Here are the additional blocks of code I added to download Gemma 2 and convert it to TFLite:
# Download the Gemma 2 tokenizer files and safetensors weight shards
from huggingface_hub import hf_hub_download
import os

REPO_ID = "google/gemma-2-2b-it"
FILENAMES = ["tokenizer.json", "tokenizer_config.json", "model-00001-of-00002.safetensors", "model-00002-of-00002.safetensors"]

os.environ['HF_TOKEN'] = "<token>"  # Hugging Face access token (Gemma is a gated model)

for filename in FILENAMES:
    hf_hub_download(repo_id=REPO_ID, filename=filename, local_dir="./gemma-2-2b-it")
import mediapipe as mp
from mediapipe.tasks.python.genai import converter

config = converter.ConversionConfig(
    input_ckpt="/content/gemma-2-2b-it",                # directory holding the safetensors shards
    ckpt_format='safetensors',
    model_type='GEMMA_2B',
    backend="cpu",
    output_dir="/content/intermediate/gemma-2-2b-it/",  # per-tensor intermediate files
    combine_file_only=False,
    vocab_model_file="/content/gemma-2-2b-it",          # directory holding the tokenizer files
    output_tflite_file="/content/converted_models/gemma-2-2b-it-cpu"
)
converter.convert_checkpoint(config)
Adding layer_norms to the LayerType class in /site-packages/mediapipe/tasks/python/genai/converter/safetensors_converter.py lets the conversion get past the assert, but the resulting output_tflite_file looks wrong because its size does not shrink.
# Excerpt from safetensors_converter.py with the layer_norms patch applied
import enum


class LayerType(enum.Enum):
  """Enum for layer type."""

  NONE = 0
  ATTENTION = 1  # Layer is part of the attention module.
  FEEDFORWARD = 2  # Layer is part of the feedforward module in the Transformer.
  EMBEDDING = 3  # Layer is the embedding lookup or final projection layer.
  LAYER_NORM = (
      4  # Layer is layer normalization before and after attention layer.
  )
  LORA = 5  # Layer is LoRA weights augmented on the base model layers.

  @classmethod
  def get_layer_type(cls, layer_name: str):
    """Gets the layer type of the given layer name."""
    ffn_layers = [
        "mlp",
    ]
    attn_layers = [
        "self_attn",
    ]
    emb_layers = [
        "embed_tokens",
        "lm_head",
    ]
    # Patched: Gemma 2 checkpoints carry layer-norm tensors that the stock
    # converter does not classify.
    layer_norms = [
        "input_layernorm",
        "post_attention_layernorm",
        "post_feedforward_layernorm",
        "pre_feedforward_layernorm",
        "final_layernorm",
        "model.norm.weight",
    ]
    lora_layers = ["lora"]
    if any(sub_name in layer_name for sub_name in lora_layers):
      return LayerType.LORA
    if any(sub_name in layer_name for sub_name in attn_layers):
      return LayerType.ATTENTION
    if any(sub_name in layer_name for sub_name in ffn_layers):
      return LayerType.FEEDFORWARD
    if any(sub_name in layer_name for sub_name in emb_layers):
      return LayerType.EMBEDDING
    if any(sub_name in layer_name for sub_name in layer_norms):
      return LayerType.LAYER_NORM
    else:
      return LayerType.NONE
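A quick sanity check of the patched mapping (the tensor names below follow the Hugging Face Gemma 2 checkpoint layout and are illustrative):

# With the patched list, Gemma 2 layer-norm tensors no longer fall
# through to LayerType.NONE:
print(LayerType.get_layer_type("model.layers.0.input_layernorm.weight"))
# LayerType.LAYER_NORM
print(LayerType.get_layer_type("model.layers.0.self_attn.q_proj.weight"))
# LayerType.ATTENTION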
Thanks @Woody0414. I modified the MediaPipe source file as suggested, but then received the following error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-3-913a430439f8> in <cell line: 14>()
12 output_tflite_file="/content/converted_models/gemma-2-2b-it-cpu"
13 )
---> 14 converter.convert_checkpoint(config)
1 frames
/usr/local/lib/python3.10/dist-packages/mediapipe/tasks/python/genai/converter/llm_converter.py in combined_weight_bins_to_tflite(model_type, backend, weight_path, output_tflite_file, vocab_model_file, lora_rank, lora_weight_path, lora_output_tflite_file)
180 if lora_rank is not None:
181 logging.fatal('LoRA is not supported for CPU backend.')
--> 182 model_ckpt_util.GenerateCpuTfLite(
183 model_type,
184 weight_path,
RuntimeError: NOT_FOUND: The path does not exist: /content/intermediate/gemma-2-2b-it/params.lm.transformer.x_layers_0.ff_layer.pre_layer_norm.scale_quantized_scale
The params.lm.transformer.x_layers_0.ff_layer.pre_layer_norm.scale file exists, but params.lm.transformer.x_layers_0.ff_layer.pre_layer_norm.scale_quantized_scale does not.
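For context: the CPU packer looks for each quantized tensor's companion *_quantized_scale file, which holds the dequantization scale. A minimal sketch of symmetric per-tensor int8 quantization (illustrative only, not the MediaPipe implementation) shows why both pieces exist; presumably the layer-norm scale here was written unquantized, so the companion file was never produced:

import numpy as np

# Symmetric per-tensor int8 quantization: the int8 values alone are useless
# at runtime without the float scale, so both are stored.
def quantize_int8(tensor: np.ndarray):
    scale = np.abs(tensor).max() / 127.0  # dequantization scale
    q = np.clip(np.round(tensor / scale), -128, 127).astype(np.int8)
    return q, scale

weights = np.random.randn(4, 8).astype(np.float32)
q, scale = quantize_int8(weights)
print(q.dtype, scale)  # int8 values + the scale a *_quantized_scale file would hold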
Hi @shubham0204,
It appears you are trying to convert the recently released Gemma-2-2b model. Our initial testing has focused on the Gemma 2b model, and you can find more information in our documentation here. Currently, this model cannot be converted into a TFLite format, though support for this is on our roadmap. However, we cannot provide a specific timeline for availability at this moment.
Thank you!!
This issue has been marked stale because it has had no activity in the last 7 days. It will be closed if no further activity occurs. Thank you.
@kuaashish will I get an update on this issue when support for converting Gemma2 models becomes available?
I have encountered the same issue. I followed the example here; this is the error log:
Traceback (most recent call last):
File "/home/franzkafka/Desktop/mediapipe/convert.py", line 15, in <module>
converter.convert_checkpoint(config)
File "/home/franzkafka/.local/lib/python3.10/site-packages/mediapipe/tasks/python/genai/converter/llm_converter.py", line 323, in convert_checkpoint
maybe_quantize_and_write_tensors_to_bins(loader, config)
File "/home/franzkafka/.local/lib/python3.10/site-packages/mediapipe/tasks/python/genai/converter/llm_converter.py", line 284, in maybe_quantize_and_write_tensors_to_bins
quantized_tensors = quantize_by_actions(
File "/home/franzkafka/.local/lib/python3.10/site-packages/mediapipe/tasks/python/genai/converter/llm_converter.py", line 169, in quantize_by_actions
target_var, scale = quantization_util.quantize_tensor(
File "/home/franzkafka/.local/lib/python3.10/site-packages/mediapipe/tasks/python/genai/converter/quantization_util.py", line 354, in quantize_tensor
assert number_bits == 8 or number_bits == 4
AssertionError
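For reference, the failing line is a bare assert in quantization_util.quantize_tensor, which is why the AssertionError carries no message. A bare assert reproduces the empty error (the bit width below is hypothetical, standing in for whatever value reached quantize_tensor for these tensors):

# Hypothetical bit width for illustration:
number_bits = 16
assert number_bits == 8 or number_bits == 4  # raises a message-less AssertionError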
@kuaashish Hi, the MediaPipe docs say that the MediaPipe LLM Inference API already supports Gemma 2, but I can't find an available Gemma 2 TFLite-format model on Kaggle. How can I use the MediaPipe LLM Inference API to load Gemma 2 models?
Hi @FranzKafkaYu,
Could you please create a new issue with a detailed description of the support you need? This will help us and the community identify and address the problem effectively with a relevant issue title.
Thank you!!
Issue created: https://github.com/google-ai-edge/mediapipe/issues/5610
Hi,
We updated our docs to provide info on using Gemma2-2B here. When we initially supported Gemma2-2B, the only pathway to using it on-device was converting the model through ai_edge_torch. The conversion+quantization still requires a system with a lot of memory, so we decided to host the necessary file directly on Kaggle (at this URL). You can download the models through that interface.
For Gemma2-2b, we support a CPU version and a GPU version. Both versions work in the LLM Inference API. The CPU version is a classic "TF Lite" file and can be used in traditional ways, as shown in an example here.
The LLM Inference API (doc link above) is a full-featured offering that you can call directly from an Android app, as shown in our samples here.
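If you want to script the download, here is a minimal sketch using the kagglehub client; the model handle below is a placeholder, so substitute the exact handle shown on the Kaggle model page linked above:

import kagglehub

# Placeholder handle; copy the real one from the Kaggle model page.
path = kagglehub.model_download("google/gemma-2/tfLite/<variant>")
print("Model files downloaded to:", path)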
Hi @shubham0204,
Could you please review the above and confirm if we can close the status and mark it resolved internally?
Thank you!!
This issue has been marked stale because it has had no activity in the last 7 days. It will be closed if no further activity occurs. Thank you.
Have I written custom code (as opposed to using a stock example script provided in MediaPipe)
None
OS Platform and Distribution
Google Colab (Linux) Ubuntu 22.04.3 LTS
MediaPipe Tasks SDK version
0.10.14
Task name (e.g. Image classification, Gesture recognition etc.)
LLM Inference
Programming Language and version (e.g. C++, Python, Java)
Python
Describe the actual behavior
The converter.convert_checkpoint method throws an AssertionError with no message
Describe the expected behaviour
The gemma-2-2b-it model should get converted to a TFLite model (for CPU)
Standalone code/steps you may have used to try to get what you need
Other info / Complete Logs