google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://mediapipe.dev
Apache License 2.0

Conversion of gemma-2-2b-it model to TensorFlow Lite #5570

Open · shubham0204 opened 4 weeks ago

shubham0204 commented 4 weeks ago

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

None

OS Platform and Distribution

Google Colab (Linux) Ubuntu 22.04.3 LTS

MediaPipe Tasks SDK version

0.10.14

Task name (e.g. Image classification, Gesture recognition etc.)

LLM Inference

Programming Language and version (e.g. C++, Python, Java)

Python

Describe the actual behavior

The converter.convert_checkpoint method throws an AssertionError with no message

Describe the expected behaviour

The gemma-2-2b-it model should be converted to a TFLite model (for CPU)

Standalone code/steps you may have used to try to get what you need

from huggingface_hub import hf_hub_download
import os
import mediapipe as mp
from mediapipe.tasks.python.genai import converter

REPO_ID = "google/gemma-2-2b-it"
FILENAMES = ["tokenizer.json", "tokenizer_config.json", "model-00001-of-00002.safetensors", "model-00002-of-00002.safetensors"]
os.environ['HF_TOKEN'] = "<token>"
for filename in FILENAMES:
  hf_hub_download(repo_id=REPO_ID, filename=filename, local_dir="./gemma-2-2b-it")

config = converter.ConversionConfig(
    input_ckpt="/content/gemma-2-2b-it", 
    ckpt_format='safetensors', 
    model_type='GEMMA_2B', 
    backend="cpu", 
    output_dir="/content/intermediate/gemma-2-2b-it/", 
    combine_file_only=False, 
    vocab_model_file="/content/gemma-2-2b-it", 
    output_tflite_file="/content/converted_models/gemma-2-2b-it-cpu"
)
converter.convert_checkpoint(config)
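
A small precaution that may be worth taking first (it is not verified here whether convert_checkpoint creates missing output directories itself): create the output locations referenced by the config before running the converter.

import os

# Hedged precaution: make sure the output locations referenced by the
# ConversionConfig above exist before running the converter.
os.makedirs("/content/intermediate/gemma-2-2b-it/", exist_ok=True)
os.makedirs("/content/converted_models/", exist_ok=True)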

Other info / Complete Logs

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-10-ae16540c09c6> in <cell line: 14>()
     12     output_tflite_file="/content/converted_models/gemma-2-2b-it-cpu"
     13 )
---> 14 converter.convert_checkpoint(config)

3 frames
/usr/local/lib/python3.10/dist-packages/mediapipe/tasks/python/genai/converter/quantization_util.py in quantize_tensor(var, axis, factor, sym, number_bits, use_fp, add_scale_eps, optimization_on_bound, p_value, per_channel, block_size)
    352   """
    353   # TODO: support jnp.float8_e5m2
--> 354   assert number_bits == 8 or number_bits == 4 , f"Number bits {number_bits}"
    355   jnp_var = jnp.asarray(var)
    356   # When using sub-channel, the contracting dim is split into a sub-channel
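
The assert fires inside quantize_tensor, which suggests some Gemma 2 tensors reach the quantizer with an unsupported bit width. A minimal sketch (assuming the shards downloaded above and that the safetensors package is installed) to list the layer-norm tensor names in the checkpoint, including the pre/post feed-forward layer norms that Gemma 2 adds:

from safetensors import safe_open

# List the layer-norm tensor names in the downloaded Gemma 2 shards;
# pre_feedforward_layernorm / post_feedforward_layernorm are specific to Gemma 2.
for shard in [
    "/content/gemma-2-2b-it/model-00001-of-00002.safetensors",
    "/content/gemma-2-2b-it/model-00002-of-00002.safetensors",
]:
    with safe_open(shard, framework="numpy") as f:
        for name in f.keys():
            if "norm" in name:
                print(name)
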
kuaashish commented 4 weeks ago

Hi @shubham0204,

Could you please confirm that you are using the example Colab provided here for model conversion and learning about the required arguments for the converter?

Thank you!!

shubham0204 commented 4 weeks ago

Yes @kuaashish, I am using the same notebook. Here are the additional blocks of code I added to download Gemma 2 and convert it to TFLite:

from huggingface_hub import hf_hub_download
import os

REPO_ID = "google/gemma-2-2b-it"
FILENAMES = ["tokenizer.json", "tokenizer_config.json", "model-00001-of-00002.safetensors", "model-00002-of-00002.safetensors"]
os.environ['HF_TOKEN'] = "<token>"
for filename in FILENAMES:
  hf_hub_download(repo_id=REPO_ID, filename=filename, local_dir="./gemma-2-2b-it")
import mediapipe as mp
from mediapipe.tasks.python.genai import converter

config = converter.ConversionConfig(
    input_ckpt="/content/gemma-2-2b-it",
    ckpt_format='safetensors',
    model_type='GEMMA_2B',
    backend="cpu",
    output_dir="/content/intermediate/gemma-2-2b-it/",
    combine_file_only=False,
    vocab_model_file="/content/gemma-2-2b-it",
    output_tflite_file="/content/converted_models/gemma-2-2b-it-cpu"
)
converter.convert_checkpoint(config)

Woody0414 commented 3 weeks ago

Adding layer_norms to the LayerType class in /site-packages/mediapipe/tasks/python/genai/converter/safetensors_converter.py lets the conversion pass the assert, but the resulting output_tflite_file looks bad because its size is not reduced.


class LayerType(enum.Enum):
  """Enum for layer type."""

  NONE = 0
  ATTENTION = 1  # Layer is part of the attention module.
  FEEDFORWARD = 2  # Layer is part of the feedforward module in the Transformer.
  EMBEDDING = 3  # Layer is the embedding lookup or final projection layer.
  LAYER_NORM = (
      4  # Layer is layer normalization before and after attention layer.
  )
  LORA = 5  # Layer is LoRA weights augmented on the base model layers.

  @classmethod
  def get_layer_type(cls, layer_name: str):
    """Gets the layer type of the given layer name."""
    ffn_layers = [
        "mlp",
    ]
    attn_layers = [
        "self_attn",
    ]
    emb_layers = [
        "embed_tokens",
        "lm_head",
    ]
    layer_norms = [
        "input_layernorm",
        "post_attention_layernorm",
        "post_feedforward_layernorm",
        "pre_feedforward_layernorm",
        "final_layernorm",
        "model.norm.weight",
    ]
    lora_layers = ["lora"]
    if any(sub_name in layer_name for sub_name in lora_layers):
      return LayerType.LORA
    if any(sub_name in layer_name for sub_name in attn_layers):
      return LayerType.ATTENTION
    if any(sub_name in layer_name for sub_name in ffn_layers):
      return LayerType.FEEDFORWARD
    if any(sub_name in layer_name for sub_name in emb_layers):
      return LayerType.EMBEDDING
    if any(sub_name in layer_name for sub_name in layer_norms):
      return LayerType.LAYER_NORM
    else:
      return LayerType.NONE
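
A quick way to sanity-check the patched classifier (a sketch that assumes the edit above has been applied to the installed mediapipe package; the tensor names are illustrative Gemma 2 names, not an exhaustive list):

from mediapipe.tasks.python.genai.converter.safetensors_converter import LayerType

# With the patch, the layer-norm names should map to LayerType.LAYER_NORM
# instead of falling through to LayerType.NONE.
for name in [
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.mlp.gate_proj.weight",
    "model.layers.0.pre_feedforward_layernorm.weight",
    "model.layers.0.post_feedforward_layernorm.weight",
    "model.norm.weight",
]:
    print(name, "->", LayerType.get_layer_type(name))
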
shubham0204 commented 3 weeks ago

Thanks @Woody0414. I modified the MediaPipe source file, but then received the following error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-3-913a430439f8> in <cell line: 14>()
     12     output_tflite_file="/content/converted_models/gemma-2-2b-it-cpu"
     13 )
---> 14 converter.convert_checkpoint(config)

1 frames
/usr/local/lib/python3.10/dist-packages/mediapipe/tasks/python/genai/converter/llm_converter.py in combined_weight_bins_to_tflite(model_type, backend, weight_path, output_tflite_file, vocab_model_file, lora_rank, lora_weight_path, lora_output_tflite_file)
    180     if lora_rank is not None:
    181       logging.fatal('LoRA is not supported for CPU backend.')
--> 182     model_ckpt_util.GenerateCpuTfLite(
    183         model_type,
    184         weight_path,

RuntimeError: NOT_FOUND: The path does not exist: /content/intermediate/gemma-2-2b-it/params.lm.transformer.x_layers_0.ff_layer.pre_layer_norm.scale_quantized_scale

The params.lm.transformer.x_layers_0.ff_layer.pre_layer_norm.scale file exists, but not params.lm.transformer.x_layers_0.ff_layer.pre_layer_norm.scale_quantized_scale
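
For reference, a small sketch (paths taken from the ConversionConfig above) to list what the converter actually wrote for that layer, which makes the missing *_quantized_scale companion easy to confirm:

import os

# List the intermediate files written for the first feed-forward pre-layer-norm;
# per the report above, the .scale file exists but no _quantized_scale companion.
root = "/content/intermediate/gemma-2-2b-it/"
for name in sorted(os.listdir(root)):
    if "pre_layer_norm" in name:
        print(name)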

kuaashish commented 3 weeks ago

Hi @shubham0204,

It appears you are trying to convert the recently released Gemma-2-2b model. Our initial testing has focused on the Gemma 2b model, and you can find more information in our documentation here. Currently, this model cannot be converted into a TFLite format, though support for this is on our roadmap. However, we cannot provide a specific timeline for availability at this moment.

Thank you!!

github-actions[bot] commented 2 weeks ago

This issue has been marked stale because it has had no recent activity for 7 days. It will be closed if no further activity occurs. Thank you.

google-ml-butler[bot] commented 2 weeks ago

Are you satisfied with the resolution of your issue?

shubham0204 commented 1 week ago

@kuaashish will I get an update on this issue when support for converting Gemma 2 models becomes available?

FranzKafkaYu commented 1 week ago

I have encountered the same issue here, and I followed the example here; the error log:

Traceback (most recent call last):
  File "/home/franzkafka/Desktop/mediapipe/convert.py", line 15, in <module>
    converter.convert_checkpoint(config)
  File "/home/franzkafka/.local/lib/python3.10/site-packages/mediapipe/tasks/python/genai/converter/llm_converter.py", line 323, in convert_checkpoint
    maybe_quantize_and_write_tensors_to_bins(loader, config)
  File "/home/franzkafka/.local/lib/python3.10/site-packages/mediapipe/tasks/python/genai/converter/llm_converter.py", line 284, in maybe_quantize_and_write_tensors_to_bins
    quantized_tensors = quantize_by_actions(
  File "/home/franzkafka/.local/lib/python3.10/site-packages/mediapipe/tasks/python/genai/converter/llm_converter.py", line 169, in quantize_by_actions
    target_var, scale = quantization_util.quantize_tensor(
  File "/home/franzkafka/.local/lib/python3.10/site-packages/mediapipe/tasks/python/genai/converter/quantization_util.py", line 354, in quantize_tensor
    assert number_bits == 8 or number_bits == 4
AssertionError

FranzKafkaYu commented 1 week ago

Hi @kuaashish, the MediaPipe docs say that the MediaPipe LLM Inference API already supports Gemma 2, but I can't find an available Gemma 2 TFLite-format model on Kaggle. How can I use the MediaPipe LLM Inference API to load Gemma 2 models?

kuaashish commented 1 week ago

Hi @FranzKafkaYu,

Could you please create a new issue with a detailed description of the support you need? This will help us and the community identify and address the problem effectively with a relevant issue title.

Thank you!!

FranzKafkaYu commented 1 week ago


Issue created: https://github.com/google-ai-edge/mediapipe/issues/5610