moon5bal opened 5 days ago
@moon5bal Can you share the code you are using for the conversion? I will try to reproduce this on our end.
Hi @schmidt-sebastian @kuaashish, here is the code for the conversion.
Thank you.
```python
import os
import mediapipe as mp
from mediapipe.tasks.python.genai import converter

project_root = "/home/worker"
checkpoint_path = f"{project_root}/model/gemma-7b-it"
vocab_model_file = f"{project_root}/model/gemma-7b-it/tokenizer.model"
output_path = f"{checkpoint_path}/tmp"
ckpt_format = 'safetensors'
model_type = 'GEMMA_7B'
backend = 'gpu'
output_tflite_file = f'{project_root}/conv_out/{model_type}_IT_Q4_feedforward4.bin'

config = converter.ConversionConfig(
    input_ckpt=checkpoint_path,
    ckpt_format=ckpt_format,
    model_type=model_type,
    attention_quant_bits=4,
    feedforward_quant_bits=4,
    embedding_quant_bits=8,
    backend=backend,
    output_dir=output_path,
    combine_file_only=False,
    vocab_model_file=vocab_model_file,
    output_tflite_file=output_tflite_file,
)

converter.convert_checkpoint(config)
```
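For context on what `attention_quant_bits=4` and `feedforward_quant_bits=4` imply, here is a minimal, illustrative sketch of symmetric n-bit weight quantization in plain Python. This is not the MediaPipe converter's internal code, just a demonstration of the precision trade-off at 4 bits (values are mapped to integers in [-7, 7] plus a per-row scale):

```python
# Illustrative only: symmetric n-bit quantization of a weight row.
# Not the MediaPipe converter's actual implementation.

def quantize_symmetric(weights, bits):
    """Map floats to signed integers in [-(2**(bits-1)-1), 2**(bits-1)-1]."""
    qmax = 2 ** (bits - 1) - 1                    # 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from codes and scale."""
    return [v * scale for v in q]

weights = [0.02, -0.51, 0.33, 0.70]
q, scale = quantize_symmetric(weights, bits=4)
restored = dequantize(q, scale)
# 4-bit codes stay within [-7, 7]; restored values only approximate
# the originals, which is the source of quantization quality loss.
```

At 4 bits the rounding error per weight is relatively large, which is one reason heavily quantized models can behave differently from their float checkpoints.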
And I'm using the safetensors checkpoint below.
https://www.kaggle.com/models/google/gemma/transformers/1.1-7b-it
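As a sanity check before running the converter, it can help to confirm the checkpoint's tensor names and dtypes. The following is a stdlib-only sketch based on the published safetensors file layout (an 8-byte little-endian header length followed by a JSON header); the file path and tensor name are illustrative, not taken from the actual Gemma checkpoint:

```python
# Hedged sketch: read a .safetensors JSON header using only the stdlib.
# The safetensors format begins with an 8-byte little-endian u64 giving
# the JSON header's length, followed by the header itself.
import json
import struct

def read_safetensors_header(path):
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))

# Self-contained demo: write a minimal valid file, then read it back.
header = {"layer.weight": {"dtype": "F32", "shape": [2, 2],
                           "data_offsets": [0, 16]}}
blob = json.dumps(header).encode()
with open("/tmp/demo.safetensors", "wb") as f:
    f.write(struct.pack("<Q", len(blob)) + blob + b"\x00" * 16)

print(read_safetensors_header("/tmp/demo.safetensors"))
```

Running this against the downloaded Gemma shards would show whether the tensor names match what the converter expects for `model_type='GEMMA_7B'`.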
Hi @moon5bal,
Can you please follow this Colab example to convert the model for Gemma 7B Q4 and let us know if you are still facing the issue?
Thank you!!
Hi @kuaashish, thank you for your response. I created the conversion script in my comment above by referring to the Colab you provided. Like the Colab, it uses the MediaPipe GenAI converter; the additional step I took was adding the quantization options. I haven't tried it on Colab yet, but I will modify it and try.
Have I written custom code (as opposed to using a stock example script provided in MediaPipe)
Yes
OS Platform and Distribution
Android 14
Mobile device if the issue happens on mobile device
QCOM ADP 8155
Browser and version if the issue happens on browser
com.google.mediapipe.examples.llminference
Programming Language and version
Java
MediaPipe version
0.10.18
Bazel version
No response
Solution
llmInference
Android Studio, NDK, SDK versions (if issue is related to building in Android environment)
Android Studio Koala | 2024.1.1
Xcode & Tulsi version (if issue is related to building for iOS)
No response
Describe the actual behavior
The Gemma 7B Q4-quantized model generates output infinitely (it never stops).
Describe the expected behaviour
The model should generate normal sentences and stop when the response is complete.
Standalone code/steps you may have used to try to get what you need
Other info / Complete Logs
No response