Open FranzKafkaYu opened 2 months ago
Related issue: https://github.com/google-ai-edge/mediapipe/issues/5570
From the MediaPipe official website: it says we can use AI Edge Torch to convert Gemma2-2b to a suitable format, but there are no further details:
It would be good if the MediaPipe Python conversion tool could support this conversion.
Thanks to all of you developers.
Hi @FranzKafkaYu, I am currently looking into running Gemma 2 on AI Edge. Would it be possible to verify the source of the referenced image?
Came across a similar source, which includes a guide for running with tflite, and am now validating the reproducibility of that setup.
(Source: https://ai.google.dev/edge/mediapipe/solutions/genai/llm_inference#gemma-2_2b)
Thanks in advance.
Other related issues: #5594
It seems that the issue was raised based on the following link: https://ai.google.dev/edge/mediapipe/solutions/genai/llm_inference/android#model
The method for converting using AI Edge Torch is detailed in the guidelines provided in the above link. Unfortunately, it seems that for now, the .tflite conversion must be done manually.
Based on this, it seems the conversion process would be as follows: Downloading the .ckpt file via Kaggle -> Converting to .tflite using AI Edge Torch -> Implementing Android inference with the .tflite file using MediaPipe.
```mermaid
graph TD
    A[Kaggle .ckpt file] --> B[AI Edge Torch .tflite conversion]
    B --> C[MediaPipe Android inference]
```
P.S. It seems there might be a typo in the Android guide: "AI Edge Troch" should be corrected to "AI Edge Torch" on the website.
I think the documentation should mention https://github.com/google-ai-edge/ai-edge-torch/blob/main/ai_edge_torch/generative/examples/gemma/convert_gemma2_to_tflite.py
> Hi @FranzKafkaYu, I am currently looking into running Gemma 2 on AI Edge. Would it be possible to verify the source of the referenced image?
> Came across a similar source, which includes a guide for running with tflite, and am now validating the reproducibility of that setup.
> (Source: https://ai.google.dev/edge/mediapipe/solutions/genai/llm_inference#gemma-2_2b)
> Thanks in advance.
> Other related issues: #5594
@KennethanCeyer Hi Ken, you can find more details via this link. If you want to use a model with MediaPipe Solutions/Framework, you need to convert the model: safetensors/pytorch format -> tflite format -> MediaPipe format.
Currently, if you use Gemma (not Gemma2), there are suitably formatted models on Kaggle; you can check this link. Gemma2 doesn't have them yet.
MediaPipe provides a Python library for converting safetensors/pytorch format -> MediaPipe format with two different methods (details here), but this library doesn't currently support Gemma2 in native model conversion. So the only choice is AI Edge model conversion, which requires using the AI Edge Torch tool first to get the TFLite format and then the MediaPipe Python library to bundle the model, as sketched below.
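For the bundling step, a rough sketch using the MediaPipe Python bundler would look like the following; the file names and token strings here are placeholders that I have not verified for Gemma2:

```python
# Rough sketch: bundle a TFLite model produced by AI Edge Torch into a MediaPipe .task file.
# File names and token strings are placeholders, not values verified for Gemma2.
from mediapipe.tasks.python.genai import bundler

config = bundler.BundleConfig(
    tflite_model="gemma2_2b.tflite",      # output of the AI Edge Torch conversion
    tokenizer_model="tokenizer.model",    # SentencePiece tokenizer shipped with the checkpoint
    start_token="<bos>",
    stop_tokens=["<eos>"],
    output_filename="gemma2_2b.task",
    enable_bytes_to_unicode_mapping=False,
)
bundler.create_bundle(config)
```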
But I have checked AI Edge Torch, and it lacks details on how to complete that conversion first. The MediaPipe LLM Inference API demonstrations also give little information about how to use these "bundled" models, which end with .task; the sample code uses a native model, which ends with .bin.
I have tried other projects, like llama.cpp and gemma.cpp, and the performance is not good because they mainly use the CPU to execute inference. You can give them a try, but I think MediaPipe with a GPU backend would be better.
I am not a native English speaker, so my English is not very good. I hope this info can help you.
> I think the documentation should mention https://github.com/google-ai-edge/ai-edge-torch/blob/main/ai_edge_torch/generative/examples/gemma/convert_gemma2_to_tflite.py
GOOD, I will try this script and see whether we can go to the next step.
Hi @FranzKafkaYu, thank you for the explanation; you've done an excellent job explaining the situation.
I've actually been investigating the same issue of using Gemma 2 with LiteRT (.tflite) on MediaPipe, which is what brought me to this discussion. From all the issues, code records, and documentation I've reviewed, it seems a .tflite distribution of Gemma 2 hasn't yet been registered in the Kaggle or Hugging Face registries. (It looks like they're working hard on this and it's probably on their roadmap, but there's no official file available yet.)
Based on the most recent visible documentation, it appears we need to convert the .ckpt file to .tflite using AI Edge Torch and then use it according to each specific use case. (It seems the documentation is lacking; it doesn't look like it's been around for very long.)
The code I mentioned above seems to be the closest thing to an official guide at the moment. I'm currently working on this myself, and I'm planning to write a blog post about it when I'm done. Once it's ready, I'll make sure to share the link here in this issue for reference.
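For anyone who hasn't used AI Edge Torch before, its core convert/export flow looks roughly like the sketch below. The Gemma 2 example script layers checkpoint loading, model authoring, and quantization on top of this, so the ResNet model here is only a stand-in to illustrate the API, not the actual Gemma 2 conversion:

```python
# Illustration of the basic AI Edge Torch convert/export API (not the Gemma 2 path itself).
# convert_gemma2_to_tflite.py builds the model from a checkpoint and adds quantization
# on top of a flow like this.
import torch
import torchvision
import ai_edge_torch

# Stand-in PyTorch model; Gemma 2 would come from ai_edge_torch's generative examples.
model = torchvision.models.resnet18(weights=None).eval()
sample_inputs = (torch.randn(1, 3, 224, 224),)

# Convert to an edge model and serialize it as a .tflite flatbuffer.
edge_model = ai_edge_torch.convert(model, sample_inputs)
edge_model.export("resnet18.tflite")
```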
Thanks again for your helpful insights and creating this issue, Franz.
With quite a few questions expected around running Gemma 2 with MediaPipe, I made a Colab for the conversion, along with links to the related issues and PRs. The notebook will be continuously updated until the official tflite or MediaPipe tasks are released.
Hi @FranzKafkaYu,
Apologies for the delayed response. Support for Gemma 2-2B is now available, and ongoing discussions are happening here. Please let us know if you require any further assistance, or if we can proceed to close the issue and mark it as internally resolved, as the feature has been implemented.
Thank you!!
Hi @FranzKafkaYu, I've been encountering an issue when trying to run the script ai-edge-torch/ai_edge_torch/generative/examples/gemma/convert_gemma2_to_tflite.py or gemma2-to-tflite/convert.py. In both cases, the error happens at the line where the code tries to load a file using torch.load(file).
On Google Colab, the process dies at this_file_tensors = torch.load(file) with ^C (this ^C is not caused by pressing Ctrl+C on the keyboard; it appears automatically).
On my local machine, the same line (this_file_tensors = torch.load(file)) outputs Segmentation Fault.
I've checked my system's memory, and it's not an issue of insufficient memory. The same error occurs consistently in both environments. Any suggestions on what could be causing this segmentation fault or how to troubleshoot further would be greatly appreciated! Thanks in advance!
colab logs:
2024-09-12 08:01:43.412352: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1726128103.448539 2980 cuda_dnn.cc:8322] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1726128103.459941 2980 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-09-12 08:01:43.505942: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
/usr/local/lib/python3.10/dist-packages/torch_xla/__init__.py:202: UserWarning: `tensorflow` can conflict with `torch-xla`. Prefer `tensorflow-cpu` when using PyTorch/XLA. To silence this warning, `pip uninstall -y tensorflow && pip install tensorflow-cpu`. If you are in a notebook environment such as Colab or Kaggle, restart your notebook runtime afterwards.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/ai_edge_torch/generative/utilities/loader.py:84: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  this_file_tensors = torch.load(file)
^C
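For reference, a minimal way to reproduce just that load step outside the script is sketched below; the checkpoint path is a placeholder, and the mmap / weights_only / map_location arguments are only guesses at reducing memory pressure, not a confirmed fix:

```python
# Minimal sketch to isolate the failing torch.load call outside the conversion script.
# The checkpoint path is a placeholder, and the keyword arguments are assumptions about
# what might reduce peak memory during deserialization, not a confirmed fix.
import torch

checkpoint_file = "path/to/gemma2-2b/model.ckpt"  # placeholder; point at the real checkpoint

tensors = torch.load(
    checkpoint_file,
    map_location="cpu",   # keep tensors on CPU while loading
    weights_only=True,    # restrict unpickling to tensors and primitive types
    mmap=True,            # memory-map the file (requires the zip-file checkpoint format)
)
print(type(tensors))
```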
Hi,
Just wanted to update this issue with the latest info. Previously (as discussed in this issue), Gemma 2 2B was only available in the LLM Inference API by going through a conversion pathway via ai_edge_torch. This was difficult for many people (especially due to the large memory requirements for conversion and quantization of the float checkpoint). So we have made the .task files of a quantized version of Gemma 2 available on Kaggle directly.
They have the extension .task. You use these files just like any other with the LLM Inference API. Essentially, these files contain the model weights as well as binary information on the tokenizer for the model. Please give that a try! Note: GPU and CPU models are available, but GPU is most likely to work on newer and high-end phones for now. Thanks for trying out the Inference API. We hope to have more info to share soon!
Tiny correction: the CPU model is a .task file, representing a successful conversion through ai_edge_torch, but the GPU model is a .bin file.
Hi @FranzKafkaYu,
Could you please confirm if this issue is resolved or any further assistance is needed?
Thank you!!
MediaPipe Solution (you are using)
Android library: com.google.mediapipe:tasks-genai:0.10.14
Programming language
Android Java
Are you willing to contribute it
None
Describe the feature and the current behaviour/state
Currently we have no suitable MediaPipe-format model for running Gemma2-2b on Android, and the MediaPipe Python libraries can't complete the conversion.
Will this change the current API? How?
no
Who will benefit with this feature?
all of us
Please specify the use cases for this feature
Use the latest Gemma2 model with MediaPipe.
Any Other info
No response