Open KennethanCeyer opened 1 month ago
The OOM probably happens at the quantization step. We've done some significant memory reduction on the converter part, but we recently found that quantization takes another huge chunk of memory. As a first step, can you try removing quant_config
from the conversion step and see if you can get a float tflite model without issues? Thanks!
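As a reference for what that suggestion might look like in code, here is a minimal sketch using the top-level `ai_edge_torch.convert` API; the helper name and output path are illustrative, not taken from this issue:

```python
def convert_float_only(pytorch_model, sample_inputs,
                       output_path="/tmp/model_float.tflite"):
    """Hypothetical helper: convert to a float32 TFLite model only.

    Omitting the quant_config argument means the quantization pass
    (the suspected OOM source) never runs.
    """
    import ai_edge_torch  # deferred import; assumed installed

    # No quant_config kwarg -> float model, no quantization step.
    edge_model = ai_edge_torch.convert(pytorch_model, sample_inputs)
    edge_model.export(output_path)
    return edge_model
```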
@haozha111
I’ll share the memory usage result with quantization disabled as you mentioned above. Since the goal of converting to tflite is for Edge ML serving, we’ll need a solution for the OOM issue in the quantization step, if that indeed turns out to be the cause. (I’ll update the title accordingly when we reach that point.)
If the issue is confirmed to be quantization-related, it would be helpful to get guidance on whether it should be addressed within this project through a PR or tackled separately in a dedicated PyTorch quantization project. Before that, I’ll leave a comment once we confirm if the OOM issue is resolved with quantization disabled.
Thank you for your prompt response.
After removing the quantization option, the conversion completed successfully without being killed by an OOM error. The process used up to 58 GB of memory at its peak, but the memory was properly released after the conversion.
It’s clear that quantization had an impact, but I noticed that the conversion process itself still consumes a significant amount of memory. Therefore, I believe resolving the overall memory usage and OOM issues during TFLite conversion and quantization is still important, especially for edge serving, which aligns with the original topic of this issue.
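For anyone reproducing these peak figures, one self-contained way to read the process's high-water mark is the standard-library `resource` module (a sketch; units differ by platform):

```python
import resource
import sys

def peak_rss_gb():
    """Peak resident set size of the current process, in GB.

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        peak /= 1024  # bytes -> kilobytes
    return peak / (1024 ** 2)  # kilobytes -> GB

print(f"peak RSS so far: {peak_rss_gb():.3f} GB")
```

Calling this right after the conversion returns (and again after an explicit `gc.collect()`) helps distinguish peak usage from memory that is still retained.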
About Memory Usage
It seems that memory increases during the signature handling and bundling process, or when weights are updated during the save and load steps of the TFLite conversion. This might be due to a lack of timely garbage collection (GC), and I think more detailed debugging is needed to confirm this.
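One way to check the garbage-collection hypothesis with the standard library is `tracemalloc`: snapshot allocations around a single stage, force a collection, and see how many bytes are still retained. This is a generic sketch, not code from the converter; the "stage" below is a stand-in workload:

```python
import gc
import tracemalloc

def retained_after(stage_fn):
    """Run one stage and report bytes still held by Python afterwards.

    If memory were released properly, the retained delta should be small;
    a large delta points at objects the GC cannot reclaim (e.g. weights
    still referenced by a signature or bundle object).
    """
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    stage_fn()
    gc.collect()  # force collection so only truly retained objects remain
    after = tracemalloc.take_snapshot()
    retained = sum(s.size_diff for s in after.compare_to(before, "lineno"))
    tracemalloc.stop()
    return retained

# Stand-in for a stage that keeps a reference to a large buffer.
holder = []
delta = retained_after(lambda: holder.append(bytearray(10 * 1024 * 1024)))
print(f"retained after stage: ~{delta / 1e6:.1f} MB")
```

`after.compare_to(before, "lineno")` also shows *which lines* allocated the retained memory, which is exactly the detail needed to confirm or rule out a GC problem.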
Thanks for sharing the latest info.
For the 50+ GB memory usage inside the TF Lite converter step, which TF version are you currently using? If you are installing the dependencies based on https://github.com/google-ai-edge/ai-edge-torch/blob/84c501503eea48129be9a8b369c5f1f5b6e89e00/requirements.txt#L9 (which is the 0722 version), the converter memory fixes are not fully included in that TF version. You probably need to update tf-nightly to a version from after September and see if that helps reduce the memory usage. Let me know if you find that the converter memory drops after this change.
The test environment uses the dependency versions specified in the requirements.txt file of the google-ai-edge/ai-edge-torch repository.
Today, I ran tests under the same conditions (with quantization disabled) using the updated tf-nightly>=2.18.0.dev20240905 package version. Unfortunately, this did not lead to any improvement; instead, the peak memory usage increased to 83.1 GB.
It seems that further detailed profiling and optimization of the memory usage during these stages will be necessary moving forward.
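As a starting point for that profiling, a background thread sampling resident memory over time can show which stage of the conversion drives the spike. This is a generic stdlib sketch (the short sleep stands in for the actual conversion call):

```python
import resource
import threading
import time

def run_with_rss_sampling(workload, interval=0.05):
    """Run workload() while sampling peak RSS in a background thread.

    Correlating the sample timeline with log output from the conversion
    stages shows which stage drives the memory spike.
    """
    samples = []
    stop = threading.Event()

    def sampler():
        while not stop.is_set():
            samples.append(
                resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
            time.sleep(interval)

    t = threading.Thread(target=sampler, daemon=True)
    t.start()
    try:
        workload()
    finally:
        stop.set()
        t.join()
    return samples

# A short sleep stands in for the actual conversion call.
samples = run_with_rss_sampling(lambda: time.sleep(0.3))
print(f"{len(samples)} samples, peak {max(samples)} (KB on Linux)")
```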
Thanks for the info. We will do more analysis here and get back to you.
I’m encountering out-of-memory (OOM) errors when attempting to convert Gemma 2 2B IT using the AI Edge Torch conversion process on a Google Colab A100 runtime with approximately 80GB of RAM. The memory size here refers to regular system RAM, not GPU memory. (I understand that AI Edge Torch currently does not utilize the GPU.)
I followed the MediaPipe user guidelines, which recommend TensorFlow Lite (TFLite) conversion, but there is no clear documentation specifying the hardware required for the conversion to run without errors.
This has made it challenging to complete the conversion process. I’d appreciate any insights into the recommended hardware specs for converting Gemma 2 2B IT, especially regarding memory and GPU requirements.
Current Colab Setup:
Colab link: https://colab.research.google.com/drive/19h3SZBiWuGqqtddHbF5MzOFGbPahXPIv?usp=sharing
AI Edge Conversion Code: https://github.com/KennethanCeyer/gemma2-to-tflite/blob/main/convert.py