WpythonW opened 1 month ago
Thank you for reaching out with your request!
We are actively working on the inefficient GPU utilization of FLUX model components: we are offloading more of the FLUX components to the GPU and will ship a fix soon.
Issue Description
I'm seeing inefficient GPU utilization with the FLUX model components. The main FLUX model runs on the GPU (6.8GB VRAM), but the other components (t5xxl-q4_0, ae-fp16, clip_l-fp16) appear to run on the CPU even though 9.2GB of GPU memory is still free.
Expected behavior: all model components use the available GPU memory for optimal performance.
Actual behavior: the secondary components stay on the CPU and the following warning is logged:
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
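(Not part of the original report.) One way to confirm the split is to watch VRAM while the model loads; a quick check, assuming nvidia-smi is available in the Kaggle environment:

```bash
# Poll GPU memory once per second while the model loads; a used figure near
# 6.8GB with ~9.2GB still free matches the behavior described above.
watch -n 1 nvidia-smi --query-gpu=memory.used,memory.free --format=csv
```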
Steps to Reproduce
Install Nexa SDK with CUDA support:
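The exact command did not survive extraction; a typical CUDA-enabled install, assuming the CMAKE_ARGS flags documented in the Nexa SDK README, would be:

```bash
# Build the nexaai wheel with CUDA backends enabled (flags assumed from the
# Nexa SDK documentation; check the README for your SDK version).
CMAKE_ARGS="-DGGML_CUDA=ON -DSD_CUBLAS=ON" pip install nexaai
```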
Run the following code:
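The reporter's code block was not preserved. Below is a minimal sketch of what the repro likely looks like with Nexa's Python API; the model identifier, method name, and parameters are assumptions, not the original code:

```python
from nexa.gguf import NexaImageInference

# Load a quantized FLUX model (the identifier is illustrative; the report
# mentions t5xxl-q4_0, ae-fp16, and clip_l-fp16 components alongside the
# main model).
inference = NexaImageInference(model_path="FLUX.1-schnell:q4_0", local_path=None)

# Generate an image. During this call the main FLUX weights occupy ~6.8GB of
# VRAM while the T5/CLIP/VAE components appear to remain on the CPU.
images = inference.txt2img(
    prompt="a photo of a cat",
    width=512,
    height=512,
)
```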
OS
Linux (Kaggle environment)
Python Version
3.10
Nexa SDK Version
nexaai-0.0.9.0
GPU (if using one)
NVIDIA Tesla P100 (16GB VRAM)