NexaAI / nexa-sdk

Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), audio language models, automatic speech recognition (ASR), and text-to-speech (TTS).
https://docs.nexa.ai/
Apache License 2.0
4.47k stars 659 forks

[BUG] FLUX model components fail to utilize available GPU memory (6.8GB/16GB used) with t5xxl/vae/clip falling back to CPU #199

Open WpythonW opened 1 month ago

WpythonW commented 1 month ago

Issue Description

I'm experiencing inefficient GPU utilization with the FLUX model components. While the main FLUX model runs on the GPU (6.8 GB VRAM), the other components (t5xxl-q4_0, ae-fp16, clip_l-fp16) appear to run on the CPU, despite 9.2 GB of GPU memory being free.

Expected behavior: all model components should utilize the available GPU memory for optimal performance.

Actual behavior: only the main FLUX model is offloaded to the GPU; the t5xxl text encoder, VAE, and CLIP components fall back to the CPU.
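The free-memory figure above is just the difference between the card's total VRAM and what the main model occupies; a quick check using the numbers reported in this issue:

```python
# Memory figures reported in this issue (GiB).
# "free" is what the remaining components (t5xxl, vae, clip)
# could still use on the GPU instead of falling back to CPU.
total_vram = 16.0   # Tesla P100
used_vram = 6.8     # main FLUX model
free_vram = total_vram - used_vram
print(f"{free_vram:.1f} GiB of VRAM left unused")
```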

Steps to Reproduce

  1. Install Nexa SDK with CUDA support:

    !CMAKE_ARGS="-DGGML_CUDA=ON -DSD_CUBLAS=ON" pip install nexaai --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/cu124 --extra-index-url https://pypi.org/simple --no-cache-dir
  2. Run the following code:

    from nexa.gguf import NexaImageInference

    model_path = "FLUX.1-schnell:q4_0"
    inference = NexaImageInference(
        model_path=model_path,
        wtype="q4_0",
        num_inference_steps=5,
        width=1024,
        height=1024,
        guidance_scale=1.5,
        random_seed=42,
    )
    img = inference.txt2img("A sunset over a mountain range")
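To see which components actually land on the GPU, it helps to watch VRAM usage while the pipeline runs. A minimal sketch (assumes `nvidia-smi` is on PATH; the helper names are mine, not part of Nexa SDK):

```python
import subprocess

def parse_vram(csv_line: str) -> tuple[int, int]:
    """Parse one 'used, total' line of nvidia-smi CSV output (values in MiB)."""
    used, total = (int(x.strip()) for x in csv_line.split(","))
    return used, total

def query_vram(gpu_index: int = 0) -> tuple[int, int]:
    """Query used/total VRAM (MiB) for one GPU via nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_vram(out.splitlines()[gpu_index])

if __name__ == "__main__":
    used, total = query_vram()
    print(f"GPU memory: {used} MiB used / {total} MiB total "
          f"({total - used} MiB free)")
```

Running this in a second cell during `txt2img` shows usage plateauing around 6.8 GB, which is consistent with only the main FLUX model being offloaded.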

OS

Linux (Kaggle environment)

Python Version

3.10

Nexa SDK Version

nexaai-0.0.9.0

GPU (if using one)

NVIDIA Tesla P100 (16GB VRAM)

zhiyuan8 commented 4 weeks ago

Thank you for reaching out with your request!

We are actively addressing issues related to inefficient GPU utilization for FLUX model components. We are trying to offload more FLUX components to GPU and will fix this soon.