movelikeriver opened 3 months ago
flux-schnell also runs out of memory on a 4090 (24G); by default it uses only cuda:0. Is there a way to use two GPUs?
When this happens, how do you deallocate the memory?
It works in offload mode with a single 4090 (24G), but it runs too slowly, about 25 s per image. I found a way to speed it up with two 4090 GPUs: load the T5, CLIP, and AE models onto one GPU and the main flow model onto the other. It then runs at about 2.3 s per image, roughly 10x faster, without offload.
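For anyone else trying this split, here is a minimal sketch of the loading step, assuming the official flux repo's `flux.util` helpers; the exact device assignments are my assumption, adjust to your machine:

```python
import torch
from flux.util import load_ae, load_clip, load_flow_model, load_t5

name = "flux-schnell"
text_device = torch.device("cuda:1")  # T5, CLIP, and the autoencoder go here
flow_device = torch.device("cuda:0")  # the large flow model gets a GPU to itself

t5 = load_t5(text_device, max_length=256)  # flux-schnell uses max_length=256
clip = load_clip(text_device)
ae = load_ae(name, device=text_device)
model = load_flow_model(name, device=flow_device)
```

At sampling time the conditioning tensors produced on cuda:1 then have to be moved with `.to(flow_device)` before denoising, and the resulting latents moved back to cuda:1 before `ae.decode`.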
I tried the same method as yours, but there was no significant time improvement. Can you provide your script?
I changed the script demo_gr.py as follows (not clean, but it should work in offload mode with a single GPU, and in no-offload mode with GPU0 and GPU1): demo_gr.txt
`pipe.enable_sequential_cpu_offload()` could help.
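For reference, a minimal sketch of how that looks with the diffusers `FluxPipeline`; the step count and guidance value are the usual schnell settings, adjust as needed:

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
# streams weights between CPU and GPU layer by layer: the slowest option,
# but it gives the lowest peak VRAM usage
pipe.enable_sequential_cpu_offload()

image = pipe(
    "a photo of a forest at dawn",
    num_inference_steps=4,
    guidance_scale=0.0,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("out.png")
```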
Out of memory using flux-schnell with offload on a 3090 (24G). After the inp step finishes I run torch.cuda.empty_cache(), but about 1000 MB of memory stays allocated, so I can only load the model; when inference runs with inp, it goes out of memory.
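Note that `empty_cache()` can only return cached blocks that no live Python object still references, and part of that ~1000 MB is likely the CUDA context itself, which can never be freed. A sketch of the usual release pattern, with a stand-in tensor in place of the real conditioning dict:

```python
import gc
import torch

# stand-in for the conditioning dict from the comment above
inp = {"img": torch.randn(1, 4096, 64, device="cuda")}

# drop the references first, then collect, then release the cache;
# empty_cache() alone frees nothing a live object still points at
del inp
gc.collect()
torch.cuda.empty_cache()

# compare what tensors actually hold vs. what the allocator merely caches
print(torch.cuda.memory_allocated() // 2**20, "MiB allocated")
print(torch.cuda.memory_reserved() // 2**20, "MiB reserved")
```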
Thanks @GallonDeng! With those changes I am able to run on 2x24 GB VRAM at resolutions from 512x512 to 800x800 (depending on seed and prompt length).
UPD: weirdly, after removing the Gradio-related code and running it from the command line, I got stable performance at a slightly higher resolution; it seems Gradio was preventing garbage collection by holding references to some objects.
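If you want to keep the Gradio UI, one possible workaround is to have the handler drop its CUDA references before returning, so the framework has nothing to hold onto. A hedged sketch; the tensor and decode step are placeholders for your real pipeline calls:

```python
import gc
import torch

def generate(prompt: str):
    # placeholder for the real flux inference call; any CUDA tensor
    # serves to illustrate the point
    latents = torch.randn(1, 16, 128, 128, device="cuda")
    image = latents.float().cpu().numpy()  # placeholder decode step

    # release the CUDA tensor explicitly before returning, so Gradio
    # cannot keep it alive by caching references in its event state
    del latents
    gc.collect()
    torch.cuda.empty_cache()
    return image
```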
Which GPU does flux run on?
Running on Google Cloud: NVIDIA L4, 23034 MiB.
Command line:
Got CUDA out of memory: