Open feiyangsuo opened 1 month ago
Same problem, but in a local environment with high compute power.
Did you already try commenting out or removing the `pipe.enable_model_cpu_offload()` line? I wouldn't expect this to make such a difference, but it may be worth a try if you haven't tested it already.
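To make the suggestion concrete, here is a minimal sketch of the toggle being discussed, assuming the standard diffusers `FluxPipeline` setup from the model card (the model ID and dtype are taken from the thread; whether offload helps or hurts depends on available VRAM):

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)

# The line under discussion. With it, submodules are kept in CPU RAM and
# moved to the GPU only when needed: lower VRAM use, slower per step.
# pipe.enable_model_cpu_offload()

# Alternative: keep the whole pipeline on the GPU (needs enough VRAM).
pipe.to("cuda")
```

If the GPU cannot hold the full pipeline, `pipe.to("cuda")` will OOM or the driver may start paging, so the two variants can have wildly different speeds on the same hardware.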
You've reminded me that I had commented out that line. After un-commenting it, the speed improved; it can now generate one image in about 8-10 minutes. By the way, this seems to use more than 40 GB of CPU RAM.
Interesting. Maybe the VRAM isn't enough, so it's swapping the model back and forth between RAM and VRAM.
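A back-of-envelope estimate supports that theory. Using the commonly cited parameter counts (roughly 12B for the FLUX transformer and ~4.7B for the T5-XXL text encoder; these figures are assumptions, not from the thread), the 16-bit weights alone exceed a V100's 32 GB:

```python
# Rough VRAM estimate for FLUX.1-dev weights in 16-bit precision.
BYTES_PER_PARAM = 2  # fp16 / bf16

transformer_gb = 12e9 * BYTES_PER_PARAM / 1e9  # ~24 GB
t5_gb = 4.7e9 * BYTES_PER_PARAM / 1e9          # ~9.4 GB
total_gb = transformer_gb + t5_gb              # CLIP encoder and VAE add a bit more

print(f"weights alone: ~{total_gb:.1f} GB")  # ~33.4 GB, before activations
assert total_gb > 32  # already over a V100's 32 GB
```

So without offload or quantization, the card cannot hold all the weights at once, which would explain the constant swapping and the extreme per-step times.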
Maybe quantizing the model (to 8-bit) helps, e.g. similar to how it's done in this script (by now the main diffusers branch and the default Hugging Face repo revision should be fine). There are likely also unofficial quantized FLUX model versions available on Hugging Face.
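For reference, a sketch of 8-bit loading with diffusers' bitsandbytes integration (this is an assumption about the approach, not the linked script; it requires `pip install bitsandbytes` and a recent diffusers release with quantization support):

```python
import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

# Load only the large transformer in 8-bit; the other components stay in fp16.
quant_config = BitsAndBytesConfig(load_in_8bit=True)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()
```

At 8-bit the transformer weights shrink to roughly half their fp16 size, which may be enough to avoid the RAM/VRAM thrashing on a 32 GB card.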
Running the demo from https://huggingface.co/black-forest-labs/FLUX.1-dev on a V100-32G. It takes about 20 minutes to process one step, and with 50 sampling steps the progress bar estimates 13+ hours to generate a single image. Is anybody facing the same problem?