black-forest-labs / flux

Official inference repo for FLUX.1 models
Apache License 2.0

HuggingFace demo runs super slow #84

Open feiyangsuo opened 1 month ago

feiyangsuo commented 1 month ago

Running the demo from https://huggingface.co/black-forest-labs/FLUX.1-dev on a V100-32G. It takes about 20 minutes to process one step, and for 50 sampling steps the progress bar estimates 13+ hours to generate a single image. Is anybody else facing the same problem?

bobgus39 commented 1 month ago

Same problem here, but in a local environment with high compute power.

JonasLoos commented 1 month ago

Did you already try commenting out or removing the `pipe.enable_model_cpu_offload()` line? I wouldn't expect it to make such a difference, but it may be worth a try if you haven't tested it already.

feiyangsuo commented 1 month ago

> Did you already try to comment out or remove the pipe.enable_model_cpu_offload() line? I wouldn't expect this making such a difference, but maybe worth a try, if you didn't test it already

You've reminded me that I had already commented out that line. After un-commenting it, the speed improved: it can now generate one image in about 8~10 min. BTW, this seems to cost more than 40 GB of CPU RAM.
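For reference, the working configuration probably looks something like the sketch below (based on the diffusers `FluxPipeline` API; the checkpoint ID is from the model card, everything else is illustrative, and the `build_pipeline` helper is hypothetical):

```python
def build_pipeline(offload: bool = True):
    """Load FLUX.1-dev, optionally with model CPU offload (illustrative sketch)."""
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
    )
    if offload:
        # Keeps sub-models (transformer, text encoders, VAE) in CPU RAM --
        # hence the ~40 GB usage -- and moves each one to the GPU only while
        # it runs, instead of letting the driver swap mid-inference.
        pipe.enable_model_cpu_offload()
    else:
        # Only viable if the GPU has enough VRAM to hold everything at once.
        pipe = pipe.to("cuda")
    return pipe
```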

JonasLoos commented 1 month ago

Interesting. Maybe the VRAM isn't enough, so it's swapping the model back and forth between RAM and VRAM.

Maybe quantizing the model (to 8-bit) helps, e.g. similar to how it's done in this script (by now the main diffusers branch and the default HuggingFace repo revision should be fine). There are likely also unofficial quantized FLUX model versions available on HuggingFace.
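One common way to do this is with optimum-quanto; the sketch below is an assumption about how that script works, not a copy of it (the `quantize_flux_8bit` helper is hypothetical, and 8-bit weights roughly halve the memory of the two largest sub-models compared to bf16):

```python
def quantize_flux_8bit():
    """Quantize FLUX.1-dev's largest sub-models to 8-bit weights (sketch)."""
    import torch
    from diffusers import FluxPipeline
    from optimum.quanto import quantize, freeze, qint8

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
    )
    # The transformer (~12B params) and the T5 text encoder (~4.7B params)
    # dominate memory; quantize their weights to 8-bit and freeze to
    # materialize the quantized tensors.
    for model in (pipe.transformer, pipe.text_encoder_2):
        quantize(model, weights=qint8)
        freeze(model)
    pipe.enable_model_cpu_offload()
    return pipe
```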