chengzeyi / stable-fast

Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
MIT License
1.16k stars 71 forks

How to do Ensemble of Expert Denoisers w/ refiner model #83

Closed: shawnrushefsky closed this issue 9 months ago

shawnrushefsky commented 9 months ago

Trying to use this with SDXL base + refiner as documented here: https://huggingface.co/docs/diffusers/using-diffusers/sdxl#base--refiner-model. I am sharing the text_encoder_2 and vae between the base and refiner, as documented. Compiling the base model works great, but when I try to compile the refiner (a StableDiffusionXLImg2ImgPipeline), I get the following error:

RuntimeError: Tried to trace <__torch__.sfast.jit.trace_helper.___torch_mangle_687.TraceablePosArgOnlyModuleWrapper object at 0x1678ecbd0> but it is not part of the active trace. Modules that are called during a trace must be registered as submodules of the thing being traced.

The stack trace indicates it's failing at the text encoder:

File "/opt/conda/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl_img2img.py", line 436, in encode_prompt

prompt_embeds = text_encoder(text_input_ids.to(device), output_hidden_states=True)
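
For reference, a minimal sketch of what I'm running (the model IDs and sharing pattern follow the linked Diffusers docs; the sfast module path is the one from stable-fast 1.0.0, and newer versions may name it differently):

```python
import torch
from diffusers import DiffusionPipeline, StableDiffusionXLImg2ImgPipeline
from sfast.compilers.stable_diffusion_pipeline_compiler import (
    CompilationConfig,
    compile,
)

base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=base.text_encoder_2,  # shared with the base, per the docs
    vae=base.vae,                        # shared with the base, per the docs
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

config = CompilationConfig.Default()
base = compile(base, config)  # works
refiner = compile(refiner, config)

# Ensemble of expert denoisers: the base handles the first 80% of the
# steps, the refiner the remainder.
prompt = "a majestic lion jumping from a big stone at night"
latents = base(
    prompt=prompt, num_inference_steps=40, denoising_end=0.8,
    output_type="latent",
).images
image = refiner(
    prompt=prompt, num_inference_steps=40, denoising_start=0.8,
    image=latents,
).images[0]  # the RuntimeError above is raised here, in encode_prompt
```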

software versions:

Torch version: 2.1.2+cu121
XFormers version: 0.0.23.post1
Triton version: 2.1.0
Diffusers version: 0.24.0
Transformers version: 4.36.2
CUDA Version: 12.1
Stable Fast version: 1.0.0

Could you add an example to the README showing how to do this?

chengzeyi commented 9 months ago

@shawnrushefsky

Try using the more fine-grained compilation functions. For example, the same Python module that provides compile also provides compile_unet and compile_vae. You can call each of them once, on the pipeline's unet and vae respectively.

The error occurs because the already-compiled parts of your pipeline cannot be traced a second time when the other pipeline is compiled. So instead of compiling both pipelines wholesale, you can explicitly compile each component yourself, exactly once (see the sketch below).
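
A minimal sketch of that approach, reusing the base/refiner objects from above (compile_unet and compile_vae live in the same module as compile; their signatures are assumed here to match compile's module-plus-config form):

```python
from sfast.compilers.stable_diffusion_pipeline_compiler import (
    CompilationConfig,
    compile_unet,
    compile_vae,
)

config = CompilationConfig.Default()

# Each pipeline owns its own unet, so compile both of them.
base.unet = compile_unet(base.unet, config)
refiner.unet = compile_unet(refiner.unet, config)

# The vae is shared: compile it exactly once, then point both pipelines
# at the compiled copy so nothing gets traced a second time.
base.vae = compile_vae(base.vae, config)
refiner.vae = base.vae

# Do NOT also call compile() on either pipeline afterwards; that would
# try to re-trace the already-compiled modules and reproduce the error.
```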

chengzeyi commented 9 months ago

And usually there is no need to share the vae and text_encoder between the base model and the refiner, as they are relatively small compared with the unet. Sharing them won't save much memory.
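
So the simplest fix may be to just load the refiner without passing it the base's modules (a sketch):

```python
# Each pipeline gets its own text encoders and vae; the extra VRAM cost is
# small next to the two unets, and compile() then works on each pipeline
# independently because nothing is shared between them.
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")
```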

shawnrushefsky commented 9 months ago

Thanks!