chengzeyi / stable-fast

Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
MIT License
1.05k stars 59 forks

Deepcopy model for further use #114

Open alecyan1993 opened 5 months ago

alecyan1993 commented 5 months ago

Hi,

I have an SDXL model that I would like to accelerate with stable-fast. However, our implementation has its own way of applying LoRA to the model. Previously, we deep-copied the model, applied the LoRA to the copy (fusing the weights), ran inference on the copy, and deleted it after the run. The reason is to prevent "fried" LoRA issues, since the way we load a LoRA changes the weights of the model.
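The copy, fuse, run, delete workflow described above can be sketched on a toy layer. This is only an illustration: the `nn.Linear` stands in for a real SDXL submodule, and `fuse_lora_`, `lora_down`, and `lora_up` are hypothetical names, not the poster's code or a diffusers API.

```python
import copy

import torch
from torch import nn


def fuse_lora_(linear, lora_down, lora_up, scale=1.0):
    """Hypothetical helper: fuse a LoRA in place, W <- W + scale * (up @ down)."""
    with torch.no_grad():
        linear.weight.add_(scale * (lora_up @ lora_down))


torch.manual_seed(0)
base = nn.Linear(8, 8, bias=False)      # stand-in for a layer of the base model
lora_down = torch.randn(2, 8) * 0.1     # rank-2 LoRA factors (made-up values)
lora_up = torch.randn(8, 2) * 0.1
original = base.weight.detach().clone()  # kept only to verify the result below

# 1. Deep copy, so that fusing never touches the base weights.
working = copy.deepcopy(base)

# 2. Fuse the LoRA into the copy and run inference on the copy.
fuse_lora_(working, lora_down, lora_up, scale=0.8)
fused_differs = not torch.equal(working.weight, original)
with torch.no_grad():
    y = working(torch.randn(1, 8))

# 3. Drop the copy after the run; the base model is still clean.
del working
base_untouched = torch.equal(base.weight, original)
```

The base weights stay bit-identical because only the copy is modified, which is the property the deep copy is meant to guarantee.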

However, after introducing stable-fast, the original model is loaded and compiled first (as I understand it, the model is registered in this step and the subsequent tracing is based on it). If we then deep copy the model, apply the LoRA, and do the tracing in the same way as before, we get strange errors. For example, the VAE needs to be float32 in SDXL img2img for latent generation, but we get an error indicating that the VAE is fp16 while the image is fp32 (I think this is because the original model registered at compile time has an fp16 VAE).
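One way to narrow down a mismatch like the one described above is to check the parameter dtype of the copied module before encoding. The sketch below assumes a toy `nn.Conv2d` in place of the real VAE and a hypothetical `param_dtype` helper; whether upcasting the copy is sufficient once stable-fast has traced the model is not confirmed in this thread.

```python
import copy

import torch
from torch import nn


def param_dtype(module):
    """Hypothetical helper: dtype of the module's first parameter."""
    return next(module.parameters()).dtype


# Toy stand-in for an fp16 VAE encoder layer.
vae_like = nn.Conv2d(3, 4, kernel_size=3, padding=1).half()

# deepcopy preserves dtype, so the copy is fp16 as well.
vae_copy = copy.deepcopy(vae_like)
dtype_before = param_dtype(vae_copy)

# The input image is fp32, as in the SDXL img2img case from the report.
image = torch.rand(1, 3, 16, 16)

# Upcast only the copy so that weights and input agree.
if image.dtype != param_dtype(vae_copy):
    vae_copy = vae_copy.to(image.dtype)

with torch.no_grad():
    latents = vae_copy(image)
```

The original module keeps its fp16 weights, so any code path that still holds a reference to it will continue to see fp16.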

Do you think there is a way to solve this? Or could you explain how the registration of the compiled model works in stable-fast? Thanks!

chengzeyi commented 5 months ago

Instead of deep-copying the model, I would suggest saving the model into a file-like object, since that's the standard way that diffusers uses.
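One possible reading of this suggestion, sketched with plain PyTorch: serialize the weights into an in-memory `io.BytesIO` buffer with `torch.save`, then load them back whenever a clean set of weights is needed. The helpers `snapshot_to_buffer` and `restore_from_buffer` are hypothetical names, the `nn.Linear` is a stand-in for a real pipeline component, and how this interacts with a model already traced by stable-fast is an assumption that would need testing.

```python
import io

import torch
from torch import nn


def snapshot_to_buffer(module):
    """Hypothetical helper: save the module's state_dict to an in-memory buffer."""
    buffer = io.BytesIO()
    torch.save(module.state_dict(), buffer)
    buffer.seek(0)
    return buffer


def restore_from_buffer(module, buffer):
    """Hypothetical helper: load the saved weights back into `module`."""
    buffer.seek(0)
    module.load_state_dict(torch.load(buffer, map_location="cpu"))
    return module


torch.manual_seed(0)
base = nn.Linear(8, 8, bias=False)   # stand-in for a model component
buffer = snapshot_to_buffer(base)

# Build an independent module from the buffer instead of using deepcopy.
fresh = restore_from_buffer(nn.Linear(8, 8, bias=False), buffer)

# Stand-in for fusing a LoRA: modify the fresh weights in place.
with torch.no_grad():
    fresh.weight.add_(1.0)
weights_independent = not torch.equal(fresh.weight, base.weight)

# Reloading from the same buffer returns the module to the clean weights.
restored = restore_from_buffer(fresh, buffer)
restored_matches = torch.equal(restored.weight, base.weight)
```

Because the buffer holds only tensors, it can be reused for every run, and the clean weights are reloaded rather than recomputed by unfusing.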