HighCWu / control-lora-v3

ControlLoRA Version 3: LoRA Is All You Need to Control the Spatial Information of Stable Diffusion.
MIT License
18 stars, 0 forks

A few questions #1

Open · lukovnikov opened this issue 1 month ago

lukovnikov commented 1 month ago

Hi, cool work! I was looking into Control-LoRAs and wanted to ask a few questions:

Q1: Did you try simply adding the encoded control signal to the image latents? Then there would be no need to expand the model.

Q2: How do you encode the conditioning image before feeding it to the LoRA'ed backbone? Just with the original latent VAE encoder? ControlNet had a specially trained downsampling encoder. Did you try using that one instead, as provided by a pretrained ControlNet?

Q3: Did you try just using the PEFT library from Hugging Face?
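Q1 contrasts two ways of feeding a control latent into the UNet. The question implies that v3 widens the model's input; the sketch below assumes that "widening" means concatenating the control latent along the channel axis and zero-initializing the new input-channel weights of the first convolution. That is a common trick for extending a pretrained UNet, not something confirmed as this repo's code. It uses numpy and a hypothetical 1x1 stand-in for `conv_in` (Stable Diffusion's real `conv_in` is a 3x3 convolution), so it only illustrates the shape bookkeeping.

```python
import numpy as np

# All function names here are hypothetical, for illustration only.

def add_control(latents, control, scale=1.0):
    """Option from Q1: add the encoded control signal to the latents.
    The channel count stays the same, so the UNet needs no new weights."""
    if latents.shape != control.shape:
        raise ValueError("control must match the latent shape")
    return latents + scale * control

def expand_conv_in(weight, extra_in_channels):
    """Assumed alternative: widen the first conv to accept [latents, control].
    The added input-channel slices are zero-initialized, so right after the
    expansion the layer ignores the control channels entirely."""
    out_ch, _, kh, kw = weight.shape
    extra = np.zeros((out_ch, extra_in_channels, kh, kw), dtype=weight.dtype)
    return np.concatenate([weight, extra], axis=1)

def conv1x1(x, weight):
    """Minimal stand-in for a convolution, valid only for 1x1 kernels.
    x: (C_in, H, W), weight: (C_out, C_in, 1, 1) -> (C_out, H, W)."""
    return np.einsum("oi,ihw->ohw", weight[:, :, 0, 0], x)

rng = np.random.default_rng(0)
latents = rng.standard_normal((4, 8, 8))  # SD latents have 4 channels
control = rng.standard_normal((4, 8, 8))  # stands in for a VAE-encoded control image
w = rng.standard_normal((320, 4, 1, 1))   # hypothetical 1x1 conv_in weight

print(add_control(latents, control).shape)        # (4, 8, 8)
w_wide = expand_conv_in(w, extra_in_channels=4)
print(w_wide.shape)                               # (320, 8, 1, 1)
stacked = np.concatenate([latents, control], axis=0)
print(np.allclose(conv1x1(stacked, w_wide), conv1x1(latents, w)))  # True
```

The additive path keeps the model untouched but changes the very tensor the UNet has to denoise; the zero-initialized widening costs a few extra weights but reproduces the pretrained output exactly at the start of training.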

HighCWu commented 1 month ago

Thank you for your interest in my project. I just implemented an idea that suddenly came to mind, so I didn't run many other experiments. So:

A1: I think adding control conditions directly to the image latents would seriously impair the denoising model's ability to analyze the noise contained in those latents during denoising. However, your idea is very good. My guess is that if you add the control conditions directly to the image latents for most of the time steps of the denoising process, and stop adding them only when the image is about to become clear, you may be able to make your idea work.

A2: What you describe is closer to my control-lora-v2 approach, which uses ControlNet-style encoding and applies LoRA to the weights copied from the UNet. I think ControlNet's bypass injection method is heavy, which is why there is a lighter v3.

A3: I did not use the training functionality in peft, because my training code was adapted directly from the diffusers training code. However, diffusers itself uses peft's LoraLayer to train LoRA.
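A1 proposes gating the additive control by timestep: inject it during most of the denoising and drop it near the end. A minimal numpy sketch of that gate, assuming a 1000-step training schedule where larger `t` means noisier latents (the diffusers convention) and a hypothetical cutoff at 20% of the schedule. The answer gives no concrete threshold and offers the idea as an untested guess, so `control_scale`, `model_input`, and `cutoff_fraction` are all illustrative names and values.

```python
import numpy as np

def control_scale(t, num_train_timesteps=1000, cutoff_fraction=0.2):
    """Hypothetical gate: apply control while t is large (noisy latents) and
    switch it off for the last, nearly clean steps (t < cutoff)."""
    cutoff = cutoff_fraction * num_train_timesteps
    return 1.0 if t >= cutoff else 0.0

def model_input(latents, control, t, **kwargs):
    # Additive injection (Q1), gated per timestep as suggested in A1.
    return latents + control_scale(t, **kwargs) * control

# A 50-step schedule from noisy (t=999) to nearly clean (t=19).
timesteps = list(range(999, -1, -20))
gates = [control_scale(t) for t in timesteps]
print(len(timesteps), int(sum(gates)))  # 50 40

rng = np.random.default_rng(0)
latents = rng.standard_normal((4, 8, 8))
control = rng.standard_normal((4, 8, 8))
print(np.array_equal(model_input(latents, control, t=19), latents))  # True
```

With these assumed settings, the first 40 of 50 steps see the control signal and the final 10 denoise the latents unmodified; a soft ramp instead of a hard 0/1 switch would be an equally plausible variant.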
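A2 and A3 both rest on LoRA: a frozen pretrained weight plus a small trainable low-rank update. As a reminder of what such a layer computes, here is a minimal numpy sketch assuming the standard formulation (output `W x + (alpha / r) * B A x`, with `B` zero-initialized so training starts from the pretrained behavior). It is not peft's `LoraLayer` and not this project's code; the class name and shapes are illustrative.

```python
import numpy as np

class LoRALinear:
    """Hypothetical LoRA wrapper: frozen weight W plus (alpha / r) * B @ A."""

    def __init__(self, weight, rank=4, alpha=4.0, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        out_features, in_features = weight.shape
        self.weight = weight                                       # frozen pretrained weight
        self.A = rng.standard_normal((rank, in_features)) * 0.01   # down-projection (trainable)
        self.B = np.zeros((out_features, rank))                    # up-projection, zero-init
        self.scaling = alpha / rank

    def __call__(self, x):
        # x: (..., in_features) -> (..., out_features)
        return x @ self.weight.T + self.scaling * (x @ self.A.T) @ self.B.T

    def merged_weight(self):
        # Folding the update into W gives the same outputs with no extra cost.
        return self.weight + self.scaling * self.B @ self.A

rng = np.random.default_rng(0)
W = rng.standard_normal((320, 768))
layer = LoRALinear(W, rank=4, rng=rng)
x = rng.standard_normal((2, 768))
print(np.allclose(layer(x), x @ W.T))          # True: B is zero at init
print(layer.A.size + layer.B.size, W.size)     # 4352 245760
```

The parameter count in the last line is the point of A2's "lighter" remark: the rank-4 update trains under 2% of the weights of this one layer, whereas a ControlNet-style branch duplicates whole UNet blocks.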