city96 / SD-Latent-Interposer

A small neural network to provide interoperability between the latents generated by the different Stable Diffusion models.
Apache License 2.0

What is the advantage of making SDXL talk to SD1.5? #1

Closed yhyu13 closed 9 months ago

yhyu13 commented 11 months ago

From New Bing:

Here is a summary of the GitHub repository you requested:

The SD-Latent-Interposer is a project that aims to provide interoperability between the latents generated by different Stable Diffusion models. Stable Diffusion is a generative model that can produce high-quality images from random noise. The project targets two models, SDv1.5 and SDXL, which have different architectures and incompatible latent formats. The interposer is a small neural network that can convert latents from one model to the other, allowing for cross-model generation and manipulation. For example, one can use the interposer to generate an image using SDXL, then edit it using SDv1.5, or vice versa.
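Conceptually, the interposer is just a learned map between two 4-channel latent spaces at the same spatial resolution. A minimal toy sketch of that shape contract (not the repository's actual architecture, which is a trained convolutional network; the 1x1 channel mixer and its random weights here are purely illustrative):

```python
import numpy as np

# Both SDv1.5 and SDXL use 4-channel latents at 1/8 image resolution,
# but the channels encode different distributions. This toy interposer
# applies a per-pixel linear map between channel spaces -- a stand-in
# for the real project's small trained network.

class ToyInterposer:
    def __init__(self, channels=4, seed=0):
        rng = np.random.default_rng(seed)
        # 4x4 weight matrix + bias, standing in for trained parameters.
        self.weight = rng.standard_normal((channels, channels)) * 0.1
        self.bias = np.zeros(channels)

    def __call__(self, latent):
        # latent: (batch, 4, H/8, W/8), e.g. (1, 4, 128, 128) for 1024px
        b, c, h, w = latent.shape
        x = latent.reshape(b, c, -1)  # (b, 4, h*w)
        y = np.einsum("oc,bcp->bop", self.weight, x) + self.bias[None, :, None]
        return y.reshape(b, c, h, w)  # same spatial size out

xl_latent = np.random.default_rng(1).standard_normal((1, 4, 128, 128))
v15_latent = ToyInterposer()(xl_latent)
assert v15_latent.shape == xl_latent.shape  # only the channel "dialect" changes
```

The point of the shape check is that no decode/encode round trip through pixel space is needed: the conversion stays entirely in latent space.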

The advantage of doing this is that it enables more flexibility and creativity in using the Stable Diffusion models. Each model has its own strengths and weaknesses, and the interposer allows for combining them in various ways. For instance, SDXL can generate higher-resolution images, but SDv1.5 can perform better editing operations. By using the interposer, one can leverage the best of both worlds and create more diverse and realistic images.

yhyu13 commented 11 months ago

Is this what your proposed solution does?

city96 commented 11 months ago

Hi. I think the main benefit of this solution is the ability to use v1.5 finetunes as well as v1.5 LoRAs. A lot of the v1.5 LoRAs will most likely never be remade for XL models sadly.

Here is an example. You want a "grainy security camera" effect, but native SDXL struggles to generate the desired effect. You could generate it on v1.5, but then the initial resolution would have to be lower since v1.5 has very noticeable composition/repetition issues when generating above ~768px. In this case, you can generate your "base" image with SDXL, then send it to your v1.5 model with the LoRA applied for a "second pass".

In that example, I am using SDXL for the initial image composition while using v1.5 as a "refiner". It is also possible to connect v1.5 to XL, in case you want to use the SDXL Refiner on a latent generated by v1.5. This was already possible using VAE Decode/Encode nodes, but the interposer saves a lot of time since it is faster.

A more advanced use case would be using the advanced KSampler to return the leftover noise on SDXL, then converting it and finishing the last N steps on a v1.5 model.
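The split-sampling bookkeeping behind that idea can be sketched as follows. The model and interposer callables here are hypothetical stand-ins, not the real ComfyUI node API; the sketch only shows where the model switch happens in the schedule:

```python
# Run the first part of the denoising schedule on one model, convert
# the still-noisy latent with the interposer, then finish the remaining
# steps on the other model.

def split_sample(latent, total_steps, switch_at, step_xl, step_v15, interpose):
    """Denoise `latent` over `total_steps`, switching models at `switch_at`."""
    for i in range(switch_at):             # e.g. steps 0..19 on SDXL,
        latent = step_xl(latent, i)        # leaving leftover noise in the latent
    latent = interpose(latent)             # SDXL latent -> v1.5 latent space
    for i in range(switch_at, total_steps):
        latent = step_v15(latent, i)       # finish the last N steps on v1.5
    return latent
```

In ComfyUI terms this corresponds to two advanced KSampler nodes with the interposer node between them, where the first sampler is set to return leftover noise and the second starts at the matching step.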

I hope this answers your question, but feel free to follow up if you want more examples/clarifications/etc.

(Attached image: CCTV-style "grainy security camera" example)

yhyu13 commented 11 months ago

This is awesome! I find that most LoRA models on civitai are SD1.5-only. This is very valuable work!