Mikubill / sd-webui-controlnet

WebUI extension for ControlNet
GNU General Public License v3.0

Multi batch and process bypass, for a massive improvement. #249

Closed AbyszOne closed 1 year ago

AbyszOne commented 1 year ago

Thank you very much for your invaluable work. I will leave three changes that I consider top priority and that would be a complete revolution in content generation:

1 - A ControlNet batch of its own, without the Img2Img bypass. (For ControlNet blend composition.)

2 - Multi-batch: a simultaneous Img2Img + ControlNet batch, for dynamic blending (a rough sketch of the idea follows at the end of this comment).

3 - ControlNet bypass: let the chosen image remain "raw" and blend with the one from Img2Img. This could be so powerful for editing images and videos with full temporal coherence.

Hope you can find a way. I don't know if there is any specific complication in any of them. As long as it doesn't involve writing code, I offer to research alternatives and ideas. 👍🏿
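To make point 2 a bit more concrete, here is a rough sketch of what such a paired batch could look like when driven from the outside through the webui API. This is only an illustration under assumptions (a local webui launched with --api, the extension's alwayson_scripts payload, hypothetical folder and model names), not working code for the extension itself:

```python
# Illustrative sketch of "multi-batch" (point 2) driven externally: each img2img
# frame is paired with its own ControlNet frame. Folder names, model name and
# payload details are assumptions and may differ between versions.
import base64
import pathlib
import requests

WEBUI_URL = "http://127.0.0.1:7860"      # assumes a local webui launched with --api
IMG2IMG_DIR = pathlib.Path("frames_a")   # frames fed to img2img (hypothetical)
CONTROL_DIR = pathlib.Path("frames_b")   # frames fed to ControlNet (hypothetical)

def b64(path: pathlib.Path) -> str:
    """Read an image file and return it as a base64 string for the API."""
    return base64.b64encode(path.read_bytes()).decode()

for a, b in zip(sorted(IMG2IMG_DIR.glob("*.png")), sorted(CONTROL_DIR.glob("*.png"))):
    payload = {
        "init_images": [b64(a)],              # img2img source for this frame
        "denoising_strength": 0.6,
        "prompt": "",
        "alwayson_scripts": {
            "controlnet": {                   # the ControlNet unit gets its own image
                "args": [{
                    "input_image": b64(b),
                    "module": "canny",
                    "model": "control_sd15_canny",  # placeholder model name
                }]
            }
        },
    }
    r = requests.post(f"{WEBUI_URL}/sdapi/v1/img2img", json=payload)
    r.raise_for_status()
```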

revolverocelot1 commented 1 year ago

Yes, I've been waiting for this for so long.

FizzleDorf commented 1 year ago

I've moved my original comment to a new issue so it gets visibility: #268

AbyszOne commented 1 year ago

> I've moved my original comment to a new issue so it gets visibility: #268

COOL! I'll check it later.

Mikubill commented 1 year ago

Thanks for your suggestions! But I'm a little confused about point 3, the ControlNet bypass. Could you share some examples to help me understand what you mean by “blend with”?

AbyszOne commented 1 year ago

> Thanks for your suggestions! But I'm a little confused about point 3, the ControlNet bypass. Could you share some examples to help me understand what you mean by “blend with”?

As we know, img2img organically influences ControlNet, so I assume you mean how the raw image would be merged. And I have probably underestimated this problem. My idea is a rebuild for inference, like this script does: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#img2img-alternative-test

Unfortunately, I don't understand how exactly this blending normally happens. For example, it's a one-way effect. If I want to use low denoising with img2img as the base and have ControlNet be the influence, this doesn't work at all. There is only a "glue" left from the ControlNet image, no blending.

At the moment, I've asked in the original repo if something like a model that rebuilds the same image for inference is possible, instead of transforming it. Maybe it can be done more simply, maybe it's something that can be done with another kind of extension, or maybe it's something more tedious. I don't know. In any case, it would clearly provide very important functionality, since the blend achieved by your extension + img2img is far superior and more flexible than methods like cross attention or common inpainting, and being able to use the original images would literally allow you to change the lighting or composition of any video organically, just to mention the tip of the iceberg.
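To show what I mean by "rebuild for inference", here is a minimal conceptual sketch of what the img2img alternative test does at its core: run the sampler backwards from the clean latent to recover a noise latent that regenerates approximately the same image. `eps_model` and the sigma schedule are placeholders here, not the webui's actual internals:

```python
# Conceptual sketch of noise inversion ("rebuild for inference"). Not the real
# img2imgalt.py code: eps_model and sigmas are placeholders.
import torch

def invert_to_noise(z0: torch.Tensor, eps_model, sigmas: torch.Tensor) -> torch.Tensor:
    """Reverse-Euler inversion: z0 is the encoded image latent, sigmas is an
    increasing noise schedule, eps_model(z, sigma) predicts the noise."""
    z = z0.clone()
    for i in range(len(sigmas) - 1):
        eps = eps_model(z, sigmas[i])              # predicted noise at this level
        z = z + eps * (sigmas[i + 1] - sigmas[i])  # Euler step run in reverse
    return z  # starting noise; denoising it forward should rebuild the image
```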

FizzleDorf commented 1 year ago

> Thanks for your suggestions! But I'm a little confused about point 3, the ControlNet bypass. Could you share some examples to help me understand what you mean by “blend with”?

I also mentioned this in #74 with the Video Loopback script.

Eugenii10 commented 1 year ago

Is point 3 about something like the feature recreated with the technique in this video: https://www.youtube.com/watch?v=_xHC3bT5GBU ?

AbyszOne commented 1 year ago

> Is point 3 about something like the feature recreated with the technique in this video: https://www.youtube.com/watch?v=_xHC3bT5GBU ?

No. In this video a generated image is used, which is similar to the original because it was surely made with the same model. Also, it is quite likely that this YouTuber based it on my post about that technique. https://www.reddit.com/r/StableDiffusion/comments/115okp4/insane_light_composition_trick_contronet_blend/?utm_source=share&utm_medium=android_app&utm_name=androidcss&utm_term=1&utm_content=share_button

Point 3 is about preserving any original image as it is, to be blended with another.

FizzleDorf commented 1 year ago

I think a second pass with the lighting technique would be very welcome and would still require functionality for batching img2img and ControlNet separately. This, imo, is blending light with the original image in another pass and would count for point 3.

AbyszOne commented 1 year ago

> I think a second pass with the lighting technique would be very welcome and would still require functionality for batching img2img and ControlNet separately. This, imo, is blending light with the original image in another pass and would count for point 3.

Surely there are alternatives to try, but the main problem in point 3 is "raw" inference. Currently, both images are generated from noise, and this allows for an organic blend. This means that SD is able to create anything from full light to full dark from scratch while respecting both images. To achieve this with a real image, SD must also be able to reconstruct it from the noise, and thus mix it with another. For example, if my real image is daytime, it would be extremely difficult to make it night by combining it with another unless SD rebuilds it from scratch and allows for organic recomposition, kind of like Pix2PixZero does. This is not impossible to do from ControlNet, but it is more complicated than I initially thought.
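Just to illustrate what an "organic blend" of two real images could mean once both can be rebuilt from noise, here is a purely hypothetical sketch: invert each image to a noise latent, interpolate the latents, then denoise the mixture. Nothing here exists in the extension; `invert_to_noise`, `denoise` and the mixing weight are stand-ins:

```python
# Hypothetical sketch of blending two real images via their recovered noise
# latents. slerp() is standard spherical interpolation; the rest is commented
# pseudo-usage, since invert_to_noise/denoise/eps_model/sigmas are stand-ins.
import torch

def slerp(a: torch.Tensor, b: torch.Tensor, t: float) -> torch.Tensor:
    """Spherical interpolation between two noise latents."""
    a_n, b_n = a / a.norm(), b / b.norm()
    omega = torch.acos((a_n * b_n).sum().clamp(-1.0, 1.0))
    return (torch.sin((1 - t) * omega) * a + torch.sin(t * omega) * b) / torch.sin(omega)

# Pseudo-usage (all names hypothetical):
# noise_day   = invert_to_noise(z_day,   eps_model, sigmas)  # real daytime image
# noise_night = invert_to_noise(z_night, eps_model, sigmas)  # second source image
# blended     = denoise(slerp(noise_day, noise_night, 0.5), sigmas)
```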

Mikubill commented 1 year ago

Looks like point 3 is implemented in some form in Composer (https://github.com/Mikubill/sd-webui-controlnet/discussions/359), but it also requires some modification to img2imgalt.py

AbyszOne commented 1 year ago

> Looks like point 3 is implemented in some form in Composer (#359), but it also requires some modification to img2imgalt.py

Thanks for the update. Very interesting paper. I have been researching various inversion and latent interpolation methods related to this task. Fortunately, as you have pointed out, many tasks already have a solution counterpart through ControlNet at a lower cost in human input, instead of processing-hungry methods.

It is very interesting how, by combining the same image, the result is "fixed" with high similarity even at denoising 1, but combining different images changes completely to fulfill a subtle composition function, which in certain denoising ranges reaches a pseudo concept-blend level, much like the sought-after Midjourney remix. We could say that "two pictures are worth a thousand words". However, although it is a form of "control" worth including, I understand that point 3 is not natively a ControlNet problem, because the key is in the mix, not in the controls, and such an effect could perhaps be achieved independently. That's why I've also been playing around with ComfyUI, to dig into the innards of Stable Diffusion and try to understand what's going on. Still experimenting.

Regarding img2imgalt and ControlNet, my tests without a prompt plus sigma adjustment are encouraging. While not outstanding, its simplicity and low cost make it worth a try. At 0.8 denoising it reaches a similarity that could be enough to achieve strong effects in a reconstruction with other compositions, as we have already seen in the examples.

[Images: asientorojo_000116, 04025-3463456346-, hombremanoscabeza_000027, 04019-3463456346-]

Finally, I now know that a batch in img2img with nothing in ControlNet automatically mixes the image across all layers. However, being able to mix two sources, either one static or both dynamic, is still a major impediment and needs to be addressed.

AbyszOne commented 1 year ago

Here is a full frame. Not cherry-picked. SD 1.5, promptless.

[Images: hombremanoscabeza_000080, 04040-3463456346-]

Mikubill commented 1 year ago

Great. Will it be influenced by different ControlNet inputs?

AbyszOne commented 1 year ago

I don't know if I understood correctly, but the idea is that this reconstruction with img2imgalt would be an alternative output in ControlNet and would interact with img2img in the same way. If it could also be affected by other nets, it would be pure gold, but I would settle for at least the blend working. As a bonus, while it struggles on distant faces, it's surprisingly accurate on mid-range and close faces, even promptless. What's more, here I doubled the resolution of the original image with excellent results.

ORIGINAL: [image: d9f9cd7d9fdcaf66825dd8d872ab8e6357-11-emma-watson rsquare w700]

Img2imgalt: [image: 04121-347-]

AbyszOne commented 1 year ago

If the question was whether ControlNet influences that image, the answer is yes. A concurrent ControlNet unit alongside img2imgalt has an effect at denoising 1, but not in ways worth mentioning. Only the reverse path seems to do the magic.

AbyszOne commented 1 year ago

Some quick tests with custom models show even better results, including hard tasks like avatar skin shapes. If this can really be mixed organically, many people will be ecstatic.

[Images: asientorojo_000122, 04252-22-, 04251-22-]

AbyszOne commented 1 year ago

Points 1 and 2 are almost there. Quick question: is it difficult to make a control input just another img2img, with CFG and denoising? Less powerful than img2imgalt, but it would add utility.
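As a rough illustration of what I mean by "just another img2img": encode the control image the same way as the img2img source and mix it into the init latent with its own weight before denoising. This is only a naive sketch of the idea, not how the extension works; `vae_encode` and the weight are assumptions:

```python
# Naive sketch: treat the ControlNet image as a second img2img source by mixing
# its encoded latent into the init latent. Purely hypothetical, not extension code.
import torch

def mixed_init_latent(img2img_image, control_image, vae_encode,
                      control_weight: float = 0.5) -> torch.Tensor:
    """Return an init latent that is a weighted mix of both source images."""
    z_a = vae_encode(img2img_image)   # regular img2img source latent
    z_b = vae_encode(control_image)   # control image encoded the same way
    return (1.0 - control_weight) * z_a + control_weight * z_b
```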