multi-subject-render

Generate multiple complex subjects all at once!

Made as a script for the AUTOMATIC1111/stable-diffusion-webui repository.

00165-603508287-DDIM-64-7 5-ac07d41f-20221122154627

_{Miaouuuuuuuuu!}

Jump to examples!

💥 Installation 💥

Copy the URL of this repository into the extension tab :

OR copy this repository into your extension folder :

You might need to restart the whole UI. Maybe twice.

The look

_{OK I know that's a big screenshot}

How the hell does this work?

First it creates your background image, then your foreground subjects. Then it does a depth analysis on the foregrounds, cuts out their backgrounds, pastes them onto your background and finally does an img2img pass for a smooth blend!

^{It will cut around that lady with scissors made of code.}
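To picture the "scissors made of code", here is a minimal sketch of a depth-based cut in plain Python. It is not the script's actual code: the function name `cut_background`, the convention that larger depth values mean "closer to the camera", and the exact `>=` comparison are all assumptions made for illustration.

```python
def cut_background(depth_map, threshold):
    """Build an alpha mask from a depth map (hypothetical helper).

    depth_map: rows of ints in 0..255, larger = closer (assumed convention).
    Returns rows of alpha values: 255 = keep the pixel, 0 = make it transparent.
    """
    return [
        [255 if depth >= threshold else 0 for depth in row]
        for row in depth_map
    ]


# Toy 3x4 depth map: a near subject (values around 200) on a far background.
depth = [
    [10, 20, 15, 10],
    [12, 200, 210, 14],
    [11, 190, 205, 13],
]

mask = cut_background(depth, threshold=90)
for row in mask:
    print(row)
```

With a threshold of 0 nothing is cut in this sketch, and the higher the threshold the more of the image becomes transparent.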

Explanations of the different UI elements

I will only explain the not so obvious things because I spent enough time making that thing already.

First off, your usual UI will be for your initial background. So your normal prompt will probably be something like "a beach, sunny sky" etc.

For my example I decided to generate a bowling alley at 512x512 pixels :

00158-2629831387-Euler a-22-7 5-ac07d41f-1233221312123132

Your foreground subjects will be described in that text box.
You can use wildcards.
If you only use the first line, that line will be used for every foreground subject that will be generated.
If you use multiple lines, each line will be used for its own foreground subject.
The negative prompt is carried over to everything.

_{Note : if you do that, you will need as many lines as foreground images generated.}
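The one-line versus multi-line rule can be sketched like this. The function name `foreground_prompts` is made up for the example, and raising an error when there are too few lines is an assumption; the real script may behave differently.

```python
def foreground_prompts(text, count):
    """Pick one prompt per foreground subject (hypothetical helper)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("the foreground prompt box is empty")
    if len(lines) == 1:
        # A single line is reused for every foreground subject.
        return lines * count
    if len(lines) < count:
        # Multiple lines: you need as many lines as foreground images.
        raise ValueError(f"need {count} lines, got {len(lines)}")
    return lines[:count]


print(foreground_prompts("a penguin", 3))
print(foreground_prompts("a penguin\na seal\na walrus", 3))
```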

For my example I made three penguins :


The seed increment setting is how much the seed will be incremented at each image. If you set it to 0 you will get the same foregrounds every time. Probably not what you want unless you use the Extra option in your main UI and "Variation strength".
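As a tiny illustration of the increment, here is a sketch of the seeds each foreground would get. The name `foreground_seeds` is hypothetical, and whether the first foreground starts at the base seed itself or one increment later is an assumption.

```python
def foreground_seeds(base_seed, increment, count):
    """Seed for each foreground image (hypothetical helper)."""
    return [base_seed + i * increment for i in range(count)]


print(foreground_seeds(100, 20, 3))  # a different seed per foreground
print(foreground_seeds(100, 0, 3))   # increment 0: same seed every time
```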

You can use a different sampler for the foregrounds. As well as a different CLIP value.

The final blend is there to either make a smooth pass over your collage or to make something more complex / add details to your combination.
You can use different settings and samplers for your final blend. Do as you wish. The CLIP value will be the one you've set in your settings tab, not the one for the foregrounds. So you can decide if you prefer one way or the other.

00162-2629838387-Euler a-92-7 5-ac07d41f-20221124054727

_{They are not really bowling because you need fingers for that. They're just here for trouble.}

An important part is to set the final blend width. Your initial background will be stretched to that size so you don't really need to make it big initially. Your foreground subjects will be pasted onto your stretched background before the final blend. Not wide enough and you will end up having too many characters at the same spot.

The scary miscellaneous options :

The foreground distance from center multiplier will bring your characters closer together if you select a lower value, further apart with a higher one. I usually stick between 0.8 and 1.0.
Foreground Y shift : the center character will always be at the same height. Then you multiply the value of that slider by the position of the foreground subject from the center. That gives you how many pixels lower they will be. Think about some superhero movie poster with the sidekicks slightly lower. That's what this slider does.
Foreground depth cut threshold is the scary one. At 0 the backgrounds of your foreground subjects will be opaque. At 255 the entire foreground will be transparent. The best values are between 50-60 for cartoon-like characters and 90-100 for photorealistic subjects. Too much and they lose their heads, not enough and you get some rock that they were sitting on in your final blend.
Random superposition : the default is to have the center character in front. If you enable that it might not be the case anymore. That's a cool option depending on what you want to do.
The center character will be behind the others. If you use the previous option this one becomes useless.
Face correction is only for the final blend. If you want that on every foreground subject, set it in your main UI. I think it's best to enable both if you make photorealistic stuff.
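The distance multiplier and the Y shift described above can be sketched as follows. This is only an illustration: the name `foreground_offsets` and the `spacing` parameter (base gap in pixels between neighbouring subjects) are assumptions, not the script's real variables.

```python
def foreground_offsets(count, spacing, distance_multiplier, y_shift):
    """Pixel offsets (x, y) of each foreground relative to the center one.

    Hypothetical helper. x grows with the distance multiplier; y is how many
    pixels lower a subject is placed: the slider value times its position
    from the center.
    """
    center = (count - 1) / 2
    offsets = []
    for i in range(count):
        position = i - center  # signed position from the center
        x = round(position * spacing * distance_multiplier)
        y = round(abs(position) * y_shift)
        offsets.append((x, y))
    return offsets


# Three subjects, 200 px base spacing, multiplier 0.8, Y shift of 30 px.
print(foreground_offsets(3, 200, 0.8, 30))
```

The center subject stays at (0, 0), and the two on the sides are pulled in horizontally and pushed 30 pixels lower.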

Tips and tricks :

Using (bokeh) and (F1.8:1.2) will make blurry backgrounds, which makes it easier for the depth analysis to do a clean cut of the backgrounds.
"wide angle" in your prompt will give you more chances to have characters that won't be cropped.
"skin details" or "detailed skin" raises the chances of having close-ups. I prefer to avoid them.
Not enough denoising/steps on your final blend will make it look like you used scissors on your mom's Vogue catalogue and pasted the ladies onto your dad's favorite Lord of the Rings poster. Don't do that.
Too much denoising/steps might make the characters all look the same. It's all about finding the right middle value for your needs.
Making your foreground subjects have less height than the final image might make them look cropped.
You can now use the "Mask foregrounds in blend" checkbox to get something that looks more like a collage and use this in img2img with my other extension if you want your foreground subjects to be less altered by the img2img blend.

Known issues

It only renders the final blend to the UI. To keep the other images you have to save them (in the settings, just don't uncheck that "save all images" checkbox and you're good).
"List index out of range" might be barfed into your terminal if you interrupt a generation. I missed a state interrupt somewhere. It does not cause any problem by itself.
Do not use the "high res fix" checkbox. The blend size slider at the end will trigger it if you use a higher resolution than your background image. So keep your normal UI size sliders near 512*512.
There can be bugs.
AttributeError: 'StableDiffusionProcessingTxt2Img' object has no attribute 'sampler_name' : you need to update your webui ("git pull" in a command line from your webui folder).
Do check the other issues before opening a new one and try to give as many details as possible.
Use the discussion tab if it's not about a bug.
Make sure to run the latest webui version by doing "git pull" from your webui folder!

Credits

Thanks to thygate for letting me blatantly copy-paste some of his functions for the depth analysis integration in the webui.

This repository runs with MiDaS.

@ARTICLE {Ranftl2022,
    author  = "Ren\'{e} Ranftl and Katrin Lasinger and David Hafner and Konrad Schindler and Vladlen Koltun",
    title   = "Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-Shot Cross-Dataset Transfer",
    journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",
    year    = "2022",
    volume  = "44",
    number  = "3"
}

@article{Ranftl2021,
    author    = {Ren\'{e} Ranftl and Alexey Bochkovskiy and Vladlen Koltun},
    title     = {Vision Transformers for Dense Prediction},
    journal   = {ICCV},
    year      = {2021},
}

A few more examples

An attempt at recreating the "Distracted boyfriend" meme. Without influencing the directions in which they are looking. 100% txt2img.

00241-2439212203-Euler a-100-7 5-ac07d41f-20221124151538 00287-2439212203-Euler a-100-7 5-ac07d41f-20221124151832 00123-60606195-DDIM-74-7 5-ac07d41f-20221124144302 00133-1894928239-DDIM-74-7 5-ac07d41f-20221124144525