ThereforeGames / txt2img2img

Improve the editability of any Stable Diffusion subject while retaining a high degree of likeness

Warning

This script has been superseded by my new extension, Unprompted, which has the ability to run tasks after the initial txt2img process, including img2img. I do not plan on updating the original script and I cannot guarantee that it will continue working in new versions of the A1111 WebUI. Thank you for understanding.

txt2img2img for Stable Diffusion

Greatly improve the editability of any character/subject while retaining their likeness.

Introduction

txt2img2img is an experimental addon for AUTOMATIC1111's Stable Diffusion Web UI that streamlines the process of running a prompt through txt2img, then running its output through img2img using pre-defined parameters.

In addition to letting you define your own keywords and presets, txt2img2img can intelligently auto-adjust the parameters for the img2img phase based on what the txt2img output looks like.

We can approximate the "best" settings for img2img (denoise, CFG scale, inference steps) by considering how big or small the subject is within an image. This saves you the time and hassle of having to flip through pages in the UI and fiddle with sliders manually.
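As a rough illustration of the idea, the sketch below maps the fraction of the frame occupied by the subject to a denoising strength, CFG scale, and step count. The names `Img2ImgSettings` and `suggest_settings`, the linear interpolation, the direction of the relationship, and every numeric range are assumptions made for illustration; they are not the values or the logic the script actually uses.

```python
from dataclasses import dataclass


@dataclass
class Img2ImgSettings:
    denoising_strength: float
    cfg_scale: float
    steps: int


def lerp(low: float, high: float, t: float) -> float:
    """Linearly interpolate between low and high for t in [0, 1]."""
    return low + (high - low) * t


def suggest_settings(subject_fraction: float) -> Img2ImgSettings:
    """Hypothetical heuristic: larger subjects get a gentler img2img pass.

    subject_fraction is the share of image pixels covered by the subject,
    from 0.0 (absent) to 1.0 (fills the frame). All ranges are illustrative.
    """
    t = min(max(subject_fraction, 0.0), 1.0)  # clamp out-of-range input
    return Img2ImgSettings(
        denoising_strength=round(lerp(0.8, 0.4, t), 2),
        cfg_scale=round(lerp(12.0, 7.0, t), 1),
        steps=int(round(lerp(50, 30, t))),
    )
```

Calling `suggest_settings(0.25)` returns one settings object that could be handed to the img2img phase in place of values picked by hand.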

Detailed instructions are available below, but if you want to get this up and running as quickly as possible, visit the Starter Guide page:

Purpose

The main motivation for this script is improving the editability of embeddings created through Textual Inversion.

An open question in the Stable Diffusion community is how to finetune new subjects so that they respond well to complex prompts. Textual Inversion is great at high-fidelity reproduction, but it is notoriously quick to "overfit."

Here's what I've observed: when people say an embedding has been "overfitted," they are usually talking about txt2img. As it turns out, our embeddings are still quite flexible in img2img mode - as long as you provide sensible initial images, you can morph the model into contexts that would be outright impossible with txt2img.

Here's how txt2img2img can help:

You can, of course, use this script without finetuned embeddings. You can think of it as a general purpose "prompt swapper" if you like. Play around with it and see if you find any other cool use cases.

Does it work?

In my experience, yes!

Here's an example that shows the difference with the script on/off - this demonstrates a checkpoint of Sheik images from The Legend of Zelda. It was trained for 50k+ iterations and has lost basically all understanding of the English language. While it can produce a good Sheik, it ignores just about anything else I put into the prompt.

txt2img2img takes care of the issue nicely:

[Image: txt2img2img_example]

More examples to come.

Installation

Simply clone or download this repo and place the files in the base directory of Automatic's web UI.

Note: this repo includes several Python modules for Rembg and its requirements. Rembg is needed for background detection as part of txt2img2img's "autotuning" feature. You are welcome to install these packages manually if you prefer, but I had a hard time getting Rembg to cooperate with Automatic's install script. Alternatively, you can skip these files as long as you set autoconfigure to false in your preset configs.
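To make the role of background detection more concrete, here is a minimal sketch of how a subject's relative size could be measured once a foreground mask is available. It does not call Rembg. The mask is a plain nested list standing in for whatever a background remover returns, and the function name `subject_fraction` and the threshold of 128 are hypothetical.

```python
from typing import Sequence


def subject_fraction(mask: Sequence[Sequence[int]], threshold: int = 128) -> float:
    """Return the share of pixels whose mask value is at or above threshold.

    mask is a row-major grid of 0-255 values where high values mark the
    subject (foreground) and low values mark the background.
    """
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0  # an empty mask contains no subject
    foreground = sum(1 for row in mask for value in row if value >= threshold)
    return foreground / total
```

A result near 1.0 means the subject fills most of the frame, and a result near 0.0 means it is small or absent.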

Usage

From the txt2img screen, select txt2img2img as your active script:

[Image: txt2img2img selected as the active script]

Now, you will need to set up routines for your subjects. Each routine is created as a JSON file inside the /txt2img2img/routines directory of the web app.

Check the included example.json in that directory to get a better understanding of how a routine is defined - it is mostly self-explanatory, but please see the next section for detailed instructions.

The filenames of your routines (minus '.json') are used as "keywords" in your prompt for txt2img2img processing.
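The relationship between routine files and prompt keywords can be sketched as follows. This is not the script's own code: `load_routines` and `find_keywords` are hypothetical helpers, and matching keywords as whole words is an assumption about how the lookup might reasonably behave.

```python
import json
import re
from pathlib import Path


def load_routines(routines_dir: Path) -> dict[str, dict]:
    """Map each routine filename (minus '.json') to its parsed contents."""
    routines = {}
    for path in sorted(routines_dir.glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            routines[path.stem] = json.load(handle)
    return routines


def find_keywords(prompt: str, routines: dict[str, dict]) -> list[str]:
    """Return the routine names that appear in the prompt as whole words."""
    found = []
    for name in routines:
        # Lookarounds stop "sheik" from matching inside "sheikh".
        if re.search(rf"(?<!\w){re.escape(name)}(?!\w)", prompt):
            found.append(name)
    return found
```

With a file named sheik.json in the routines directory, the prompt "a photo of sheik riding a horse" would yield the keyword "sheik", and that routine's settings would then apply.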

JSON Options

General notes:

txt2img_term (str)

img2img_term (str)

autoconfigure (bool)

negative_prompt (str)

sampler_name (str)

seed (int)

steps (int)

cfg_scale (float)

denoising_strength (float)

restore_faces (bool)

bypass_color_correction (bool)

prompt_template (str)

overfit (int 1-10)

max_subject_size (float 0 to 1)

daisychain (bool)
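The types listed above lend themselves to a quick sanity check before a routine is used. The sketch below is a hypothetical validator, not part of the script. It assumes that every option is optional, that unknown keys should be reported, and that whole numbers are acceptable for float options.

```python
# Expected Python types, transcribed from the option list above.
OPTION_TYPES = {
    "txt2img_term": str,
    "img2img_term": str,
    "autoconfigure": bool,
    "negative_prompt": str,
    "sampler_name": str,
    "seed": int,
    "steps": int,
    "cfg_scale": float,
    "denoising_strength": float,
    "restore_faces": bool,
    "bypass_color_correction": bool,
    "prompt_template": str,
    "overfit": int,
    "max_subject_size": float,
    "daisychain": bool,
}


def validate_routine(routine: dict) -> list[str]:
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    for key, value in routine.items():
        expected = OPTION_TYPES.get(key)
        if expected is None:
            problems.append(f"{key}: unknown option")
            continue
        # bool is a subclass of int in Python, so reject it for other fields.
        is_bool = isinstance(value, bool)
        if expected is bool:
            ok = is_bool
        elif expected is float:
            ok = isinstance(value, (int, float)) and not is_bool
        else:
            ok = isinstance(value, expected) and not is_bool
        if not ok:
            problems.append(f"{key}: expected {expected.__name__}")
        elif key == "overfit" and not 1 <= value <= 10:
            problems.append("overfit: must be between 1 and 10")
        elif key == "max_subject_size" and not 0 <= value <= 1:
            problems.append("max_subject_size: must be between 0 and 1")
    return problems
```

Running this on the dictionary parsed from a routine file returns an empty list when every present option has the documented type and range.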

The Autotuner

One of the unique features in txt2img2img is its ability to automatically adjust SD settings before the img2img step. It does this in a variety of ways:

Known Issues

Feel free to open an issue if you have any questions or run into problems.

Enjoy!