AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0
138.84k stars 26.36k forks

[Feature Request]: "NULL-text Inversion for Editing Real Images using Guided Diffusion Models" - Yet another, probably better, img2img variant #5287

Open Ehplodor opened 1 year ago

Ehplodor commented 1 year ago

Is there an existing issue for this?

What would your feature do ?

It is VERY similar to the current "img2img alternative Euler script".

It proposes two innovations (shown in quotes): original image -> "pivotal DDIM inversion" -> "null-text optimization" -> prompt2prompt editing
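The inversion stage of that pipeline can be sketched in a few lines. This is a minimal pure-Python illustration of deterministic DDIM inversion, not the paper's or the web UI's code: `ddim_step`, `ddim_invert`, `ddim_sample`, the toy `eps_fn` noise predictor and the `alphas` schedule are all hypothetical names and values chosen for the example. The list returned by `ddim_invert` plays the role of the "pivot" trajectory that null-text optimization would later try to stay close to.

```python
import math

def ddim_step(x, eps, alpha_from, alpha_to):
    """Deterministic DDIM update between two cumulative-alpha noise levels."""
    out = []
    for xi, ei in zip(x, eps):
        # Predict the clean sample, then re-noise it to the target level.
        x0_pred = (xi - math.sqrt(1.0 - alpha_from) * ei) / math.sqrt(alpha_from)
        out.append(math.sqrt(alpha_to) * x0_pred + math.sqrt(1.0 - alpha_to) * ei)
    return out

def ddim_invert(x0, eps_fn, alphas):
    """Walk from the cleanest level (alphas[0]) to the noisiest (alphas[-1]).

    Returns every intermediate latent; these are the assumed "pivots".
    """
    x = list(x0)
    trajectory = [x]
    for i in range(len(alphas) - 1):
        eps = eps_fn(x, i)  # noise predicted at the current level
        x = ddim_step(x, eps, alphas[i], alphas[i + 1])
        trajectory.append(x)
    return trajectory

def ddim_sample(x_noisy, eps_fn, alphas):
    """Walk back from the noisiest level to the cleanest one."""
    x = list(x_noisy)
    for i in range(len(alphas) - 1, 0, -1):
        eps = eps_fn(x, i)
        x = ddim_step(x, eps, alphas[i], alphas[i - 1])
    return x

if __name__ == "__main__":
    alphas = [0.999, 0.9, 0.6, 0.3, 0.05]         # toy schedule (assumption)
    toy_eps = lambda x, i: [0.5 for _ in x]       # stand-in for the UNet
    pivots = ddim_invert([1.0, -2.0, 0.25], toy_eps, alphas)
    print(ddim_sample(pivots[-1], toy_eps, alphas))
```

With the constant toy predictor the round trip reproduces the input up to floating-point error. A real noise predictor depends on the latent and the timestep, so the round trip is only approximate, which is the gap the null-text optimization step is meant to close.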

The results look very good (see the paper). I honestly don't know whether the results of the "unofficial implementation" by @thepowerfuldeez are as good as those presented in the paper.

Just as with the img2img alt script, once the image is inverted there is no need to invert it again when only the edited prompt changes (all other things being equal). The initial inversion might take longer: the authors report 1 min for inversion, then 10 s for each subsequent generation.

Proposed workflow

  1. Go to img2img
  2. Select an image to edit
  3. Select the img2img alt script, adapted to offer various workflows (e.g. different inverted samplers: Euler, DDIM, etc.)
  4. Tick the "pivotal inversion" box
  5. Tick the "null text optimization" box
  6. Enter the various other options already present in the alt script
  7. Enter the original and edited prompts / negative prompts
  8. Generate

Additional information

Unofficial implementation: https://github.com/thepowerfuldeez/null-text-inversion
Corresponding research article: https://arxiv.org/abs/2211.09794

loboere commented 1 year ago

Official code here: https://github.com/google/prompt-to-prompt

Ehplodor commented 1 year ago

For the sake of completeness, and because there seem to be advances on the unofficial side:
Fork of thepowerfuldeez's unofficial implementation: https://github.com/webaverse/null-text-inversion
Fork of the fork: https://github.com/JFKraasch/null-text-inversion

Alchete commented 1 year ago

Are you guys actively working on this? I tried a quick-and-dirty port of Google's code to A1111 as an extension, but I ultimately get this error:

TypeError: register_attention_control.<locals>.ca_forward.<locals>.forward() got an unexpected keyword argument 'encoder_hidden_states'

It is raised inside the NullInversion class:

    def get_noise_pred_single(self, latents, t, context):
        noise_pred = self.model.unet(latents, t, encoder_hidden_states=context)["sample"]
        return noise_pred

Any ideas?
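One plausible reading of the traceback above is that the replacement `forward` installed by `register_attention_control` does not declare an `encoder_hidden_states` parameter, while the attention layer is now being called with that keyword. The sketch below shows a generic adapter that accepts either name. It is an untested assumption about this particular port: `accept_both_keywords` and `legacy_forward` are hypothetical names, and `context` is assumed to be the keyword the patched function expects.

```python
def accept_both_keywords(forward):
    """Wrap a forward() that only knows `context` so it also accepts
    `encoder_hidden_states` (both names are assumptions about the port)."""
    def wrapped(x, encoder_hidden_states=None, context=None, **kwargs):
        # Prefer an explicit `context`; otherwise fall back to the other name.
        if context is None:
            context = encoder_hidden_states
        return forward(x, context=context, **kwargs)
    return wrapped

# Stand-in for the patched attention forward from the ported code.
def legacy_forward(x, context=None, mask=None):
    return x, context, mask

if __name__ == "__main__":
    forward = accept_both_keywords(legacy_forward)
    print(forward("latents", encoder_hidden_states="text_emb"))
```

If the caller also passes other keywords that the wrapped function does not declare, those would raise the same kind of TypeError and would need to be mapped or dropped in the adapter as well.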

Ehplodor commented 1 year ago

Relevant discussion with more links : https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/7314