An implementation of Prompt-to-Prompt (P2P) for the Stable Diffusion XL (SDXL) architecture.
P2P is an image editing technique that uses the self- and cross-attention already present in the diffusion model to make local and global edits, without relying on external tools.
It takes advantage of cross- and self-attention by generating two images at the same time: the original image, and an edited image whose prompt contains some modification, for example "a pink bear" and "a pink dragon". During diffusion, it injects the original image's attention maps into the edited image, so that "dragon" reuses the attention map of "bear". This preserves the style and composition of the original image while replacing the bear with a dragon.
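To make the injection idea concrete, here is a minimal toy sketch in plain Python (no torch or diffusers). It assumes a single attention head, tiny hand-written query/key/value matrices, and a hypothetical `cross_attention` helper with an `injected_probs` argument; none of these names come from this repository or from diffusers. The edited branch computes its own attention weights but then swaps in the weights from the original branch, so the layout comes from the original prompt while the content comes from the edited prompt's values.

```python
import math


def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    # Plain-list matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def attention_probs(queries, keys):
    # Scaled dot-product attention weights: softmax(Q K^T / sqrt(d)).
    d = len(queries[0])
    scores = matmul(queries, transpose(keys))
    return [softmax([s / math.sqrt(d) for s in row]) for row in scores]


def cross_attention(queries, keys, values, injected_probs=None):
    # Hypothetical helper: if injected_probs is given, it overrides the weights
    # computed from this branch's own queries and keys.
    probs = attention_probs(queries, keys)
    if injected_probs is not None:
        probs = injected_probs
    return matmul(probs, values), probs


# Toy setup: 2 image positions (queries) attending over 3 text tokens.
# Original tokens: "a", "pink", "bear"; edited tokens: "a", "pink", "dragon".
q_src = [[1.0, 0.0], [0.0, 1.0]]
k_src = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
v_src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

q_tgt = [[0.9, 0.1], [0.1, 0.9]]
k_tgt = [[0.5, 0.5], [1.0, 0.0], [0.2, 0.9]]
v_tgt = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0]]

# Original branch: ordinary cross-attention; keep its attention maps.
out_src, probs_src = cross_attention(q_src, k_src, v_src)
# Edited branch: reuse the original maps (layout) with the edited prompt's values (content).
out_tgt, probs_used = cross_attention(q_tgt, k_tgt, v_tgt, injected_probs=probs_src)
```

In the real model this happens inside every cross-attention layer of the denoising network, with image latents as queries and text embeddings as keys and values; the toy matrices only illustrate the data flow.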
For more information, I highly recommend checking out the original project page and paper of this work (linked below).
While Stable Diffusion is no longer state-of-the-art, it remains a popular base model for ongoing research thanks to its well-established implementations. P2P is frequently cited and remains a significant foundation for follow-up work. To make sure good research is not left behind, it's worth bringing its implementation up to date.
I hope this implementation encourages curious minds to explore the extent of P2P's utility as diffusion models continue to scale.
P2P has three main operations. I include some examples below, but the official resources explain them in depth.
The replace operation swaps the effect of one token with a new token.
Original: a pink bear riding a bicycle on the beach | Replaced: a pink dragon riding a bicycle on the beach |
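A rough sketch of how the replace operation might schedule the injection, assuming both prompts tokenize to the same length so token positions line up one-to-one. The function name and the `cross_replace_fraction` parameter are hypothetical; the original P2P code exposes a comparable step-fraction setting (`cross_replace_steps`), but this is not its implementation.

```python
def replace_cross_attention(src_maps, tgt_maps, step, num_steps, cross_replace_fraction=0.8):
    """Choose which cross-attention maps the edited branch uses at a given step.

    src_maps / tgt_maps: lists of rows (image positions) x columns (tokens).
    step: 0-based denoising step, counting from the first (noisiest) step.
    """
    if len(src_maps[0]) != len(tgt_maps[0]):
        # Replace assumes a one-to-one token correspondence between prompts.
        raise ValueError("replace requires prompts with the same number of tokens")
    cutoff = int(cross_replace_fraction * num_steps)
    if step < cutoff:
        # Early steps: inject the original maps so the layout is preserved.
        return [row[:] for row in src_maps]
    # Later steps: let the edited prompt use its own maps so details can adapt.
    return [row[:] for row in tgt_maps]


# "a pink bear ..." vs "a pink dragon ...": toy maps, 2 positions x 3 tokens.
src_maps = [[0.1, 0.2, 0.7], [0.3, 0.3, 0.4]]
tgt_maps = [[0.2, 0.2, 0.6], [0.5, 0.1, 0.4]]

early = replace_cross_attention(src_maps, tgt_maps, step=10, num_steps=50)
late = replace_cross_attention(src_maps, tgt_maps, step=45, num_steps=50)
```

Stopping the injection before the last steps is a trade-off: a higher fraction keeps the result closer to the original image, a lower one gives the new word more freedom to change the shape of the object.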
The refine operation adds new tokens to the prompt, for example an adjective, that modify the existing content.
Original: a chocolate cake | Refined: a confetti chocolate cake |
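For refine, the two prompts have different lengths, so token positions must first be aligned. The sketch below uses Python's `difflib.SequenceMatcher` on whitespace-split words as a stand-in for the alignment used in the original work (real pipelines operate on tokenizer subwords plus special tokens). Shared tokens take their attention columns from the original maps, while new tokens such as "confetti" keep their own. Both helper names are hypothetical.

```python
from difflib import SequenceMatcher


def align_tokens(src_tokens, tgt_tokens):
    """Map each target token to its matching source index, or -1 if it is new."""
    mapping = [-1] * len(tgt_tokens)
    matcher = SequenceMatcher(None, src_tokens, tgt_tokens)
    for block in matcher.get_matching_blocks():
        # Each block says src[a:a+size] == tgt[b:b+size].
        for offset in range(block.size):
            mapping[block.b + offset] = block.a + offset
    return mapping


def refine_cross_attention(src_maps, tgt_maps, mapping):
    """Copy attention columns for shared tokens from the source; keep new tokens' own columns."""
    refined = []
    for src_row, tgt_row in zip(src_maps, tgt_maps):
        refined.append([src_row[m] if m >= 0 else tgt_row[j] for j, m in enumerate(mapping)])
    return refined


src_tokens = "a chocolate cake".split()
tgt_tokens = "a confetti chocolate cake".split()
mapping = align_tokens(src_tokens, tgt_tokens)  # [0, -1, 1, 2]

# Toy maps: 2 image positions x tokens.
src_maps = [[0.2, 0.5, 0.3], [0.1, 0.3, 0.6]]
tgt_maps = [[0.1, 0.4, 0.3, 0.2], [0.2, 0.2, 0.3, 0.3]]
refined = refine_cross_attention(src_maps, tgt_maps, mapping)
```

Here "chocolate" and "cake" keep the original layout, while "confetti" is free to attend wherever the edited branch places it.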
The reweight operation generates the same image, but amplifies or attenuates the effect of a target token in the prompt. Below is an example of an attenuation of "blue" in "a blue dog".
Original: a blue dog | Attenuated: a (less) blue dog |
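Reweighting can be sketched as scaling one token's cross-attention column by a factor, with values above 1 amplifying and values below 1 attenuating that token. This is a toy illustration that assumes attention maps stored as plain lists (rows are image positions, columns are tokens); `reweight_cross_attention` is a hypothetical name, and, following my reading of the paper, the rows are not renormalized after scaling.

```python
def reweight_cross_attention(maps, token_index, scale):
    """Scale one token's attention column; scale > 1 amplifies, scale < 1 attenuates.

    Rows are left unnormalized after scaling (an assumption based on the paper's description).
    Returns a new list; the input maps are not modified.
    """
    return [[value * scale if j == token_index else value for j, value in enumerate(row)]
            for row in maps]


tokens = "a blue dog".split()
maps = [[0.2, 0.4, 0.4], [0.1, 0.6, 0.3]]  # toy maps: 2 image positions x 3 tokens

# Attenuate "blue" as in the "a (less) blue dog" example.
less_blue = reweight_cross_attention(maps, tokens.index("blue"), scale=0.5)
```

Only the edited branch would be reweighted; the original branch runs unchanged so that the rest of the image stays the same.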
The original Prompt-to-Prompt project and the great researchers who worked on it: Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, Daniel Cohen-Or.
This code builds on Hugging Face's community Prompt-to-Prompt pipeline (a Stable Diffusion implementation), contributed by UmerHA.