MichalGeyer / plug-and-play

Official Pytorch Implementation for “Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation” (CVPR 2023)

Effect of 'scale' and negative prompt #11

Open ArielReplicate opened 1 year ago

ArielReplicate commented 1 year ago

Hi,

I'm trying to understand how the scale parameter affects the translation output. The only information I found was in the config file: "unconditional guidance scale. Note that a higher value encourages deviation from the source image"

Would you mind explaining how this parameter affects the translation, and how it should be combined with other structure-preserving control parameters such as 'feature_injection_threshold' and the negative prompt parameters?

tnarek commented 1 year ago

hi @ArielReplicate, the scale parameter essentially controls the fidelity of the generated image to the target prompt, i.e. a higher value of scale makes the translated image resemble the target prompt more closely. Higher values of scale are mostly needed when translating real guidance images, where the DDIM-inverted noise is restrictive and hard to deviate from. Such cases mostly occur with primitive, textureless guidance images (e.g. segmentation masks, silhouettes, etc.). Note that values of scale that are too high can cause undesirable artifacts, such as over-saturated colors, so it should be balanced accordingly (we generally found scale ∈ [10, 15] to give a good tradeoff).
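For intuition, here is a minimal sketch of how an unconditional guidance scale is usually applied in classifier-free guidance. It is not taken from this repository's code: the function name `guided_noise` and the use of plain Python lists in place of noise tensors are assumptions made for illustration only.

```python
from typing import List, Sequence


def guided_noise(
    eps_uncond: Sequence[float],
    eps_cond: Sequence[float],
    scale: float,
) -> List[float]:
    """Combine two noise predictions with classifier-free guidance.

    eps_uncond: prediction for the empty (unconditional) prompt.
    eps_cond:   prediction for the target prompt.
    scale:      guidance scale; 1.0 returns eps_cond unchanged, larger
                values extrapolate further in the prompt direction.
    (Hypothetical helper, not the repository's API.)
    """
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy values standing in for per-pixel noise predictions.
eps_uncond = [0.0, 1.0]
eps_cond = [1.0, 3.0]

weak = guided_noise(eps_uncond, eps_cond, scale=1.0)     # equals eps_cond
strong = guided_noise(eps_uncond, eps_cond, scale=10.0)  # pushed much further
```

Under this formulation a larger scale amplifies the difference between the conditional and unconditional predictions, which is consistent with both the stronger prompt fidelity and the over-saturation artifacts described above.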

To deviate from the guidance image content, you can also use the negative prompt parameters, which in a sense have the opposite effect from scale: they indicate what the translated image should deviate from rather than what it should be faithful to. Note that the negative prompt can describe only the part of the guidance content that you wish to deviate from; it doesn't have to describe the guidance image as a whole.
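One common way to realize a negative prompt, sketched here as an assumption rather than as this repository's implementation, is to let the negative-prompt prediction take the place of the unconditional branch in classifier-free guidance. The name `guide_away_from` and the toy list inputs are hypothetical.

```python
from typing import List, Sequence


def guide_away_from(
    eps_negative: Sequence[float],
    eps_cond: Sequence[float],
    scale: float,
) -> List[float]:
    """Guidance step that starts from the negative-prompt prediction.

    eps_negative: prediction for the negative prompt (what to avoid).
    eps_cond:     prediction for the target prompt.
    scale:        guidance scale; values above 1.0 move the result past
                  eps_cond, further away from eps_negative.
    (Hypothetical helper, not the repository's API.)
    """
    return [n + scale * (c - n) for n, c in zip(eps_negative, eps_cond)]


# Toy values standing in for noise predictions.
eps_negative = [0.5]
eps_cond = [1.0]

result = guide_away_from(eps_negative, eps_cond, scale=2.0)
```

In this sketch the negative prompt only needs to capture the content to move away from, which matches the note above that it can describe just part of the guidance image.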