Open jyapayne opened 1 year ago
Good job! Implement your idea as a custom script and you'll be all set.
As I understood from the .ipynb code, the method comes down to blending, at each step, the current image with the original picture:
```python
t_min = round(0.3 * total_steps)
t_max = round(0.6 * total_steps)
layout_steps = list(range(total_steps - t_max, total_steps - t_min))

encoded = pil_to_latent(input_image)  # latent of the original image
noise = torch.randn_like(encoded)
fine_tuned = None                     # denoised result from the previous step

for i in layout_steps:
    t = scheduler.timesteps[i]
    # re-noise the original image's latent to the current step's noise level
    noisy_latents = scheduler.add_noise(encoded, noise, timesteps=torch.tensor([t]))
    if fine_tuned is not None:
        # blend the previous denoised result with the freshly re-noised original
        noisy_latents = nu * fine_tuned + (1 - nu) * noisy_latents
```
(where `fine_tuned` is the denoised result from the previous step, and `nu = 0.9`).
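For completeness, the part the snippet leaves out is where `fine_tuned` comes from: each layout step also runs one denoising step toward the prompt. A minimal sketch, assuming a diffusers-style `unet` and `scheduler` and a precomputed text embedding `text_emb` (these names are illustrative, not from the notebook):

```python
# Hypothetical continuation of the loop body above (names are assumptions):
# predict the noise for the current latents, conditioned on the prompt...
noise_pred = unet(noisy_latents, t, encoder_hidden_states=text_emb).sample
# ...and take one scheduler step; the denoised latent becomes next iteration's fine_tuned
fine_tuned = scheduler.step(noise_pred, t, noisy_latents).prev_sample
```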
So it's like img2img, but the image is injected at several steps, not just at the first one.
Also, I think the number of steps used for image injection should be configurable (not the constant 0.3 hard-coded here).
`total_steps` (= Steps) and `guidance_scale` (= CFG) already work in the WebUI. Only two new parameters are needed: `nu` and `t_min`.
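If it helps, here's a minimal sketch of what the custom-script plumbing for those two sliders could look like, assuming the WebUI's standard `modules.scripts` API; the `magic_mix_sampling` call is hypothetical and stands in for the actual loop:

```python
import gradio as gr
import modules.scripts as scripts

class Script(scripts.Script):
    def title(self):
        return "MagicMix"

    def show(self, is_img2img):
        # only offer the script in the img2img tab
        return is_img2img

    def ui(self, is_img2img):
        nu = gr.Slider(minimum=0.0, maximum=1.0, step=0.05, value=0.9, label="nu")
        t_min = gr.Slider(minimum=0.0, maximum=1.0, step=0.05, value=0.3,
                          label="t_min (fraction of total steps)")
        return [nu, t_min]

    def run(self, p, nu, t_min):
        # hypothetical hook: a real implementation would wire nu/t_min
        # into the sampling loop shown above
        return magic_mix_sampling(p, nu=nu, t_min=t_min)
```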
@aleksusklim there is also an extra parameter `s` that is not in the .ipynb, which I've added above. But I don't understand how to use it. Maybe someone else can figure it out and explain.
Another implementation I found
Is there an existing issue for this?
What would your feature do?
Provide style transfer based on a prompt. Based on this paper: https://magicmix.github.io/
Example code implementation here: https://github.com/mpaepper/stablediffusion_magicmix
Proposed workflow
1. Go to img2img
2. Select an image like normal
3. Select a script called "MagicMix" or "StyleTransfer" or similar
4. Input a prompt that you want the image to be like
5. Select from the following inputs (taken from the ipynb above):
   - `nu` (`v` in the paper): controls how much the prompt should overwrite the original image in the initial layout phase. If your result is too close to the original image, try increasing this parameter.
   - `total_steps` (can use Sampling Steps already in the img2img tab): number of inference steps for Stable Diffusion
   - `min_steps` to `max_steps` for more control, or a ratio. The paper recommends `min_steps = 0.3 * total_steps` and `max_steps = 0.6 * total_steps`, so those can be the defaults.
   - `guidance_scale` (can use CFG already in the img2img tab): this is the classifier-free guidance. The higher this is set, the more it will drive your result towards your prompt.
   - `s` (attention map scale, value between -2 and 2): it looks like it adds (when positive) or removes (when negative) the prompt to/from the image. Not sure how to use this because I don't understand how the paper defines an attention map or how to apply the `s` parameter to it. Any tips?
6. Hit generate and wait
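To make these inputs concrete, here's a minimal sketch of how they could drive a MagicMix-style sampling loop, assuming a diffusers-style `unet` and `scheduler` are already loaded; `pil_to_latent` and `encode_prompt` are illustrative helpers, and `s` is left out since its use is unclear:

```python
import torch

@torch.no_grad()
def magic_mix(input_image, prompt, total_steps=50, nu=0.9,
              min_ratio=0.3, max_ratio=0.6, guidance_scale=7.5):
    # Illustrative sketch only; helper names and signatures are assumptions.
    t_min = round(min_ratio * total_steps)
    t_max = round(max_ratio * total_steps)
    scheduler.set_timesteps(total_steps)

    encoded = pil_to_latent(input_image)   # latent of the original image
    noise = torch.randn_like(encoded)
    cond, uncond = encode_prompt(prompt)   # text embeddings for CFG

    latents = None
    for i, t in enumerate(scheduler.timesteps):
        if total_steps - t_max <= i < total_steps - t_min:
            # layout phase: keep injecting the re-noised original image
            noisy_orig = scheduler.add_noise(encoded, noise, torch.tensor([t]))
            latents = noisy_orig if latents is None else nu * latents + (1 - nu) * noisy_orig
        if latents is None:
            continue  # sampling effectively starts at the layout phase
        # classifier-free guidance: blend conditional and unconditional predictions
        pred_cond = unet(latents, t, encoder_hidden_states=cond).sample
        pred_uncond = unet(latents, t, encoder_hidden_states=uncond).sample
        noise_pred = pred_uncond + guidance_scale * (pred_cond - pred_uncond)
        latents = scheduler.step(noise_pred, t, latents).prev_sample
    return latents  # decode with the VAE to get the final image
```

The `min_ratio`/`max_ratio` arguments map directly onto the `min_steps`/`max_steps` defaults above, and `guidance_scale` plays the same role as CFG in the WebUI.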
Edit: you don't input an image for style transfer, but a prompt. Reworded and added extra information.
Additional information
No response