crowsonkb / k-diffusion

Karras et al. (2022) diffusion models for PyTorch
MIT License

Is it possible to add arbitrary loss terms to k-diffusion, e.g. LPIPS, edge stabilization, etc.? #31

Closed · oxysoft closed this 1 year ago

oxysoft commented 1 year ago

I used PyTTI a while back, and it was easy to guide the animation toward various desired properties. For example, to keep the composition more stable between frames, we could run an edge-detection convolution over the last frame and add a loss that tries to preserve those edges. In Disco Diffusion, LPIPS is used to keep the image perceptually similar between frames and reduce flickering.

With k-diffusion, I can't figure out how to do this! I thought I might be able to do it the same way as CFGDenoiser, but no dice. These things must be possible, since you were able to implement CLIP guidance, which as I understand it poses a similar challenge, but the way it's implemented looks completely different from PyTTI!

Admittedly my DL skills are very surface level, and I can only implement new things by reverse engineering other, similar features. Unfortunately, I still can't make sense of the way CLIP guidance was implemented here well enough to relate it to my goals. It looks like it has to be written as part of the sampling, but how that sampling process works is still a mystery to me. Any hints to push me in the right direction?

If laymen like me could better understand how to guide the generation process toward desired properties, it would be huge for AI animation! Cheers

crowsonkb commented 1 year ago

You can do that by putting the extra loss terms in the cond_fn: https://github.com/crowsonkb/k-diffusion/blob/master/sample_clip_guided.py#L101 :)
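For example, a `cond_fn` that adds an LPIPS term against the previous frame plus an edge-stabilization term could look like the sketch below. It assumes the calling convention used in `sample_clip_guided.py` (the script wraps the denoiser so that `x` requires grad and the gradient returned by `cond_fn` steers each sampling step); `prev_frame`, the Sobel edge term, and the two loss weights (`lpips_scale`, `edge_scale`) are illustrative placeholders, not part of the repo.

```python
import torch
import torch.nn.functional as F
import lpips  # pip install lpips

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Perceptual loss between the frame being sampled and the previous frame.
lpips_model = lpips.LPIPS(net='vgg').to(device)

# Stand-in for the previous animation frame; use your real frame here,
# shape [N, 3, H, W], values in [-1, 1].
prev_frame = torch.zeros(1, 3, 256, 256, device=device)

lpips_scale = 100.0  # hypothetical weight, tune to taste
edge_scale = 50.0    # hypothetical weight, tune to taste

# 3x3 Sobel kernel for a simple edge map (the "convolve the last frame" idea).
sobel = torch.tensor([[-1., 0., 1.],
                      [-2., 0., 2.],
                      [-1., 0., 1.]], device=device).view(1, 1, 3, 3)

def edge_map(img):
    gray = img.mean(dim=1, keepdim=True)
    gx = F.conv2d(gray, sobel, padding=1)
    gy = F.conv2d(gray, sobel.transpose(2, 3), padding=1)
    return (gx ** 2 + gy ** 2 + 1e-8).sqrt()

target_edges = edge_map(prev_frame)

def cond_fn(x, sigma, denoised, **extra_args):
    # Keep the denoised prediction perceptually close to the previous frame.
    loss = lpips_model(denoised, prev_frame).sum() * lpips_scale
    # Preserve the previous frame's edges to stabilize the composition.
    loss = loss + F.mse_loss(edge_map(denoised), target_edges) * edge_scale
    # Gradient w.r.t. the noisy input x; the wrapper in sample_clip_guided.py
    # uses this to nudge the denoised output at every step.
    return -torch.autograd.grad(loss, x)[0]
```

Then wrap your denoiser the same way the script does and pass the wrapped model to any of the samplers; the extra loss terms then act on every sampling step, exactly like the CLIP guidance loss does.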

oxysoft commented 1 year ago

Omg yes, thank you so much!