question regarding code adaptation

Kinyugo / consistency_models

A mini-library for training consistency models.

MIT License


Hi, I'm trying to apply consistency models to a different domain that is based on DDPM, and I have some questions about how to adapt your code.

1) There are some concepts that are new to me: the skip connection and the output connection. According to the Karras et al. paper (Elucidating the Design Space of Diffusion-Based Generative Models, Table 1), if I were to adapt this to a DDPM-like model, should I change the values according to the paper (that is, use column 3 instead of column 4, which I think is what your code uses)? Or is it OK to use the values unchanged?

2) The code I want to adapt uses a cosine scheduler. Your code uses the Karras scheduler, which I think is the one explained in the paper. Would it be OK to change the scheduler?

3) Can you help me understand the relationship between "sigma, also known as t" and the alphas and betas in diffusion models? Also, in the code I am about to adapt, the model takes the timestep as input, while in your code it takes the sigma value. If I were to adapt it to my code, would it be OK to use the timestep instead of sigma?

thanks

Hello. Thanks for your interest in my work.

1) The skip connections are applied automatically when you use the ConsistencyTrainer.
2) In principle you could replace the Karras schedule with a cosine schedule. However, this requires changing other details to ensure that the proper amount of noise is added and that the timestep schedule remains sensible.
3) Sigma controls the standard deviation of the added noise, while the alpha and beta values typically interpolate between the noise and the sample. Fundamentally, both achieve the same goal of adding noise to the data. Sigma serves the same purpose as the timestep: it informs the model how much noise was added. Normally, though, you would replace the sinusoidal timestep embedding with a Fourier-based embedding.
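To make the sigma versus alpha/beta relationship concrete, here is a minimal sketch (not code from this library). It relies on the standard identity that a DDPM sample x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps, once divided by sqrt(alpha_bar), has the form x_0 + sigma * eps with sigma = sqrt((1 - alpha_bar) / alpha_bar). The function names are hypothetical, and the cosine schedule with offset s = 0.008 is assumed to match the one in the code being adapted.

```python
import math


def cosine_alpha_bar(t: float, s: float = 0.008) -> float:
    """Cumulative signal level alpha_bar(t) of a cosine schedule, t in [0, 1).

    Assumption: the usual cosine form f(t) / f(0) with a small offset s.
    """
    def f(u: float) -> float:
        return math.cos((u + s) / (1.0 + s) * math.pi / 2.0) ** 2

    return f(t) / f(0.0)


def alpha_bar_to_sigma(alpha_bar: float) -> float:
    """Noise std after rescaling x_t by 1 / sqrt(alpha_bar)."""
    return math.sqrt((1.0 - alpha_bar) / alpha_bar)


def sigma_to_alpha_bar(sigma: float) -> float:
    """Inverse of alpha_bar_to_sigma."""
    return 1.0 / (1.0 + sigma ** 2)


# Map a few normalized DDPM timesteps to the equivalent sigma values.
timesteps = [0.1, 0.5, 0.9]
sigmas = [alpha_bar_to_sigma(cosine_alpha_bar(t)) for t in timesteps]
```

Under these assumptions, a timestep and a sigma carry the same information (one is a monotone function of the other), but the inputs also need rescaling by 1 / sqrt(alpha_bar) before the two parameterizations line up, which is part of why swapping schedules touches more than one place.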

You can't easily mix concepts from DDPM and ConsistencyModels as both have strict theoretical frameworks that guide the choices of the different schedules and noising methods. However, the same neural network architecture can be used in both scenarios without any change.
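As an illustration of how an unchanged network can be reused, the sketch below wraps an arbitrary network with skip and output scalings in the form given in the consistency models paper. It is only a sketch: the function names are hypothetical, the values sigma_data = 0.5 and sigma_min = 0.002 are assumed defaults, and whether this library computes the scalings in exactly this way is not confirmed here.

```python
import math

SIGMA_DATA = 0.5   # assumed standard deviation of the data
SIGMA_MIN = 0.002  # assumed smallest noise level


def skip_scaling(sigma: float) -> float:
    """Weight on the noisy input; equals 1 at sigma == SIGMA_MIN."""
    return SIGMA_DATA ** 2 / ((sigma - SIGMA_MIN) ** 2 + SIGMA_DATA ** 2)


def output_scaling(sigma: float) -> float:
    """Weight on the network output; equals 0 at sigma == SIGMA_MIN."""
    return SIGMA_DATA * (sigma - SIGMA_MIN) / math.sqrt(SIGMA_DATA ** 2 + sigma ** 2)


def consistency_output(x: float, sigma: float, network) -> float:
    """Combine input and network output so that f(x, SIGMA_MIN) == x."""
    return skip_scaling(sigma) * x + output_scaling(sigma) * network(x, sigma)


# Any callable taking (x, sigma) can stand in for the network here.
def dummy_network(x: float, sigma: float) -> float:
    return 123.0


clean = consistency_output(3.0, SIGMA_MIN, dummy_network)  # boundary: returns x itself
noisy = consistency_output(3.0, 80.0, dummy_network)       # mostly the network output
```

The scalings enforce the boundary condition that the model returns its input unchanged at the smallest noise level, so they depend on the noise parameterization and not on the network architecture.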

question regarding code adaptation #11