Open Leirbag-gabrieL opened 6 months ago
Hi there,
In the cosine schedule, the alphas/betas are calculated with clipping, so beta_prod_t is not 0 when t=0, as far as I can see:
x = torch.linspace(0, num_train_timesteps, num_train_timesteps + 1)
alphas_cumprod = torch.cos(((x / num_train_timesteps) + s) / (1 + s) * torch.pi * 0.5) ** 2
alphas_cumprod /= alphas_cumprod[0].item()
alphas = torch.clip(alphas_cumprod[1:] / alphas_cumprod[:-1], 0.0001, 0.9999)
betas = 1.0 - alphas
return betas, alphas, alphas_cumprod[:-1]
However, there are documented problems with the cosine scheduler; see the discussion here.
@sRassman reports better results when using leading timesteps here. Could you try that and see if it fixes it for you?
Hi, thanks for your quick response.
Indeed, the alphas are clipped in the code snippet you linked, but those are not the values used in the scheduler.
From what I saw, it is the step function of DDPMScheduler which does that.
At the beginning of the step function, some variables are defined:
alpha_prod_t = self.alphas_cumprod[timestep]
alpha_prod_t_prev = self.alphas_cumprod[timestep - 1] if timestep > 0 else self.one
beta_prod_t = 1 - alpha_prod_t
beta_prod_t_prev = 1 - alpha_prod_t_prev
The issue comes from the alpha_prod_t variable, which is equal to 1 at timestep 0 because, with the cosine scheduler enabled, alphas_cumprod is defined like so:
alphas_cumprod = torch.cos(((x / num_train_timesteps) + s) / (1 + s) * torch.pi * 0.5) ** 2
alphas_cumprod /= alphas_cumprod[0].item()
So alpha_prod_t at timestep 0 is always equal to 1, and so beta_prod_t = 1 - alpha_prod_t is equal to 0. Later in that step function, values are divided by this same beta_prod_t (equal to 0, thus leading to NaN results).
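A minimal sketch to check this, reimplementing the normalized cosine alphas_cumprod in plain Python instead of calling the scheduler. The helper name cosine_alphas_cumprod and the default s=0.008 are assumptions for illustration, not values taken from the scheduler code:

```python
import math


def cosine_alphas_cumprod(num_train_timesteps=1000, s=0.008):
    """Normalized cosine alphas_cumprod (hypothetical helper, plain Python).

    s=0.008 is an assumed offset; substitute whatever value your scheduler uses.
    """
    raw = [
        math.cos(((t / num_train_timesteps) + s) / (1 + s) * math.pi * 0.5) ** 2
        for t in range(num_train_timesteps + 1)
    ]
    # Same normalization as `alphas_cumprod /= alphas_cumprod[0].item()`
    return [v / raw[0] for v in raw]


alphas_cumprod = cosine_alphas_cumprod()

# Same definitions as at the start of the step function, for timestep 0
alpha_prod_t = alphas_cumprod[0]
beta_prod_t = 1 - alpha_prod_t

print(alpha_prod_t, beta_prod_t)  # 1.0 0.0
```

Because the first entry is divided by itself, it is exactly 1 whatever s is, so beta_prod_t is exactly 0. In plain Python a division by this value raises ZeroDivisionError; with torch tensors the same division returns inf or nan instead, which is consistent with the NaN results.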
I will try what @sRassman proposed and give you feedback later :+1:
Ah yes, nice spot. It seems like we should make sure alphas_cumprod is calculated from the clipped alphas before we return it from the cosine scheduler.
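One way this could look, as a rough sketch only and not the actual MONAI fix: clip the per-step alphas first, then rebuild the cumulative product from them. The function name cosine_schedule_clipped and the default s=0.008 are assumptions, and plain Python is used here where the scheduler would use torch (for example torch.cumprod):

```python
import math


def cosine_schedule_clipped(num_train_timesteps=1000, s=0.008):
    """Cosine schedule whose alphas_cumprod is rebuilt from the clipped alphas.

    Hypothetical helper for illustration; s=0.008 is an assumed default.
    """
    raw = [
        math.cos(((t / num_train_timesteps) + s) / (1 + s) * math.pi * 0.5) ** 2
        for t in range(num_train_timesteps + 1)
    ]
    # Per-step alphas, clipped to [0.0001, 0.9999] as in the original snippet
    alphas = [
        min(max(raw[t + 1] / raw[t], 0.0001), 0.9999)
        for t in range(num_train_timesteps)
    ]
    betas = [1.0 - a for a in alphas]

    # Cumulative product of the clipped alphas, so the first entry is at most 0.9999
    alphas_cumprod = []
    running = 1.0
    for a in alphas:
        running *= a
        alphas_cumprod.append(running)
    return betas, alphas, alphas_cumprod


betas, alphas, alphas_cumprod = cosine_schedule_clipped()
beta_prod_0 = 1 - alphas_cumprod[0]
print(alphas_cumprod[0], beta_prod_0)
```

With this construction beta_prod_t at timestep 0 is strictly positive, so the divisions in the step function stay finite. The trade-off is that alphas_cumprod no longer follows the cosine curve exactly wherever the clipping is active.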
Hi,
I came across the same issue of receiving NaNs with the cosine schedule due to division by zero. I looked to see if there was an open issue about it, and here it is. Are there immediate plans to fix this? How do you suggest handling it right now?
Thanks, Oded
Dear Oded,
Note that the MONAI Generative Models repository will soon be archived because the code has been integrated into MONAI core (https://github.com/Project-MONAI). Could you check whether using the latest version of the schedulers from MONAI core leads to the same error?
If so, we will look at it immediately. Otherwise, please use that alternative repository.
Thank you very much!
Virginia
Hi Virginia,
If I understand correctly, the code here: https://github.com/Project-MONAI/GenerativeModels/blob/main/generative/networks/schedulers/scheduler.py
has been replaced with: https://github.com/Project-MONAI/MONAI/blob/dev/monai/networks/schedulers/scheduler.py
The cosine function is exactly the same, so I don't expect any difference. I rewrote the code to make it work, and I urge you to fix this issue; a fix would be very valuable.
Thanks, Oded
Dear Oded,
Thanks. We will look into it. Could you please open an issue in MONAI core describing the problem so that we can look at it from there and track it?
Thanks!
Virginia
I wanted to use a DDPMScheduler with cosine scheduling and obtained images filled with NaN when sampling.
I quickly inspected the code and found that it was caused by a division by 0 in the step function of the class DDPMScheduler, beta_prod_t being equal to 0 at step 0 when using the cosine scheduler because it comes from alphas_cumprod. Thus, alphas_cumprod[0] = 1 and beta_prod_t = 1 - 1 = 0.
I saw no issue reporting this, so maybe I am using it wrong. :man_shrugging: I tried using DDPMScheduler(num_train_timesteps=1000, schedule="cosine") in the 2d_ddpm_compare_schedulers.ipynb notebook and got NaN-filled images as a result.