Project-MONAI / GenerativeModels

MONAI Generative Models makes it easy to train, evaluate, and deploy generative models and related applications

NoiseSchedules cosine seems wrong and leads to division by 0 #489

Open Leirbag-gabrieL opened 6 months ago

Leirbag-gabrieL commented 6 months ago

I wanted to use a DDPMScheduler with cosine scheduling and obtained images filled with NaNs when sampling.

I quickly inspected the code and found that it was caused by a division by 0 in the step function of the DDPMScheduler class, right here:

    pred_original_sample_coeff = (alpha_prod_t_prev ** (0.5) * self.betas[timestep]) / beta_prod_t
    current_sample_coeff = self.alphas[timestep] ** (0.5) * beta_prod_t_prev / beta_prod_t

beta_prod_t is equal to 0 at step 0 when using the cosine scheduler because it comes from:

    alpha_prod_t = self.alphas_cumprod[timestep]
    alpha_prod_t_prev = self.alphas_cumprod[timestep - 1] if timestep > 0 else self.one
    beta_prod_t = 1 - alpha_prod_t

alphas_cumprod is calculated like so in this case:

    x = torch.linspace(0, num_train_timesteps, num_train_timesteps + 1)
    alphas_cumprod = torch.cos(((x / num_train_timesteps) + s) / (1 + s) * torch.pi * 0.5) ** 2
    alphas_cumprod /= alphas_cumprod[0].item()

Thus, alphas_cumprod[0] = 1 and beta_prod_t = 1 - 1 = 0.
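
For reference, a minimal sketch reproducing the arithmetic above (num_train_timesteps and the offset s are assumed values, with s = 0.008 matching the usual cosine-schedule offset, not read from a specific config):

    import torch

    # Sketch only: recompute the cosine alphas_cumprod exactly as quoted above.
    num_train_timesteps, s = 1000, 8e-3

    x = torch.linspace(0, num_train_timesteps, num_train_timesteps + 1)
    alphas_cumprod = torch.cos(((x / num_train_timesteps) + s) / (1 + s) * torch.pi * 0.5) ** 2
    alphas_cumprod /= alphas_cumprod[0].item()

    alpha_prod_t = alphas_cumprod[0]   # exactly 1.0 after the normalisation
    beta_prod_t = 1 - alpha_prod_t     # exactly 0.0
    print(alpha_prod_t.item(), beta_prod_t.item())  # 1.0 0.0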

I saw no issue reporting this, so maybe I am using it wrong. :man_shrugging: I tried using DDPMScheduler(num_train_timesteps=1000, schedule="cosine") in the 2d_ddpm_compare_schedulers.ipynb and got NaN-filled images as a result.

marksgraham commented 6 months ago

Hi there,

In the cosine schedule the alphas/betas are calculated with clipping, so beta_prod_t is not 0 when t=0, as far as I can see:

    x = torch.linspace(0, num_train_timesteps, num_train_timesteps + 1)
    alphas_cumprod = torch.cos(((x / num_train_timesteps) + s) / (1 + s) * torch.pi * 0.5) ** 2
    alphas_cumprod /= alphas_cumprod[0].item()
    alphas = torch.clip(alphas_cumprod[1:] / alphas_cumprod[:-1], 0.0001, 0.9999)
    betas = 1.0 - alphas
    return betas, alphas, alphas_cumprod[:-1]

However, there are documented problems with the cosine scheduler; see the discussion here.

@sRassman reports better results using leading timesteps here. Could you try that and see if it fixes it for you?

Leirbag-gabrieL commented 6 months ago

Hi, thanks for your quick response.

Indeed the alphas are clipped in the code snippet you linked, but those are not the values used where the problem occurs. From what I saw, it is the step function of DDPMScheduler that does the division.

At the beginning of the step function, some variables are defined:

    alpha_prod_t = self.alphas_cumprod[timestep]
    alpha_prod_t_prev = self.alphas_cumprod[timestep - 1] if timestep > 0 else self.one
    beta_prod_t = 1 - alpha_prod_t
    beta_prod_t_prev = 1 - alpha_prod_t_prev

The issue comes from the alpha_prod_t variable, which is equal to 1 at timestep 0, because with the cosine scheduler enabled alphas_cumprod is defined like so:

    alphas_cumprod = torch.cos(((x / num_train_timesteps) + s) / (1 + s) * torch.pi * 0.5) ** 2
    alphas_cumprod /= alphas_cumprod[0].item()

So alpha_prod_t at timestep 0 is always equal to 1, and therefore beta_prod_t = 1 - alpha_prod_t is equal to 0; later in the step function, values are divided by this same beta_prod_t, leading to NaN results.
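
To make the failure concrete, here is a small sketch that plugs the timestep-0 values described above into the two coefficients quoted from the step function (the clipped beta value is illustrative, not read from the scheduler):

    import torch

    # Timestep-0 values, as derived above for the cosine schedule.
    alpha_prod_t = torch.tensor(1.0)          # alphas_cumprod[0] after normalisation
    alpha_prod_t_prev = torch.tensor(1.0)     # self.one when timestep == 0
    beta_prod_t = 1 - alpha_prod_t            # 0.0
    beta_prod_t_prev = 1 - alpha_prod_t_prev  # 0.0

    beta_0 = torch.tensor(1e-4)               # illustrative clipped beta at t = 0
    alpha_0 = 1 - beta_0

    pred_original_sample_coeff = (alpha_prod_t_prev ** 0.5 * beta_0) / beta_prod_t  # inf
    current_sample_coeff = alpha_0 ** 0.5 * beta_prod_t_prev / beta_prod_t          # 0/0 -> nan
    print(pred_original_sample_coeff.item(), current_sample_coeff.item())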

I will try what @sRassman proposed and give you feedback later :+1:

marksgraham commented 6 months ago

Ah yes, nice spot. It seems like we should make sure alphas_cumprod is calculated from the clipped alphas before we return it from the cosine scheduler.
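
For illustration, a sketch of the kind of change described here (cosine_beta_clipped is a hypothetical name, not the library function, and this is not a committed fix): rebuild the cumulative product from the clipped per-step alphas so that alphas_cumprod[0] < 1 and the timestep-0 division is no longer by zero.

    import torch

    def cosine_beta_clipped(num_train_timesteps: int, s: float = 8e-3):
        # Same cosine schedule as quoted above...
        x = torch.linspace(0, num_train_timesteps, num_train_timesteps + 1)
        alphas_cumprod = torch.cos(((x / num_train_timesteps) + s) / (1 + s) * torch.pi * 0.5) ** 2
        alphas_cumprod /= alphas_cumprod[0].item()
        alphas = torch.clip(alphas_cumprod[1:] / alphas_cumprod[:-1], 0.0001, 0.9999)
        betas = 1.0 - alphas
        # ...but return the cumulative product of the *clipped* alphas,
        # so alphas_cumprod[0] <= 0.9999 and 1 - alphas_cumprod[0] > 0.
        return betas, alphas, torch.cumprod(alphas, dim=0)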

OdedRotem314 commented 1 month ago

Hi

I came across the same issue of receiving NaNs with the cosine schedule due to division by zero. I looked to see if there was an open issue about it, and here it is. Are there immediate plans to fix this? How do you suggest handling it right now?

Thanks, Oded

virginiafdez commented 1 month ago

Dear Oded,

Note that the MONAI Generative Models repository will soon be archived because the code has been integrated into MONAI core (https://github.com/Project-MONAI). Could you check whether using the latest version of the schedulers from MONAI core leads to the same error?

If so, we will look at it immediately. Otherwise, please use that alternative repository.

Thank you very much!

Virginia

OdedRotem314 commented 1 month ago

Hi Virginia

If I understand correctly, the code here: https://github.com/Project-MONAI/GenerativeModels/blob/main/generative/networks/schedulers/scheduler.py

has been replaced by: https://github.com/Project-MONAI/MONAI/blob/dev/monai/networks/schedulers/scheduler.py

The cosine function is exactly the same, so I don't expect any difference. I rewrote the code to make it work, and I urge you to fix this issue; it would add a lot of value.

Thanks, Oded
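
For anyone hitting this in the meantime, one possible stop-gap (not Oded's actual rewrite, which is not shown in the thread; the import path and attribute names are assumed from this repo's tutorials and the snippets quoted above) is to rebuild alphas_cumprod from the clipped per-step alphas after constructing the scheduler:

    import torch
    from generative.networks.schedulers import DDPMScheduler

    scheduler = DDPMScheduler(num_train_timesteps=1000, schedule="cosine")
    # Rebuild the cumulative product from the clipped alphas so that
    # alphas_cumprod[0] < 1 and step() no longer divides by zero at t = 0.
    scheduler.alphas_cumprod = torch.cumprod(scheduler.alphas, dim=0)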


virginiafdez commented 1 month ago

Dear Oded

Thanks. We will look into it. Could you please open an issue in MONAI core describing the problem, so that we can have a look at it from there and trace it?

Thanks!

Virginia