CompVis / latent-diffusion

High-Resolution Image Synthesis with Latent Diffusion Models

Question about --scale_lr #259

Open ader47 opened 1 year ago

ader47 commented 1 year ago

Hi, I encountered some problems when training the unconditional LDM. I trained the LDM with 2 RTX 3090s. When should I use `--scale_lr True` to scale the learning rate? (Actually, it is True by default....) The learning rate is scaled as accumulate_grad_batches * ngpu * bs * base_lr. Why should the learning rate be scaled like this? If I use batch size 48, the learning rate becomes 1 * 2 * 48 * 0.00005, much bigger than the lr in the paper (0.00005), and the model won't converge. I want to train the model with the paper settings; should I set `--scale_lr False`?
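
For context, this is roughly how I read the scaling rule; the snippet below is only a standalone sketch with example values (variable names are illustrative), not the repo's actual code:

```python
# Minimal sketch of the --scale_lr rule (illustrative values, not the repo's exact code).
base_lr = 5.0e-5                 # base_learning_rate from the config
bs = 48                          # per-GPU batch size
ngpu = 2                         # number of GPUs
accumulate_grad_batches = 1      # gradient accumulation steps

scale_lr = True
if scale_lr:
    # the effective lr grows with the effective batch size per optimizer step
    learning_rate = accumulate_grad_batches * ngpu * bs * base_lr  # 1 * 2 * 48 * 5e-5 = 4.8e-3
else:
    learning_rate = base_lr                                        # 5e-5, as in the paper

print(learning_rate)
```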

Joel18241096 commented 1 year ago

I have faced the same problem, and I found that the model converges well in my task when scale_lr is False.

blusque commented 1 year ago

Why do you need a batch size as big as 48? I don't think an RTX 3090 has enough memory.

ader47 commented 1 year ago

I want to train on the LSUN-Churches dataset, and the batch size in the original paper is 96. The max batch size I could fit on an RTX 3090 is 52.

haooxia commented 11 months ago

Got the same question

clearlyzero commented 10 months ago

I also encountered a situation where the model was unable to converge. I kept the learning rate constant at 5e-5, and it still did not seem to converge.

ader47 commented 10 months ago

The loss fluctuates at around 0.2.

clearlyzero commented 10 months ago

In my experiment, the loss is around 0.4. Should convergence be understood as the loss settling around 0.2?

ader47 commented 10 months ago

Did you keep the provided settings or change them? I kept the settings and the loss was around 0.2, but I forget on which dataset. In my experiments, only LSUN-Churches needed scale_lr set to False; the others can be True.

clearlyzero commented 10 months ago

If it is set to False, is the lr = n_gpus * 0.00005?

ader47 commented 10 months ago

No, the lr = 0.00005, and I remember the code sets up a linear lr scheduler, so the lr increases from 0 to 0.00005 over 10000 steps and then stays constant.
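
If it helps, the warmup behavior described above can be sketched like this (just an illustration of the schedule shape I'm assuming, not the repo's scheduler implementation):

```python
# Linear warmup sketch: lr ramps from 0 to base_lr over warm_up_steps, then stays constant.
base_lr = 5.0e-5
warm_up_steps = 10000

def lr_at(step: int) -> float:
    if step < warm_up_steps:
        return base_lr * step / warm_up_steps
    return base_lr

for step in (0, 5000, 10000, 50000):
    print(step, lr_at(step))  # 0.0, 2.5e-5, 5e-5, 5e-5
```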

clearlyzero commented 10 months ago

Thank you for your reply. I have a general understanding now; I will try it later.

ader47 commented 10 months ago

The FID in the paper could not be reproduced using my own trained ckpt 😭

clearlyzero commented 10 months ago

Is the FID also impossible to reproduce with the provided checkpoints?

ader47 commented 10 months ago

No, you can reproduce the FID using the provided ckpt, but with my own trained ckpt the FID could not be reproduced.

clearlyzero commented 10 months ago

This is indeed very complicated and difficult. I am still training on a very simple dataset and have not applied it yet.

ader47 commented 10 months ago

Good luck 👍

clearlyzero commented 10 months ago

May I ask whether the latent space size of the AutoencoderKL first stage used for the LSUN-Churches dataset is 32x32x4? I am currently using 64x64x3, and I wonder if this is why my loss is so high.

ader47 commented 10 months ago

Yes, because it is kl-f8; f8 means it compresses the spatial size by a factor of 8.
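
As a quick sanity check of the shape arithmetic (assuming 256x256 inputs and 4 latent channels, as in the LSUN-Churches kl-f8 first stage):

```python
# Hypothetical helper just to show the arithmetic: kl-f8 divides height and width by 8.
def latent_shape(h: int, w: int, f: int = 8, z_channels: int = 4):
    return (h // f, w // f, z_channels)

print(latent_shape(256, 256))  # (32, 32, 4)
```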

clearlyzero commented 10 months ago

I can now use the encoder to compress the images, run the diffusion on them, and generate some images, but the quality is not that good. I am using a very small UNet 😂

Ly403 commented 6 months ago

But I think this should not matter, because the higher the compression ratio, the lower the quality of the generated results, and 64x64x3 actually has a lower compression ratio than 32x32x4.
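
For what it's worth, the compression ratios work out like this (simple arithmetic, assuming a 256x256x3 input for both latent sizes):

```python
# Compression ratio = input elements / latent elements (assuming 256x256x3 inputs).
pixels = 256 * 256 * 3
print(pixels / (64 * 64 * 3))  # 16x compression for a 64x64x3 latent
print(pixels / (32 * 32 * 4))  # 48x compression for a 32x32x4 latent (kl-f8)
```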