Nota-NetsPresso / BK-SDM

A Compressed Stable Diffusion for Efficient Text-to-Image Generation [ECCV'24]

About the training speed #45

Closed KeyKy closed 10 months ago

KeyKy commented 10 months ago

I found that the total number of iterations for the training is 400,000. May I ask how many days it took you to train a distilled model? I'm using 8×V100 GPUs, and I can only complete around 3,800 iterations in one night (from 19:55 to 10:00 the next day).
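As a quick sanity check of this rate (the 14-hour window and iteration counts are taken from the numbers above; the rest is plain arithmetic):

```python
# Back-of-the-envelope projection of total training time from
# the observed throughput: 3,800 iterations between 19:55 and
# 10:00 the next day (~14.1 hours).

window_hours = 14 + 5 / 60          # 19:55 -> 10:00 next day
iters_done = 3_800
total_iters = 400_000               # total iterations mentioned above

iters_per_hour = iters_done / window_hours      # ~270 it/h
projected_hours = total_iters / iters_per_hour  # ~1,483 h
projected_days = projected_hours / 24           # ~62 days

print(f"{iters_per_hour:.0f} it/h -> {projected_days:.0f} days "
      f"for {total_iters:,} iterations")
```

So at this observed rate, 400K iterations would take roughly two months on the 8×V100 setup.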

KeyKy commented 10 months ago

With a batch size of 256 (=4×64), training BK-SDM-Base for 50K iterations takes about 300 hours and 53GB GPU memory. With a batch size of 64 (=4×16), it takes 60 hours and 28GB GPU memory. Training BK-SDM-{Small, Tiny} results in 5∼10% decrease in GPU memory usage.
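These figures imply roughly comparable per-sample throughput in the two settings. A small check, using only the numbers quoted above (reading "256 (=4×64)" as 4 gradient-accumulation steps × 64 images per step is an assumption):

```python
# Per-iteration and per-sample throughput implied by the quoted
# figures: 50K iterations; 300 h at effective batch 256, 60 h at
# effective batch 64.

ITERS = 50_000

for batch, hours in [(256, 300), (64, 60)]:
    sec_per_iter = hours * 3600 / ITERS
    samples_per_sec = batch / sec_per_iter
    print(f"batch {batch:>3}: {sec_per_iter:5.2f} s/iter, "
          f"{samples_per_sec:5.2f} samples/s")
```

Per sample, the two settings process images at a similar rate (~12 vs ~15 samples/s), so the larger effective batch mainly changes the number of optimizer updates, not the wall-clock cost per image.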

KeyKy commented 10 months ago

It seems that BK-SDM-Base would take 300 h × (400K / 50K) = 2,400 h, i.e. about 100 days.

bokyeong1015 commented 9 months ago

Hi, we would like to clarify our setting.

> I found that the total number of iterations for the training is 400,000.

> I can only complete around 3,800 iterations in one night (from 19:55 to 10:00 the next day).

Though our models were trained on a single A100, using multiple GPUs with a smaller per-GPU batch size can speed up training.
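As a sketch of that advice, one way to pick a per-GPU batch size that preserves the single-A100 effective batch of 256 (= 4 × 64) when spreading training over several GPUs. The flag names mentioned in the comments assume a diffusers-style training script, not necessarily the exact interface of this repo:

```python
# Minimal sketch: choose per-GPU settings so that
#   num_gpus * per_gpu_batch * grad_accum_steps == effective batch
# matches the single-A100 setting of 256 (= 4 x 64). In a
# diffusers-style script these would map to --train_batch_size
# and --gradient_accumulation_steps per process (assumption).

TARGET_EFFECTIVE_BATCH = 256

def per_gpu_batch(num_gpus: int, grad_accum_steps: int) -> int:
    """Per-GPU batch size that keeps the effective batch fixed."""
    batch, rem = divmod(TARGET_EFFECTIVE_BATCH, num_gpus * grad_accum_steps)
    if rem:
        raise ValueError("target batch not divisible by this layout")
    return batch

# Single A100 (the reported setting): 1 GPU x accum 4 x batch 64 = 256
print(per_gpu_batch(num_gpus=1, grad_accum_steps=4))   # 64

# Eight V100s: 8 GPUs x accum 4 x batch 8 = 256, with much less
# memory per GPU and, under ideal scaling, up to ~8x the samples
# processed per hour
print(per_gpu_batch(num_gpus=8, grad_accum_steps=4))   # 8
```

With `accelerate launch --num_processes 8`, each process would then use the smaller per-GPU batch (e.g. 8) while keeping the same gradient-accumulation steps, so the optimization trajectory stays comparable to the single-GPU run while iterations complete faster.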