Nota-NetsPresso / BK-SDM

A Compressed Stable Diffusion for Efficient Text-to-Image Generation [ECCV'24]

About the training speed #45

Closed KeyKy closed 10 months ago

KeyKy commented 10 months ago

I found that the total number of iterations for the training is 400,000. May I ask how many days it took you to train a distilled model? I'm using 8×V100 GPUs, and I can only complete around 3,800 iterations in one night (from 19:55 to 10:00 the next day).
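As a quick sanity check of this rate (the 14-hour window and iteration counts are taken from the numbers above; the rest is plain arithmetic):

```python
# Back-of-the-envelope projection of total training time from
# the observed throughput: 3,800 iterations between 19:55 and
# 10:00 the next day (~14.1 hours).

window_hours = 14 + 5 / 60          # 19:55 -> 10:00 next day
iters_done = 3_800
total_iters = 400_000               # total iterations mentioned above

iters_per_hour = iters_done / window_hours      # ~270 it/h
projected_hours = total_iters / iters_per_hour  # ~1,483 h
projected_days = projected_hours / 24           # ~62 days

print(f"{iters_per_hour:.0f} it/h -> {projected_days:.0f} days "
      f"for {total_iters:,} iterations")
```

So at this observed rate, 400K iterations would take roughly two months on the 8×V100 setup.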

KeyKy commented 10 months ago

With a batch size of 256 (=4×64), training BK-SDM-Base for 50K iterations takes about 300 hours and 53GB GPU memory. With a batch size of 64 (=4×16), it takes 60 hours and 28GB GPU memory. Training BK-SDM-{Small, Tiny} results in 5∼10% decrease in GPU memory usage.
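These figures imply roughly comparable per-sample throughput in the two settings. A small check, using only the numbers quoted above (reading "256 (=4×64)" as 4 gradient-accumulation steps × 64 images per step is an assumption):

```python
# Per-iteration and per-sample throughput implied by the quoted
# figures: 50K iterations; 300 h at effective batch 256, 60 h at
# effective batch 64.

ITERS = 50_000

for batch, hours in [(256, 300), (64, 60)]:
    sec_per_iter = hours * 3600 / ITERS
    samples_per_sec = batch / sec_per_iter
    print(f"batch {batch:>3}: {sec_per_iter:5.2f} s/iter, "
          f"{samples_per_sec:5.2f} samples/s")
```

Per sample, the two settings process images at a similar rate (~12 vs ~15 samples/s), so the larger effective batch mainly changes the number of optimizer updates, not the wall-clock cost per image.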

KeyKy commented 10 months ago

It seems that BK-SDM-Base would take 300 h × (400K / 50K) = 2,400 h, i.e. about 100 days.

bokyeong1015 commented 9 months ago

Hi, we would like to clarify our setting.

> I found that the total number of iterations for the training is 400,000.

> I can only complete around 3,800 iterations in one night (from 19:55 to 10:00 the next day).

Though our models were trained on a single A100, using multiple GPUs with a smaller per-GPU batch size can speed up training.
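As a sketch of that advice, one way to pick a per-GPU batch size that preserves the single-A100 effective batch of 256 (= 4 × 64) when spreading training over several GPUs. The flag names mentioned in the comments assume a diffusers-style training script, not necessarily the exact interface of this repo:

```python
# Minimal sketch: choose per-GPU settings so that
#   num_gpus * per_gpu_batch * grad_accum_steps == effective batch
# matches the single-A100 setting of 256 (= 4 x 64). In a
# diffusers-style script these would map to --train_batch_size
# and --gradient_accumulation_steps per process (assumption).

TARGET_EFFECTIVE_BATCH = 256

def per_gpu_batch(num_gpus: int, grad_accum_steps: int) -> int:
    """Per-GPU batch size that keeps the effective batch fixed."""
    batch, rem = divmod(TARGET_EFFECTIVE_BATCH, num_gpus * grad_accum_steps)
    if rem:
        raise ValueError("target batch not divisible by this layout")
    return batch

# Single A100 (the reported setting): 1 GPU x accum 4 x batch 64 = 256
print(per_gpu_batch(num_gpus=1, grad_accum_steps=4))   # 64

# Eight V100s: 8 GPUs x accum 4 x batch 8 = 256, with much less
# memory per GPU and, under ideal scaling, up to ~8x the samples
# processed per hour
print(per_gpu_batch(num_gpus=8, grad_accum_steps=4))   # 8
```

With `accelerate launch --num_processes 8`, each process would then use the smaller per-GPU batch (e.g. 8) while keeping the same gradient-accumulation steps, so the optimization trajectory stays comparable to the single-GPU run while iterations complete faster.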