facebookresearch / SlowFast

PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
Apache License 2.0

Multigrid training #231

Open OValery16 opened 4 years ago

OValery16 commented 4 years ago

Thank you for releasing such a useful tool.

After reading "A Multigrid Method for Efficiently Training Video Models", I am a bit confused about the multigrid scheduling policy. The paper describes four approaches: baseline, long cycles, short cycles, and long + short cycles (the default setting).

image

However, after displaying the multigrid policy used in this repo, I get:

[2020-06-25 16:51:23,077] PID : 28742 : INFO : Long cycle index Base shape      Epochs
[2020-06-25 16:51:23,078] PID : 28742 : INFO : 0        [8, 8, 158]     73
[2020-06-25 16:51:23,078] PID : 28742 : INFO : 0        [4, 16, 158]    110
[2020-06-25 16:51:23,078] PID : 28742 : INFO : 0        [2, 16, 224]    142
[2020-06-25 16:51:23,078] PID : 28742 : INFO : 0        [1, 32, 224]    158
[2020-06-25 16:51:23,078] PID : 28742 : INFO : 1        [8, 8, 158]     205
[2020-06-25 16:51:23,078] PID : 28742 : INFO : 1        [4, 16, 158]    228
[2020-06-25 16:51:23,078] PID : 28742 : INFO : 1        [2, 16, 224]    248
[2020-06-25 16:51:23,078] PID : 28742 : INFO : 1        [1, 32, 224]    259
[2020-06-25 16:51:23,078] PID : 28742 : INFO : 2        [8, 8, 158]     291
[2020-06-25 16:51:23,078] PID : 28742 : INFO : 2        [4, 16, 158]    308
[2020-06-25 16:51:23,078] PID : 28742 : INFO : 2        [2, 16, 224]    322
[2020-06-25 16:51:23,078] PID : 28742 : INFO : 2        [1, 32, 224]    329
[2020-06-25 16:51:23,078] PID : 28742 : INFO : 3        [1, 32, 224]    358
[2020-06-25 16:51:23,078] PID : 28742 : INFO : Long cycle updates:
[2020-06-25 16:51:23,078] PID : 28742 : INFO :  BN.NORM_TYPE: sub_batchnorm
[2020-06-25 16:51:23,078] PID : 28742 : INFO :  BN.NUM_SPLITS: 8
[2020-06-25 16:51:23,078] PID : 28742 : INFO :  TRAIN.BATCH_SIZE: 512
[2020-06-25 16:51:23,078] PID : 28742 : INFO :  DATA.NUM_FRAMES x LONG_CYCLE_SAMPLING_RATE: 8x8
[2020-06-25 16:51:23,078] PID : 28742 : INFO :  DATA.TRAIN_CROP_SIZE: 158

The paper states:

image

image

chaoyuaw commented 4 years ago

Hi @OValery16, thanks for your interest. I'm not sure I understand your question correctly, but I'll try to explain the lines you see.

The lines following "Long cycle index Base shape Epochs" describe the "long cycle" schedule when "long cycle" is enabled. If you also enable "short cycles", it doesn't change what's printed in those lines (because otherwise it might get harder to read), but internally short cycles are used. You may print out the shape of tensors obtained from the data loader to verify.
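To make that check concrete, here is a minimal sketch of counting the distinct input shapes a loader yields. The helper name `count_input_shapes` is hypothetical and not part of this repo. It assumes each batch is a tuple or list whose first element is either one tensor or a list of tensors (one per pathway), and that each tensor exposes a `.shape` attribute; adjust the indexing if your loader returns something different.

```python
from collections import Counter
from itertools import islice


def count_input_shapes(loader, max_batches=50):
    """Count distinct input shapes in the first `max_batches` batches.

    Hypothetical helper, not part of the repo. Assumes batch[0] holds the
    inputs, either as a single tensor-like object or as a list of them,
    and that each one has a `.shape` attribute.
    """
    counts = Counter()
    for batch in islice(loader, max_batches):
        inputs = batch[0]
        if not isinstance(inputs, (list, tuple)):
            inputs = [inputs]
        # One key per batch: the shapes of all pathways as plain tuples.
        counts[tuple(tuple(x.shape) for x in inputs)] += 1
    return counts


def report_input_shapes(loader, max_batches=50):
    """Print each distinct shape and how many batches had it."""
    for shapes, n in count_input_shapes(loader, max_batches).items():
        print(f"{n} batch(es) with input shapes {shapes}")
```

Calling `report_input_shapes(train_loader)` on your training loader should then show whether the shapes vary from batch to batch. If short cycles are active, you would expect more than one distinct shape; a single shape suggests only the long cycle schedule is in effect.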