justzhanghong closed this issue 7 months ago.
The global learning rate is indeed 3.6. It seemed large to me too, but that is what ended up working best. We based our implementation on the SlowFast AVA configs, which use a similar per-node learning rate (we use 0.225 per node × 16 nodes = 3.6, while the SlowFast defaults are ~0.1 per node).
I don't believe we changed SlowFast's default starting learning rate for the warmup, which is 0.16 globally (0.01 per node × 16 nodes).
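A minimal sketch of the arithmetic above, assuming the linear scaling rule (global LR = per-node LR × number of nodes) and a linear warmup from the starting LR up to the base LR. The helper names and the 5-epoch warmup length are hypothetical, this is not the SlowFast source, and since the thread does not say what schedule follows the warmup, the sketch simply holds the base LR constant afterwards.

```python
NUM_NODES = 16
BASE_LR_PER_NODE = 0.225          # per-node base LR quoted in the answer
WARMUP_START_LR_PER_NODE = 0.01   # per-node warmup start LR (SlowFast default)
WARMUP_EPOCHS = 5.0               # hypothetical; the thread does not state it


def scaled_lr(per_node_lr: float, num_nodes: int) -> float:
    """Linear scaling: global LR = per-node LR * number of nodes."""
    return per_node_lr * num_nodes


def warmup_lr(epoch: float, warmup_epochs: float, start_lr: float, base_lr: float) -> float:
    """Hypothetical helper: interpolate linearly from start_lr to base_lr.

    After the warmup this sketch just returns base_lr; the real post-warmup
    schedule (e.g. cosine or step decay) is not specified in this thread.
    """
    if epoch >= warmup_epochs:
        return base_lr
    return start_lr + (base_lr - start_lr) * epoch / warmup_epochs


base_lr = scaled_lr(BASE_LR_PER_NODE, NUM_NODES)            # 0.225 * 16
start_lr = scaled_lr(WARMUP_START_LR_PER_NODE, NUM_NODES)   # 0.01 * 16
print(f"global base LR:  {base_lr:.2f}")
print(f"warmup start LR: {start_lr:.2f}")
for epoch in [0, 1, 2.5, 5]:
    print(epoch, round(warmup_lr(epoch, WARMUP_EPOCHS, start_lr, base_lr), 4))
```

With these assumptions the LR starts at 0.16 and reaches 3.6 at the end of the warmup, passing 1.88 halfway through.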
Thanks for your reply and answer.
Hello, is the learning rate here really 3.6? That seems very large. What is the initial learning rate for the warmup?