facebookresearch / hiera

Hiera: A fast, powerful, and simple hierarchical vision transformer.
Apache License 2.0
870 stars 39 forks source link

Table 13. Settings for AVA. learning rate=3.6? #27

Closed justzhanghong closed 7 months ago

justzhanghong commented 7 months ago
Snipaste_2024-02-26_15-22-35

Hello, is the learning rate here sure to be 3.6? Is it this big? What is the initial learning rate for warm up?

dbolya commented 7 months ago

The global learning rate is indeed 3.6. It seemed big to me too, but that's what ended up working the best. We based our implementation off of the slowfast AVA configs, and those use a similar per-node learning rate (we use 0.225 per node * 16 nodes, and the defaults for slowfast are ~0.1 per node).

I don't believe we strayed from the default initial warmup in slowfast, which is 0.16 (0.01 per node * 16 nodes).

justzhanghong commented 7 months ago

Thanks for your reply and answer.