"one cycle policy" and "warmup + cosine decay" and "ema_decay"

Alibaba-MIIL / ASL

Official PyTorch implementation of the paper "Asymmetric Loss For Multi-Label Classification" (ICCV 2021)

MIT License


"one cycle policy" and "warmup + cosine decay" and "ema_decay" #71

Closed: sorrowyn closed this issue 3 years ago

sorrowyn commented 3 years ago

Hi @mrT23, first of all, thank you for sharing this work. May I ask you a few questions?

Question A: I'm not clear about the relationship between "warmup + cosine decay" and the "one cycle policy".

Question B: When batch_size is 16, how should ema_decay be chosen? I only have one GPU.
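Regarding Question A, a minimal sketch may help. PyTorch's `torch.optim.lr_scheduler.OneCycleLR` with its default `anneal_strategy='cos'` runs two phases: over the first `pct_start` fraction of steps it raises the learning rate from `max_lr / div_factor` up to `max_lr`, then it anneals down to `(max_lr / div_factor) / final_div_factor`, both with cosine interpolation. The plain-Python functions below (`one_cycle_lr`, `warmup_cosine_lr`, `cosine_anneal` are hypothetical names, not part of the ASL repo) reproduce that schedule next to a common "linear warmup + cosine decay" schedule, assuming train.py builds its scheduler with `OneCycleLR` as the mention of a OneCycle policy suggests.

```python
import math


def cosine_anneal(start: float, end: float, pct: float) -> float:
    """Cosine interpolation: returns `start` at pct=0 and `end` at pct=1."""
    return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1.0)


def one_cycle_lr(step: int, total_steps: int, max_lr: float,
                 pct_start: float = 0.3, div_factor: float = 25.0,
                 final_div_factor: float = 1e4) -> float:
    """LR at `step`, mirroring OneCycleLR's two-phase 'cos' schedule."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    peak_step = pct_start * total_steps - 1  # last step of the warmup phase
    if step <= peak_step:
        # Phase 1 (warmup): cosine-shaped ramp from initial_lr up to max_lr.
        return cosine_anneal(initial_lr, max_lr, step / peak_step)
    # Phase 2 (decay): cosine annealing from max_lr down to min_lr.
    pct = (step - peak_step) / (total_steps - 1 - peak_step)
    return cosine_anneal(max_lr, min_lr, pct)


def warmup_cosine_lr(step: int, total_steps: int, max_lr: float,
                     warmup_steps: int, min_lr: float = 0.0) -> float:
    """Common 'linear warmup + cosine decay' schedule, for comparison."""
    if step < warmup_steps:
        # Linear ramp: reaches max_lr at step warmup_steps - 1.
        return max_lr * (step + 1) / warmup_steps
    pct = (step - warmup_steps) / max(1, total_steps - 1 - warmup_steps)
    return cosine_anneal(max_lr, min_lr, pct)


if __name__ == "__main__":
    for s in (0, 10, 19, 50, 99):
        print(s, one_cycle_lr(s, 100, 1e-4, pct_start=0.2),
              warmup_cosine_lr(s, 100, 1e-4, warmup_steps=20))
```

Under these assumptions the two schedules have the same overall shape: a warmup to a peak, then cosine decay. The differences are the warmup curve (cosine vs. linear) and the start/end learning rates, which `div_factor` and `final_div_factor` control. Note also that `OneCycleLR` cycles momentum by default (`cycle_momentum=True`), which a plain warmup + cosine schedule usually does not.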

mrT23 commented 3 years ago

I suggest reviewing train.py thoroughly.

It can answer all your questions.

sorrowyn commented 3 years ago


You provide an effective training method that combines the OneCycle policy with EMA! The default value of ema_decay is 0.9997. What is the theoretical basis for this value? When the batch size is 16, an ema_decay greater than 0.9999 is a better choice.
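One common rule of thumb (not the authors' stated reasoning) connects ema_decay to batch size. With the usual update `ema = decay * ema + (1 - decay) * param`, the EMA averages over roughly `1 / (1 - decay)` optimizer steps, which is about 3333 steps for 0.9997. A smaller batch means more steps per epoch, so keeping the averaging horizon fixed in *samples* requires `decay_new = decay_ref ** (new_batch / ref_batch)`. The sketch below uses hypothetical helper names (`ParamEMA`, `rescale_decay`, etc.) on plain floats, and the reference batch size of 128 in the usage is an assumption for illustration, not the repo's documented default.

```python
import math


class ParamEMA:
    """Exponential moving average of a flat list of parameters.

    Update rule: ema <- decay * ema + (1 - decay) * param
    """

    def __init__(self, params, decay=0.9997):
        self.decay = decay
        self.shadow = list(params)  # start from a copy of the current weights

    def update(self, params):
        d = self.decay
        self.shadow = [d * e + (1.0 - d) * p for e, p in zip(self.shadow, params)]


def effective_window(decay):
    """Rough number of recent steps the EMA averages over: 1 / (1 - decay)."""
    return 1.0 / (1.0 - decay)


def half_life_steps(decay):
    """Steps after which an old weight's contribution to the EMA has halved."""
    return math.log(0.5) / math.log(decay)


def rescale_decay(decay, ref_batch_size, new_batch_size):
    """Keep the EMA horizon fixed in samples when the batch size changes.

    Smaller batches -> more steps per epoch -> decay must move closer to 1.
    """
    return decay ** (new_batch_size / ref_batch_size)


if __name__ == "__main__":
    ref_decay, ref_bs, new_bs = 0.9997, 128, 16  # ref_bs=128 is hypothetical
    new_decay = rescale_decay(ref_decay, ref_bs, new_bs)
    print(f"window at ref: {effective_window(ref_decay):.0f} steps")
    print(f"rescaled decay for batch {new_bs}: {new_decay:.6f}")
```

Under that assumed reference batch size, the heuristic yields a decay above 0.9999 for batch size 16, which is consistent with the observation above; the actual best value still depends on the total number of training steps and should be checked empirically.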