Closed: sorrowyn closed this issue 3 years ago
I suggest reviewing train.py thoroughly; it can answer all your questions.
You provide an effective training method that combines the one-cycle policy with EMA! The default value of ema_decay is 0.9997. What is the theoretical basis for this? When the batch size is 16, an ema_decay greater than 0.9999 is a better choice.
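For anyone weighing the ema_decay question, here is a minimal sketch of one common way to reason about it. An EMA with decay d averages over roughly 1 / (1 - d) recent steps, so a smaller batch covers fewer samples per step and needs a larger decay to average over the same number of samples. The rescaling rule below is a heuristic, not something taken from train.py, and the reference batch size of 128 is purely a hypothetical value for illustration; the function names are made up for this example.

```python
def ema_horizon_steps(decay: float) -> float:
    """Approximate number of recent steps an EMA averages over: 1 / (1 - decay)."""
    return 1.0 / (1.0 - decay)


def rescale_decay(decay: float, ref_batch_size: int, new_batch_size: int) -> float:
    """Heuristic (an assumption, not the repository's method): keep the
    averaging window constant when measured in samples rather than steps."""
    return decay ** (new_batch_size / ref_batch_size)


ref_decay = 0.9997      # default value mentioned in this thread
ref_batch_size = 128    # hypothetical reference batch size, for illustration only
new_batch_size = 16     # batch size from the question

new_decay = rescale_decay(ref_decay, ref_batch_size, new_batch_size)

# Averaging windows measured in samples (steps * batch size) should roughly match.
ref_window = ema_horizon_steps(ref_decay) * ref_batch_size
new_window = ema_horizon_steps(new_decay) * new_batch_size

print(f"rescaled decay: {new_decay:.7f}")
print(f"window in samples: ref={ref_window:.0f}, new={new_window:.0f}")
```

Under that hypothetical reference, the rescaled decay lands above 0.9999, which matches the observation above for a batch size of 16. With a different reference batch size the number changes, so treat this as a starting point for tuning rather than a rule.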
Hi @mrT23,
First, thank you for sharing this work. May I ask you some questions?
Question A: I'm not clear about the relationship between "warmup + cosine decay" and the "one cycle policy".
Question B: When batch_size is 16, how should we choose ema_decay? I only have one GPU.
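On Question A, the two schedules have a very similar shape, which a small sketch can make visible. The code below is a simplified pure-Python illustration, not the scheduler used in train.py and not an exact reproduction of any library's implementation. The function names and the default values for pct_start, div_factor, and final_div_factor are assumptions chosen for the example.

```python
import math


def warmup_cosine(step, total_steps, max_lr, warmup_steps, min_lr=0.0):
    """Linear warmup up to max_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - 1 - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))


def one_cycle_cosine(step, total_steps, max_lr, pct_start=0.3,
                     div_factor=25.0, final_div_factor=1e4):
    """Cosine ramp from max_lr / div_factor up to max_lr, then cosine
    anneal down to a much smaller final learning rate."""
    initial_lr = max_lr / div_factor
    final_lr = initial_lr / final_div_factor
    peak_step = max(1, int(pct_start * total_steps))
    if step < peak_step:
        progress = step / peak_step
        return initial_lr + 0.5 * (max_lr - initial_lr) * (1 - math.cos(math.pi * progress))
    progress = (step - peak_step) / max(1, total_steps - 1 - peak_step)
    return final_lr + 0.5 * (max_lr - final_lr) * (1 + math.cos(math.pi * progress))


total_steps, max_lr = 100, 1e-3
for step in (0, 15, 30, 65, 99):
    a = warmup_cosine(step, total_steps, max_lr, warmup_steps=30)
    b = one_cycle_cosine(step, total_steps, max_lr, pct_start=0.3)
    print(f"step {step:3d}  warmup+cosine={a:.2e}  one-cycle={b:.2e}")
```

In this sketch both schedules rise to the same peak and then follow a cosine curve down. The main differences are the shape of the ramp (linear versus cosine) and the start and end values, so one-cycle with cosine annealing can be read as a particular parameterization of "warmup + cosine decay".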