I see that you wrote four types of the `adjust_learning_rate` function. What is the theory behind each type?
Hi!
Below is a brief explanation of each of the four settings. I experimented with all of them, but found that type1 without lr warmup worked OK for my application, decreasing the lr after every epoch. Note that the input `epoch` can be interpreted either as the training iteration count (i.e. the number of updates performed so far) or as the epoch number (if the lr is only updated after every epoch instead of after every iteration). I mainly updated after every epoch rather than after every iteration, but this might differ between applications, and parts of the code in `exp_main.py` might have to be changed accordingly. Hope this helps, and please re-open the issue if this did not answer your question!
type1: decays the learning rate after an (optional) `warmup` number of epochs or iterations, according to the decay rate. I used `learning_rate=0.001`, `lr_decay_rate=0.999` and `warmup=10`.

type2:

type3: restarts the learning rate after a `period` number of iterations/epochs, but to a smaller learning rate according to the decay rate. I used `learning_rate=0.001` and `lr_decay_rate=0.90`.

type4: updates the learning rate after every training iteration, based on the total number of iterations performed so far (see the discussion below). I used `learning_rate=0.001`.
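For concreteness, here is a rough sketch of what type1 and type3 could compute, based only on the descriptions above. This is my own illustration: the linear warmup in type1 and the cosine shape within each type3 period are assumptions, and none of the names below are the repository's actual code.

```python
import math

def lr_type1(step, learning_rate=0.001, lr_decay_rate=0.999, warmup=10):
    # Assumed behaviour: linear warmup to the base rate, then
    # multiplicative decay by lr_decay_rate per epoch/iteration.
    if step < warmup:
        return learning_rate * (step + 1) / warmup
    return learning_rate * lr_decay_rate ** (step - warmup)

def lr_type3(step, learning_rate=0.001, lr_decay_rate=0.90, period=10):
    # Restarts every `period` steps, but each restart begins from a peak
    # shrunk by lr_decay_rate; the cosine shape within a period is an
    # assumption (in the spirit of SGDR-style warm restarts).
    peak = learning_rate * lr_decay_rate ** (step // period)
    frac = (step % period) / period
    return peak * 0.5 * (1.0 + math.cos(math.pi * frac))
```

The `step` argument is exactly the ambiguity mentioned above: pass the epoch number to update the lr once per epoch, or the update count to adjust it after every iteration.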
Thanks! Pretty useful!
Sorry, I have another question, about the type4 method. From my understanding, `total_num_iters` should restart at each epoch, not at each iteration.
So the terminology might be somewhat misleading. The `total_num_iter` variable should keep track of the total number of training iterations performed so far, i.e. irrespective of the epoch number, and is only used for type4. This is because when you use it to update the learning rate at every training iteration with type4, you don't want to restart the learning rate schedule every epoch. I apologise for the slightly confusing terminology and implementation; I just made it to test a few different learning rate strategies. :)
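To illustrate that point with a runnable toy (the function name, the cosine schedule, and all constants here are my own placeholders, not the repo's implementation):

```python
import math

def lr_type4(total_num_iter, total_iters, learning_rate=0.001):
    # Hypothetical per-iteration schedule over the whole run: a single
    # cosine decay indexed by the total number of updates performed so
    # far, so it never restarts at an epoch boundary.
    frac = min(total_num_iter / total_iters, 1.0)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * frac))

num_epochs, iters_per_epoch = 10, 100
total_num_iter = 0  # deliberately NOT reset at the start of each epoch
for epoch in range(num_epochs):
    for it in range(iters_per_epoch):
        lr = lr_type4(total_num_iter, num_epochs * iters_per_epoch)
        # ... set the optimizer's learning rate and run the training step ...
        total_num_iter += 1
    print(f"epoch {epoch}: lr after {total_num_iter} total iterations = {lr:.6f}")
```

If `total_num_iter` were reset every epoch, the schedule would jump back to the peak ten times; keeping a global counter gives one smooth decay over the whole run, which is what the reply above describes.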