I see that you wrote four types of the `adjust_learning_rate` function. What is the theory behind each type?
Hi!
Below is a brief explanation of each of the four settings. I experimented with all of them, but found that type1 without lr warmup worked OK for my application, decreasing the lr after every epoch. Note that the input `epoch` can be interpreted either as the training iteration count (i.e. the number of updates performed so far) or as the epoch number (if the lr is only updated after every epoch instead of after every iteration). I mainly updated after every epoch rather than after every iteration, but this might differ between applications, and parts of the code in `exp_main.py` might have to be changed accordingly. Hope this helps, and please re-open the issue if this did not answer your question!
type1: decays the learning rate after an (optional) `warmup` number of epochs or iterations, according to the decay rate. I used `learning_rate=0.001`, `lr_decay_rate=0.999` and `warmup=10`.

type2:

type3: restarts the learning rate after a `period` number of iterations/epochs, but to a smaller learning rate according to the decay rate. I used `learning_rate=0.001` and `lr_decay_rate=0.90`.

type4: updates the learning rate after every training iteration, based on the total number of iterations performed so far (see the discussion below). I used `learning_rate=0.001`.
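For concreteness, here is a rough sketch of what type1 and type3 could compute, based only on the descriptions above. This is my own illustration: the linear warmup in type1 and the cosine shape within each type3 period are assumptions, and none of the names below are the repository's actual code.

```python
import math

def lr_type1(step, learning_rate=0.001, lr_decay_rate=0.999, warmup=10):
    # Assumed behaviour: linear warmup to the base rate, then
    # multiplicative decay by lr_decay_rate per epoch/iteration.
    if step < warmup:
        return learning_rate * (step + 1) / warmup
    return learning_rate * lr_decay_rate ** (step - warmup)

def lr_type3(step, learning_rate=0.001, lr_decay_rate=0.90, period=10):
    # Restarts every `period` steps, but each restart begins from a peak
    # shrunk by lr_decay_rate; the cosine shape within a period is an
    # assumption (in the spirit of SGDR-style warm restarts).
    peak = learning_rate * lr_decay_rate ** (step // period)
    frac = (step % period) / period
    return peak * 0.5 * (1.0 + math.cos(math.pi * frac))
```

The `step` argument is exactly the ambiguity mentioned above: pass the epoch number to update the lr once per epoch, or the update count to adjust it after every iteration.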
Thanks! Pretty useful!
Sorry, I have another question, about the type4 method. From my understanding, `total_num_iters` should restart at each epoch, not at each iteration.
So the terminology might be somewhat misleading. The `total_num_iter` variable should keep track of the total number of training iterations performed so far, i.e. irrespective of the epoch number, and is only used for type4. This is because when you use it to update the learning rate at every training iteration with type4, you don't want to restart the learning rate schedule every epoch. I apologise for the slightly confusing terminology and implementation; I just made it to test a few different learning rate strategies. :)
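To illustrate that point with a runnable toy (the function name, the cosine schedule, and all constants here are my own placeholders, not the repo's implementation):

```python
import math

def lr_type4(total_num_iter, total_iters, learning_rate=0.001):
    # Hypothetical per-iteration schedule over the whole run: a single
    # cosine decay indexed by the total number of updates performed so
    # far, so it never restarts at an epoch boundary.
    frac = min(total_num_iter / total_iters, 1.0)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * frac))

num_epochs, iters_per_epoch = 10, 100
total_num_iter = 0  # deliberately NOT reset at the start of each epoch
for epoch in range(num_epochs):
    for it in range(iters_per_epoch):
        lr = lr_type4(total_num_iter, num_epochs * iters_per_epoch)
        # ... set the optimizer's learning rate and run the training step ...
        total_num_iter += 1
    print(f"epoch {epoch}: lr after {total_num_iter} total iterations = {lr:.6f}")
```

If `total_num_iter` were reset every epoch, the schedule would jump back to the peak ten times; keeping a global counter gives one smooth decay over the whole run, which is what the reply above describes.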