Hi! Thank you for your work and for sharing the technical report with us. I have a question. In the figure for each of the experiments, the learning rate stays constant after the decay phase.![image](https://github.com/OpenBMB/MiniCPM/assets/87812906/ca61b1ef-1ff6-41ba-a566-f580d6567c13)
However, the definition says that the decay phase, which is the last one, should be exponential.![image](https://github.com/OpenBMB/MiniCPM/assets/87812906/c3cd44ce-9c96-43ad-bbfa-7b4225410aa7)
Is this caused by the "cutoff steps T" that appears in the equation for exponential decay? Thank you very much! I really like the ideas behind this work!
Judging from this equation and the description in the article as a whole, there should be no stable learning rate stage after the decay stage. This is my personal opinion.
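To make the question concrete, here is a minimal Python sketch of a warmup, stable, exponential-decay schedule. It is not the code from the report: the function names, the half-life form of the decay, and all default values are assumptions chosen for illustration. It shows that a pure exponential decay keeps shrinking at every step, so a flat segment after the decay would need something extra, for example a lower bound on the learning rate (the hypothetical `min_lr` below).

```python
def wsd_lr(step, max_lr=0.01, warmup_steps=100, decay_start=1000, half_life=50):
    """Hypothetical warmup-stable-decay schedule (all defaults are made up)."""
    if step < warmup_steps:
        # Linear warmup from 0 to max_lr.
        return max_lr * step / warmup_steps
    if step < decay_start:
        # Stable phase: constant learning rate.
        return max_lr
    # Exponential decay: the rate halves every `half_life` steps after
    # `decay_start` and never becomes exactly constant.
    return max_lr * 0.5 ** ((step - decay_start) / half_life)


def wsd_lr_with_floor(step, min_lr=0.001, **kwargs):
    """Same schedule, clamped from below by an assumed `min_lr`.

    The clamp is one hypothetical way a plot could show a flat tail
    after the decay; it is not taken from the report.
    """
    return max(min_lr, wsd_lr(step, **kwargs))


if __name__ == "__main__":
    for s in (50, 500, 1050, 1100, 2000, 3000):
        print(s, wsd_lr(s), wsd_lr_with_floor(s))
```

Under these assumptions, `wsd_lr` is strictly decreasing once the decay starts, while `wsd_lr_with_floor` flattens out as soon as the decayed value drops below `min_lr`. Whether the constant segment in the figure comes from such a floor, from the cutoff step, or from something else is exactly what the question asks.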