OpenBMB / MiniCPM

MiniCPM-2B: An end-side LLM outperforming Llama2-13B.
Apache License 2.0

WSD scheduler, Decay part question #116

Closed: olisicky closed this issue 1 hour ago

olisicky commented 2 months ago

Hi! Thank you for your work and for sharing the technical report with us. I have a question. In the figure, each experiment shows a constant lr after the decay phase. [image]

However, the definition says that the decay phase, which is the last phase, should be exponential. [image]

Is this caused by the "cutoff steps T" that appears in the equation for exponential decay? Thank you very much! I really like the ideas behind it!

LDLINGLINGLING commented 1 week ago

[image] Judging from this equation and the description in the rest of the article, there should be no stable learning-rate stage after the decay stage. This is my personal opinion.
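
To make the point concrete, here is a minimal sketch of a warmup-stable-decay schedule with a purely exponential decay phase. The function name `wsd_lr`, all parameter names and default values, and the half-life form of the decay are assumptions made for this illustration; this is not the exact formula from the technical report. Under this assumed form the learning rate keeps shrinking after the stable phase and never becomes exactly constant, although it could appear nearly flat on a linear-scale plot once it is small.

```python
def wsd_lr(step, peak_lr=0.01, warmup_steps=100, stable_end=1000, half_life=50):
    """Hypothetical warmup-stable-decay schedule (illustration only).

    All names and defaults are assumptions, not values from the report.
    """
    if step < warmup_steps:
        # Warmup: linear ramp from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    if step < stable_end:
        # Stable: hold the peak learning rate.
        return peak_lr
    # Decay: exponential, halving every `half_life` steps after stable_end.
    return peak_lr * 0.5 ** ((step - stable_end) / half_life)


if __name__ == "__main__":
    for s in (0, 50, 500, 1000, 1050, 1100, 1500):
        print(s, wsd_lr(s))
```

With these assumed defaults, the value at step 1050 is half the peak and the value at step 1100 is a quarter of it, so the schedule is strictly decreasing throughout the decay phase.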