InternLM / InternLM-XComposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

layer-wise learning rate #233

Open liuheng0111 opened 3 months ago

liuheng0111 commented 3 months ago

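I have a question about the layer-wise learning rate. A single layer contains Q, K, V, MLP, and other parameters. Do all parameters in the same layer share one learning rate, or does the rate also decay within a layer, following top-to-bottom order, so that parameters in the same layer get different learning rates?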
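To make the two alternatives in the question concrete, here is a minimal sketch in plain Python. It does not claim to show what InternLM-XComposer actually does. The parameter names (`embed.weight`, `layers.N...`, `head.weight`), the helper functions, and the decay factor are all hypothetical and chosen only for illustration.

```python
import re

def layer_index(param_name, num_layers):
    """Map a (hypothetical) parameter name to a depth index:
    0 for embeddings, 1..num_layers for blocks, num_layers + 1 for the head."""
    match = re.search(r"layers\.(\d+)\.", param_name)
    if match:
        return int(match.group(1)) + 1
    if "embed" in param_name:
        return 0
    return num_layers + 1

def per_layer_lr(param_names, base_lr, decay, num_layers):
    """Alternative 1: Q, K, V, and MLP in the same layer share one rate;
    the rate shrinks by `decay` for each layer further from the output."""
    top = num_layers + 1
    return {
        name: base_lr * decay ** (top - layer_index(name, num_layers))
        for name in param_names
    }

def per_param_lr(param_names, base_lr, decay):
    """Alternative 2: the rate shrinks by `decay` for every parameter tensor,
    so parameters inside the same layer end up with different rates.
    Assumes param_names is ordered from input side to output side."""
    last = len(param_names) - 1
    return {
        name: base_lr * decay ** (last - i)
        for i, name in enumerate(param_names)
    }

# Hypothetical two-layer model, listed from input side to output side.
names = [
    "embed.weight",
    "layers.0.attn.q.weight",
    "layers.0.attn.k.weight",
    "layers.0.mlp.weight",
    "layers.1.attn.q.weight",
    "layers.1.mlp.weight",
    "head.weight",
]

shared = per_layer_lr(names, base_lr=1.0, decay=0.5, num_layers=2)
distinct = per_param_lr(names, base_lr=1.0, decay=0.5)
```

Under the first alternative, `layers.0.attn.q.weight` and `layers.0.mlp.weight` receive the same rate. Under the second, they differ. In a PyTorch setup, either mapping could be turned into optimizer parameter groups, one group per distinct rate.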

luohao123 commented 2 months ago

I am also trying to reproduce the XComposer training process. Anyone interested is welcome to get in touch and compare notes.
