layer-wise learning rate

InternLM / InternLM-XComposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output


Open liuheng0111 opened 3 months ago

liuheng0111 commented 3 months ago

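I have a question about the layer-wise learning rate. A single layer contains several sets of parameters, such as Q, K, V, and the MLP. Do all parameters in the same layer share one learning rate, or does the rate also decay from top to bottom within a layer, so that parameters in the same layer get different learning rates?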
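To make the two alternatives concrete, here is a minimal sketch of the common convention, in which every parameter of a block (Q, K, V, MLP) shares one learning rate and the decay applies only across depth. It is not confirmed to be what InternLM-XComposer does. The parameter names, the `layers.<i>.` naming pattern, and the helpers `layer_index` and `build_param_groups` are hypothetical and used for illustration only.

```python
import re
from collections import defaultdict


def layer_index(param_name: str, num_layers: int) -> int:
    """Map a parameter name to a depth index in [0, num_layers + 1].

    Assumes (hypothetically) that transformer blocks are named 'layers.<i>.'.
    """
    match = re.search(r"layers\.(\d+)\.", param_name)
    if match:
        return int(match.group(1)) + 1  # blocks occupy 1..num_layers
    if "embed" in param_name:
        return 0  # embeddings sit below the first block
    return num_layers + 1  # anything else (final norm, head) counts as the top


def build_param_groups(param_names, base_lr, decay, num_layers):
    """Group parameters by depth; one learning rate per depth index."""
    by_depth = defaultdict(list)
    for name in param_names:
        by_depth[layer_index(name, num_layers)].append(name)
    top = num_layers + 1
    return [
        # The top group keeps base_lr; each step down multiplies by `decay`.
        {"layer": idx, "lr": base_lr * decay ** (top - idx), "params": names}
        for idx, names in sorted(by_depth.items())
    ]


# Hypothetical parameter names for a 2-block model.
example_names = [
    "embed_tokens.weight",
    "layers.0.attn.q_proj.weight",
    "layers.0.attn.k_proj.weight",
    "layers.0.mlp.up_proj.weight",
    "layers.1.attn.q_proj.weight",
    "norm.weight",
]
groups = build_param_groups(example_names, base_lr=1e-4, decay=0.9, num_layers=2)
for group in groups:
    print(group["layer"], group["lr"], group["params"])
```

Under this scheme the Q, K, and MLP weights of `layers.0` land in the same group and receive the same rate. If a framework instead decayed within a layer, each of those parameters would need its own group. With PyTorch, groups like these would typically be turned into optimizer parameter groups by replacing the names with the corresponding tensors.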

luohao123 commented 2 months ago

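I'm also trying to reproduce the XComposer training process. Anyone who is interested is welcome to get in touch and compare notes.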