Guitaricet / relora

Official code for ReLoRA from the paper Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
https://arxiv.org/abs/2307.05695
Apache License 2.0

Does this model continue to improve in large scale regime? #1

Closed DaehanKim closed 1 year ago

DaehanKim commented 1 year ago

Hi! Thank you for this insightful work. I wonder how your research on large language models is going. Is your 350M run finished? I'm also curious how good billion-scale models can be. Do you have any plans to scale up your experiments?

Guitaricet commented 1 year ago

Hi! Thanks for the interest! Our latest results show that, when applied to a 350M model, ReLoRA performs just as well as regular training. We are currently scaling up ReLoRA to test it on Pythia-1B with EleutherAI. You can see the 1B code in the dev branch.

[W&B chart, 7/19/2023]
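For context, the core idea of ReLoRA is to train low-rank (LoRA-style) adapters, periodically merge them into the frozen base weights, and restart the adapters (along with a partial optimizer-state reset), so that the accumulated update can become high-rank over training. Below is a minimal PyTorch-style sketch of that merge-and-restart cycle; it is illustrative only, and names such as `LoRALinear`, `merge_and_reset`, and `reset_every` are hypothetical rather than this repository's actual classes or flags.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank (A @ B) update."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(rank, base.out_features))

    def forward(self, x):
        return self.base(x) + (x @ self.lora_A) @ self.lora_B

    @torch.no_grad()
    def merge_and_reset(self):
        # Fold the current low-rank update into the base weight, then restart
        # the adapter; the sum of many low-rank updates can be high-rank.
        self.base.weight += (self.lora_A @ self.lora_B).T
        nn.init.normal_(self.lora_A, std=0.01)
        nn.init.zeros_(self.lora_B)


def train(model, optimizer, data_loader, reset_every=5000):
    # Hypothetical loop: every `reset_every` steps, merge adapters and restart.
    for step, (x, y) in enumerate(data_loader):
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        if (step + 1) % reset_every == 0:
            for module in model.modules():
                if isinstance(module, LoRALinear):
                    module.merge_and_reset()
            # The paper also partially resets optimizer state and re-warms the
            # learning rate at each restart (omitted here for brevity).
```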

DumoeDss commented 1 year ago


Hey~ What are the results for Pythia-1B?

datalee commented 1 year ago

A 6B model does not seem to work.