allenai / OLMo

Modeling, training, eval, and inference code for OLMo
https://allenai.org/olmo
Apache License 2.0
4.2k stars, 392 forks

How are the 1B and 7B models initialized? #632

Open sanyalsunny111 opened 2 weeks ago

sanyalsunny111 commented 2 weeks ago

❓ The question

I am curious how the OLMo 1B and 7B models are initialized before pre-training. The paper doesn't seem to include this information.

I found the following in `olmo/config.py`, but I'm still unsure which option was actually used for pre-training:

https://github.com/allenai/OLMo/blob/d72a262645d831cc80d4a974718598998103075f/olmo/config.py#L195