Open sanyalsunny111 opened 2 weeks ago
I am curious how the OLMo 1B and 7B models are initialized before pre-training; the paper doesn't seem to cover this.
I found the config below, but I'm still unsure which initialization scheme is actually used for pre-training.
https://github.com/allenai/OLMo/blob/d72a262645d831cc80d4a974718598998103075f/olmo/config.py#L195
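For context, the linked config selects between several initialization schemes. As a hedged illustration only (the function and parameter names below are hypothetical, not OLMo's actual API), a common transformer-style choice is a truncated normal with standard deviation scaled by the hidden size:

```python
import math
import torch
import torch.nn as nn

def scaled_normal_init_(weight: torch.Tensor, d_model: int) -> None:
    """Illustrative sketch: truncated-normal init with std = 1/sqrt(d_model),
    the general style of init the linked config chooses between. This is NOT
    claimed to be the exact scheme OLMo uses."""
    std = 1.0 / math.sqrt(d_model)
    # Truncate at +/- 3 standard deviations to avoid extreme outlier weights.
    nn.init.trunc_normal_(weight, mean=0.0, std=std, a=-3 * std, b=3 * std)

# Example: a linear layer at a 1B-scale hidden size (2048 is an assumption).
layer = nn.Linear(2048, 2048, bias=False)
scaled_normal_init_(layer.weight, d_model=2048)
```

Knowing which branch of that config is active for the released 1B/7B checkpoints would answer the question.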