allenai / OLMo

Modeling, training, eval, and inference code for OLMo
https://allenai.org/olmo
Apache License 2.0

How many tokens was the 7B model trained on? #608

Open mathfinder opened 3 months ago

mathfinder commented 3 months ago

The paper and the README both state that the 7B model was trained on 2.5T tokens, but the corresponding config specifies 2T tokens.

README: https://github.com/allenai/OLMo/blob/26392798cbc4d9ac3898bd2949e77042220bf3f8/README.md?plain=1#L49

Config: https://github.com/allenai/OLMo/blob/26392798cbc4d9ac3898bd2949e77042220bf3f8/configs/official/OLMo-7B.yaml#L74C1-L74C13
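The two figures can be compared directly. Below is a minimal sketch, assuming the linked config line is a `max_duration` entry that uses a trailing `T` suffix to denote a raw token count (e.g. `2e12T`); the helper is illustrative, not OLMo's actual duration parser.

```python
def max_duration_to_tokens(spec: str) -> int:
    # Assumption: a trailing "T" marks a raw token count, e.g. "2e12T"
    # = 2 trillion tokens. This mimics how the linked config appears to
    # express the duration; it is not OLMo's actual parsing code.
    if spec.endswith("T"):
        return int(float(spec[:-1]))
    raise ValueError(f"unrecognized duration spec: {spec!r}")

config_tokens = max_duration_to_tokens("2e12T")  # value in the config
reported_tokens = 2.5e12                         # value in the README/paper
print(f"config: {config_tokens:.3g} tokens, reported: {reported_tokens:.3g} tokens")
```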

2015aroras commented 3 months ago

We had to make configuration tweaks mid-run in order to train for more than 1 epoch (https://github.com/allenai/OLMo/issues/584). The 2.5T token count is accurate.
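For concreteness, the arithmetic implied here, as a rough sketch: the tokens-per-epoch figure below is an assumption taken from the config above, not an exact dataset size.

```python
# Why the run had to cross an epoch boundary (see issue #584).
# ASSUMPTION: one pass over the training mix is taken as ~2e12 tokens,
# the figure from the config; the actual per-epoch size may differ.
TOKENS_PER_EPOCH = 2.0e12
TOTAL_TOKENS = 2.5e12  # the count reported in the README and paper

epochs = TOTAL_TOKENS / TOKENS_PER_EPOCH
print(f"{epochs:.2f} epochs")  # 1.25 -> more than one full pass,
# hence the mid-run configuration tweaks to keep training.
```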