Hi, I was wondering if you could share more details about the training data mix for your fastText model. Your blog post describes the sources you used: https://blog.allenai.org/olmo-1-7-7b-a-24-point-improvement-on-mmlu-92b43f7d269d
Specifically, I have the following questions:
Could you elaborate on the percentage of each data source in the final training data mix? Is there any chance you could share the training data as well?
I see in a recent commit that you're using a new fastText model. Does it use the same training data mix as the one described in the blog? Could you elaborate on the differences between the two?