OpenBMB / MiniCPM

MiniCPM-2B: An end-side LLM outperforming Llama2-13B.

Question about the token counts used for training in the paper #118

Closed trangtv57 closed 2 months ago

trangtv57 commented 2 months ago

I'd like to ask about the number of tokens used for training in the annealing phase. I checked carefully, but the paper does not mention this figure, so could you share how many tokens of high-quality raw text and how many instruction samples were used in the annealing phase? Thank you for the great paper; it has helped me a lot.

ShengdingHu commented 2 months ago

Thanks for your attention. In total we train on 1.1T tokens, of which the decay phase takes 0.1T tokens, and the SFT takes around 6B tokens (we do not count the number of instruction samples).
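
For anyone tallying the budget, here is a minimal sketch of the breakdown implied by the reply above. It assumes the 0.1T decay (annealing) tokens are counted inside the 1.1T total and that the ~6B SFT tokens accompany the decay stage; both are inferences from this thread, not figures confirmed elsewhere.

```python
# Rough token-budget breakdown implied by the reply above.
# Assumptions (from this thread, not the paper): the 0.1T decay/annealing
# tokens are included in the 1.1T total, and the ~6B SFT tokens accompany
# the decay stage.

TOTAL_TOKENS = 1.1e12   # total training tokens
DECAY_TOKENS = 0.1e12   # tokens in the decay (annealing) phase
SFT_TOKENS   = 6e9      # approximate SFT tokens

stable_tokens = TOTAL_TOKENS - DECAY_TOKENS  # ~1.0T in the stable phase

print(f"stable phase : {stable_tokens / 1e12:.2f}T tokens")
print(f"decay phase  : {DECAY_TOKENS / 1e12:.2f}T tokens")
print(f"SFT          : {SFT_TOKENS / 1e9:.0f}B tokens "
      f"(~{SFT_TOKENS / DECAY_TOKENS:.0%} of the decay budget)")
```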