OpenBMB / MiniCPM

MiniCPM-2B: An end-side LLM outperforming Llama2-13B.
Apache License 2.0

What sequence length was used during pretraining? #152

Open · petroskarypis opened this issue 2 weeks ago

petroskarypis commented 2 weeks ago

I was wondering what sequence length was used during pretraining for the 1.2B and 2.4B models?

LDLINGLINGLING commented 3 days ago

The sequence length of the 1.2B and 2.4B models was 4096 during pretraining.
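
As a side note, one way to sanity-check the configured context length of a released checkpoint is to read its Hugging Face config. The sketch below is illustrative, not from this thread: it assumes the `openbmb/MiniCPM-2B-sft-bf16` checkpoint id and that the config exposes a `max_position_embeddings` field (which may differ from the actual pretraining sequence length).

```python
# Minimal sketch: inspect the context length stored in a MiniCPM checkpoint's config.
# Assumptions (not confirmed in this thread): the checkpoint id below and the
# max_position_embeddings field name; pretraining length may differ from this value.
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "openbmb/MiniCPM-2B-sft-bf16",  # assumed checkpoint id
    trust_remote_code=True,         # MiniCPM uses custom model code
)
print(config.max_position_embeddings)
```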