Open petroskarypis opened 2 weeks ago
I was wondering what sequence length was used during pretraining for the 1.2 and 2.4B model?
The sequence length of the 1.2B and 2.4B models is 4096 during pretraining.
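As a minimal sketch of what that limit means in practice (the constant comes from the answer above; the helper function name is hypothetical), an input longer than the pretraining sequence length would need truncation or chunking before being fed to either model:

```python
MAX_SEQ_LEN = 4096  # pretraining sequence length for the 1.2B and 2.4B models


def fits_context(num_tokens: int, max_len: int = MAX_SEQ_LEN) -> bool:
    """Return True if a tokenized input fits within the pretraining context window."""
    return num_tokens <= max_len


print(fits_context(4096))  # True: exactly at the limit
print(fits_context(5000))  # False: would need truncation or chunking
```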