Closed lengyueyang closed 4 hours ago
Thanks for the great work. The article mentions continued pre-training on 6T of data; approximately how many tokens had been trained at the point of the loaded checkpoint?
We have already described this in Section 3.3 of the paper; please refer to Table 2.