deepseek-ai / DeepSeek-Coder-V2

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
MIT License

How many tokens are generally trained in total? #10

Closed · lengyueyang closed this 4 hours ago

lengyueyang commented 2 weeks ago

Thanks for such great work. The paper mentions continued pre-training on 6T tokens of data; approximately how many tokens were trained in total for the released checkpoint?

guoday commented 1 week ago

We describe this in Section 3.3 of the paper; please refer to Table 2 there.