deepseek-ai / DeepSeek-Coder

DeepSeek Coder: Let the Code Write Itself
https://coder.deepseek.com/
MIT License
6.6k stars, 461 forks

Pretraining code #132

Closed: Calvinnncy97 closed this issue 6 months ago

Calvinnncy97 commented 6 months ago

May I know if the pretraining code will be released? Also, what data format is used for pretraining? Knowing this would be very helpful for continued pretraining of the model.

Thank you.

Best regards

guoday commented 6 months ago

Hello, there are currently no plans to open-source the pre-training code. The pre-training data format follows common practice: different samples are concatenated together and separated by a special token.
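
For readers planning continued pretraining, the concatenate-and-separate format described above can be sketched roughly as follows. This is only an illustration of the general technique, not the DeepSeek Coder pipeline: the function name `pack_samples`, the separator id `SEP_ID = 0`, and the block size are all hypothetical, and the real separator token depends on the tokenizer you use.

```python
from typing import Iterable, Iterator, List

# Hypothetical separator token id (for example an end-of-sequence token).
# The actual id must come from the tokenizer of the model being trained.
SEP_ID = 0


def pack_samples(
    samples: Iterable[List[int]],
    block_size: int,
    sep_id: int = SEP_ID,
) -> Iterator[List[int]]:
    """Concatenate tokenized samples, append sep_id after each one,
    and yield consecutive fixed-size blocks of token ids."""
    buffer: List[int] = []
    for tokens in samples:
        buffer.extend(tokens)
        buffer.append(sep_id)  # mark the boundary between samples
        # Emit as many full blocks as the buffer currently holds.
        while len(buffer) >= block_size:
            yield buffer[:block_size]
            buffer = buffer[block_size:]
    # Any trailing partial block is dropped in this sketch.


if __name__ == "__main__":
    toy_samples = [[11, 12, 13], [21, 22], [31, 32, 33, 34]]
    for block in pack_samples(toy_samples, block_size=4):
        print(block)
```

Note that a sample can be split across two blocks in this scheme. Whether to drop, pad, or carry over the final partial block is a design choice that the thread does not specify.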

Calvinnncy97 commented 6 months ago

Got it. Thank you.