Closed by Calvinnncy97 6 months ago
Hello, there are currently no plans to open-source the pre-training code. The pre-training data format follows the approach most methods use today: different samples are concatenated into a single sequence, with a special token separating them.
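For anyone preparing data for continued pre-training, here is a minimal sketch of that concatenate-and-separate format. It is not the project's actual code (which is unreleased); the function name `pack_samples`, the separator id `SEP_TOKEN_ID = 0`, and the fixed block length are all assumptions for illustration. In practice the separator would be whatever special token id the model's tokenizer defines.

```python
from typing import Iterable, List

# Hypothetical separator token id; replace with the id your tokenizer
# actually assigns to its end-of-document / separator token.
SEP_TOKEN_ID = 0


def pack_samples(
    tokenized_samples: Iterable[List[int]],
    seq_len: int,
    sep_id: int = SEP_TOKEN_ID,
) -> List[List[int]]:
    """Concatenate tokenized samples, separated by sep_id, then split
    the resulting stream into fixed-length training blocks."""
    stream: List[int] = []
    for ids in tokenized_samples:
        stream.extend(ids)
        stream.append(sep_id)  # mark the boundary between samples

    # Keep only full blocks; the trailing remainder is dropped here.
    n_blocks = len(stream) // seq_len
    return [stream[i * seq_len:(i + 1) * seq_len] for i in range(n_blocks)]


if __name__ == "__main__":
    samples = [[5, 6, 7], [8, 9], [10, 11, 12, 13]]
    for block in pack_samples(samples, seq_len=4):
        print(block)
```

Dropping the trailing partial block is one simple choice; padding it or carrying it over into the next batch of data are common alternatives.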
Got it. Thank you.
May I know if the pretraining code will be released? Also, what is the data format used for pretraining? Knowing both would be very helpful for continued pretraining of the model.
Thank you.
Best regards