THUDM / CodeGeeX4

CodeGeeX4-ALL-9B, a versatile model for all AI software development scenarios, including code completion, code interpreter, web search, function calling, repository-level Q&A and much more.
https://codegeex.cn
Apache License 2.0

Could you provide a pretraining demo? #53

Closed: chenjinxinlove closed this issue 2 months ago

chenjinxinlove commented 3 months ago

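I want to run pretraining on my own codebase, using plain-text files as the training data. However, I currently get an error at the create_datasets step: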

train_dataset, eval_dataset = create_datasets(tokenizer, args)

I hope the maintainers can provide an official demo, similar to the one for starcoder2.

chenjinxinlove commented 2 months ago

I completed the pretraining (pt) with LLaMA-Factory instead.
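
The thread does not say how the data was prepared, so the following is only a rough sketch of one possible way to turn a folder of text files into a pretraining corpus. It assumes the format described in LLaMA-Factory's data documentation, where each record has a "text" field and a dataset_info.json entry maps that column. The function names, the ".txt" suffix filter, and the dataset name "my_code_corpus" are hypothetical and chosen for illustration.

```python
import json
from pathlib import Path


def build_pretrain_records(source_dir, suffixes=(".txt",)):
    """Collect each matching file under source_dir as one {"text": ...} record."""
    records = []
    for path in sorted(Path(source_dir).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore").strip()
            if text:  # skip files that are empty after stripping whitespace
                records.append({"text": text})
    return records


def write_dataset(records, out_dir, name="my_code_corpus"):
    """Write the corpus and a dataset_info.json entry that maps the 'text' column.

    Assumption: out_dir is a fresh directory. If it already holds a
    dataset_info.json, merge the entry by hand instead of overwriting it.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    data_file = out / f"{name}.json"
    data_file.write_text(
        json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    info = {name: {"file_name": data_file.name, "columns": {"prompt": "text"}}}
    (out / "dataset_info.json").write_text(
        json.dumps(info, indent=2), encoding="utf-8"
    )
    return data_file
```

The resulting directory could then be pointed to as the dataset directory of a pretraining run, with the dataset name matching the key in dataset_info.json. The exact training arguments depend on the LLaMA-Factory version and are not covered in this thread.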