About the corpus - Githubissues

Facico / Chinese-Vicuna

Chinese-Vicuna: A Chinese Instruction-following LLaMA-based Model: a low-resource Chinese llama+lora approach, with a structure modeled on alpaca

https://github.com/Facico/Chinese-Vicuna

Apache License 2.0

4.14k stars 421 forks

About the corpus #36

Closed ZenXir closed 1 year ago

ZenXir commented 1 year ago

When training on multi-turn dialogue, the corpora provided look like this: dialogues between User:xxx and GPT:xxx; between User:xxx and ChatGPT:xxx; between User:xxx and Assistant:xxx; and between Human:xxx and Assistant:xxx.

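The role labels are inconsistent. Will this affect multi-turn training? Do the role labels need to be unified? If they don't, can any multi-turn dialogue corpora simply be merged together as long as the format is correct?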

Facico commented 1 year ago

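The global instruction used in finetune does not specify which role is which (similar to the dialogue scenarios in interaction and chat), so in theory training on a variety of role labels should make the model somewhat more robust. That said, when the corpus is not very large, I think unifying the roles may give slightly better results.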
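
If you do decide to unify the roles before merging corpora, a minimal sketch might look like the following. It is not code from the Chinese-Vicuna repo: it assumes each dialogue is plain text with one turn per line, and both the `normalize_roles` helper and the target labels `User` / `Assistant` are hypothetical choices made for illustration.

```python
import re

# Hypothetical mapping from the role labels mentioned in this thread to one
# unified pair. The target names "User" and "Assistant" are an assumption.
ROLE_MAP = {
    "User": "User",
    "Human": "User",
    "GPT": "Assistant",
    "ChatGPT": "Assistant",
    "Assistant": "Assistant",
}

# Match a known role label at the start of a line, followed by an ASCII or
# full-width colon. Only spaces/tabs are consumed around the colon, so the
# pattern never swallows a newline.
_ROLE_RE = re.compile(
    r"^(%s)[ \t]*[:：][ \t]*" % "|".join(map(re.escape, ROLE_MAP)),
    flags=re.MULTILINE,
)


def normalize_roles(dialogue: str) -> str:
    """Rewrite every leading role label in `dialogue` to its unified form."""
    return _ROLE_RE.sub(lambda m: ROLE_MAP[m.group(1)] + ": ", dialogue)


if __name__ == "__main__":
    sample = "Human: hi\nChatGPT: hello\nUser:how are you\nGPT：fine"
    print(normalize_roles(sample))
```

Because the pattern is anchored to the start of a line, a label that appears inside a turn (for example "ask GPT: something") is left untouched. If your corpus stores turns in JSON fields rather than plain text, the same mapping can be applied per field instead.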