Facico / Chinese-Vicuna

Chinese-Vicuna: A Chinese Instruction-following LLaMA-based Model. A low-resource Chinese llama+lora approach, with a structure modeled on alpaca
https://github.com/Facico/Chinese-Vicuna
Apache License 2.0
4.14k stars 421 forks

Question about the corpus #26

Closed (ZenXir closed 1 year ago)

ZenXir commented 1 year ago

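I just re-downloaded the merge.json corpus from the cloud drive and noticed it used to be 663M but is now 389M. Why did the corpus get smaller? Were only the 700K+ entries kept?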

Facico commented 1 year ago

@ZenXir It was converted to UTF-8 format.

Facico commented 1 year ago

The ASCII-format representation takes more space.

ZenXir commented 1 year ago

Indeed. I just checked the copy I had converted to UTF-8 earlier, and it is also 389M, haha.
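
For readers who want to see why an encoding change alone can shrink the file, here is a minimal Python sketch. It assumes the earlier file stored non-ASCII characters as `\uXXXX` escapes (the default behaviour of Python's `json.dumps` with `ensure_ascii=True`) and the newer one stores them directly as UTF-8. The thread does not say how merge.json was produced, so the sample record and its field names are hypothetical.

```python
import json

# Hypothetical sample record; the real merge.json schema is not shown in this thread.
record = {"instruction": "你好", "output": "世界"}

# ASCII-only output: every non-ASCII character is written as a \uXXXX escape.
escaped = json.dumps(record, ensure_ascii=True)

# Characters are kept as-is and encoded as UTF-8 when written to disk.
raw = json.dumps(record, ensure_ascii=False)

escaped_size = len(escaped.encode("utf-8"))
raw_size = len(raw.encode("utf-8"))

print("escaped bytes:", escaped_size)
print("utf-8 bytes:  ", raw_size)

# Both forms decode to exactly the same data, so no records are lost.
assert json.loads(escaped) == json.loads(raw) == record
```

Each Chinese character in the Basic Multilingual Plane takes 6 bytes in escaped form and 3 bytes as raw UTF-8, so a corpus that is mostly Chinese text gets much smaller on disk while the number of records stays the same.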