baichuan-inc / Baichuan2

A series of large language models developed by Baichuan Intelligent Technology
https://huggingface.co/baichuan-inc
Apache License 2.0
4.08k stars, 293 forks

How Baichuan handles tabular data during pre-training #319

Closed. sunshineflg closed this issue 9 months ago

sunshineflg commented 9 months ago

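The model can correctly answer questions about information contained in tables within documents. I would like to continue pre-training on top of the Baichuan model, but I don't know how Baichuan processed tabular data during pre-training, i.e. how the tables were converted into training text. Also, is the converted data trained with the next-token prediction task as well?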

baichuan-assistant commented 9 months ago

Currently we do not apply any special processing to tables. You are welcome to experiment.
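
For readers who want to experiment along these lines, here is a minimal sketch of one possible approach. It is not Baichuan's method (the reply above says tables get no special processing). It assumes tables are linearized into Markdown text and then treated like any other text under the ordinary next-token objective. The function names `table_to_markdown` and `next_token_pairs` are hypothetical helpers written for this illustration, and the token ids in the usage lines are placeholders rather than real tokenizer output.

```python
def table_to_markdown(header, rows):
    """Linearize a table into Markdown text (one hypothetical serialization)."""
    def fmt(cells):
        # Escape pipes so cell content cannot break the column layout.
        return "| " + " | ".join(str(c).replace("|", "\\|") for c in cells) + " |"

    lines = [fmt(header), "| " + " | ".join("---" for _ in header) + " |"]
    lines.extend(fmt(row) for row in rows)
    return "\n".join(lines)


def next_token_pairs(token_ids):
    """Shift by one position: inputs are token_ids[:-1], targets are token_ids[1:]."""
    return token_ids[:-1], token_ids[1:]


# Usage: serialize a small table, then build shifted input/target sequences.
text = table_to_markdown(["item", "qty"], [["apple", 3], ["pear", 5]])
print(text)

# Placeholder ids; in practice these would come from the model's tokenizer.
inputs, targets = next_token_pairs([11, 12, 13, 14])
```

Markdown is only one choice; HTML, CSV, or "column: value" sentences are other serializations to compare, and which one works best for continued pre-training is an empirical question.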