Hi,
"We fine-tune the 7B and 13B models with 80k and 18k conversations, respectively."
Could you provide more details about the training data? How were the 80k conversations prepared? Do they all have a length of 16k tokens?
Is the data used for training longchat-v1.5 the same as for the previous version?