artidoro / qlora

QLoRA: Efficient Finetuning of Quantized LLMs
https://arxiv.org/abs/2305.14314
MIT License

How do you process oasst1 to get 9209 examples #25

Open iMountTai opened 1 year ago

iMountTai commented 1 year ago

Great work! In your paper you say: "In our experiments, we only use the top reply at each level in the conversation tree. This limits the dataset to 9,209 examples." Could you please explain how you processed the data? I get 10,364 examples from 2023-04-12_oasst_ready.trees.jsonl, and I can't work out where 9,209 comes from.

henryzhongsc commented 1 year ago

https://huggingface.co/datasets/timdettmers/openassistant-guanaco

This should be the dataset they use as oasst1. It looks like it has 9,846 samples.
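
For reference, here is a rough sketch of what "only use the top reply at each level in the conversation tree" could look like when applied to a trees JSONL file. This is not the authors' script. It assumes each line is a JSON object with a root message under `prompt`, that every message carries `role`, `text`, and a nested `replies` list, and that a lower `rank` marks a better reply. The helper names `top_reply_path` and `trees_to_examples` are made up for illustration.

```python
import json
from typing import Iterable


def top_reply_path(node: dict) -> list[dict]:
    """Walk one tree from the root, keeping only the top reply at each level."""
    path = []
    while True:
        path.append({"role": node.get("role"), "text": node.get("text")})
        replies = node.get("replies") or []
        if not replies:
            return path
        # Assumption: lower "rank" is better. Replies without a rank sort last,
        # and ties fall back to the order in which replies appear in the file.
        node = min(
            replies,
            key=lambda r: r["rank"] if r.get("rank") is not None else float("inf"),
        )


def trees_to_examples(lines: Iterable[str]) -> list[list[dict]]:
    """Turn JSONL lines (one tree per line) into one conversation per tree."""
    examples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tree = json.loads(line)
        # Assumption: the root message of each tree is stored under "prompt".
        examples.append(top_reply_path(tree["prompt"]))
    return examples
```

To try it on the real file, something like `trees_to_examples(open("2023-04-12_oasst_ready.trees.jsonl", encoding="utf-8"))` should work if the assumptions above hold. Note that this yields exactly one conversation per tree, so on its own it would not explain a drop from 10,364 to 9,209 or 9,846. Any extra filtering or splitting would have to come from a step not described in this thread.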