bigcode-project / starcoder

Home of StarCoder: fine-tuning & inference!
Apache License 2.0

HuggingFaceH4/oasst1_en - missing dataset #147

Open erap129 opened 11 months ago

erap129 commented 11 months ago

Hello, I wish to reproduce the StarChat training for educational purposes, but I see the dataset (HuggingFaceH4/oasst1_en) has been removed. Is there any way to download it?
If not, can anyone suggest a similar dataset? I want to use the current code (chat/train.py) with as little friction as possible.

jiagaoxiang commented 8 months ago

Hi, can anyone help find the dataset?