dame-cell opened 11 months ago
Thanks for sharing!
Hey, to make your life easier, I've already uploaded the dataset to Hugging Face:
https://huggingface.co/datasets/damerajee/khasi-datasets - split into sentences
https://huggingface.co/datasets/damerajee/khasi-raw-data - raw data as large paragraphs
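If you want to pull either version into Python, here's a minimal sketch using the Hugging Face `datasets` library. It assumes `pip install datasets` and network access. The helper names `repo_id` and `load_khasi` are hypothetical, just for illustration, and the split/column names aren't given in this thread, so inspect what comes back before relying on it.

```python
# Repo ids taken from the two Hugging Face links above.
SENTENCE_REPO = "damerajee/khasi-datasets"  # version split into sentences
RAW_REPO = "damerajee/khasi-raw-data"       # raw version, long paragraphs


def repo_id(sentences: bool = True) -> str:
    """Hypothetical helper: pick the sentence-level or raw-paragraph repo."""
    return SENTENCE_REPO if sentences else RAW_REPO


def load_khasi(sentences: bool = True):
    """Hypothetical helper: download and load one version from the Hub.

    Requires the `datasets` package and network access. Split and column
    names are not documented here, so print the result to check them.
    """
    # Imported inside the function so this file can be loaded without `datasets`.
    from datasets import load_dataset
    return load_dataset(repo_id(sentences))
```

Usage: `ds = load_khasi()` for the sentence version or `load_khasi(sentences=False)` for the raw paragraphs, then `print(ds)` to see which splits and columns are there (it's usually a `DatasetDict`).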
Paper: https://paperswithcode.com/paper/enkhcorp1-0-an-english-khasi-corpus
The paper explains where the data was found and how it was collected.