Closed. Ruqyai closed this 5 months ago.
Hello,
Yes, you can add datasets that are not your own, as long as they are properly sourced.
For this one, we already handle MIRACL in all languages; see PR #642.
Feel free to add any other dataset from the list. Just make sure it is of good quality.
Thanks a lot, @imenelydiaker.
This issue seems resolved, so I will close it for now. Feel free to reopen it.
Question:
I don't know if I am allowed to add a dataset that is not my own work. In the first submission, I collected the data myself through web scraping.
However, when I browsed the task folder, I found that Arabic coverage is almost entirely limited to classification. There are several large Arabic datasets, such as the one below. Can I add it now?
https://huggingface.co/datasets/Cohere/miracl-ar-corpus-22-12
Or others from the list: https://huggingface.co/datasets?language=language:ar&sort=trending