Open KennethEnevoldsen opened 2 months ago
I will take a stab at a Bengali benchmark together with a colleague of mine 👍
Wonderful, @rasdani! Feel free to create an issue on this as well so that others can see that you are working on it.
I created PRs for Indonesian languages (at least 10 additions from 2 corpora) and African languages. Once they are approved, I can add the languages to the list.
Linguistic Families and Proposed Languages:
East Asian Languages
South Asian Languages
Indic Languages:
[x] Hindi - hin
[x] Bengali - ben
[x] Punjabi - pan
[x] Marathi - mar
[x] Gujarati - guj
[x] Urdu - urd
[x] Nepali - nep
[x] Sinhala - sin
Dravidian Languages:
[x] Tamil - tam
[x] Telugu - tel
[x] Kannada - kan
[x] Malayalam - mal
Southeast Asian Languages
Central Asian Languages
West Asian (Middle Eastern) Languages
Note that this list does not claim to be comprehensive; feel free to add to it.