AI4Bharat/indicnlp_catalog
A collaborative catalog of NLP resources for Indic languages
https://ai4bharat.github.io/indicnlp_catalog
Taxi1500: A Multilingual Dataset for Text Classification in 1500 Languages
#217
Open
anoopkunchukuttan opened 1 year ago
anoopkunchukuttan commented 1 year ago
Contains classification datasets created from Bible translations
6 classes
1504 languages; the data is split into train, development, and test sets at a ratio of roughly 80/10/10, with 860, 106, and 111 verses respectively
Indian language representation: ??
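
A quick sanity check of the split sizes quoted above, as a minimal sketch: it uses only the three verse counts from this comment (the variable names are illustrative, not taken from the Taxi1500 code) and shows that they add up to 1077 verses and come out close to, but not exactly, 80/10/10.

```python
# Verse counts per split, as quoted in this issue comment.
split_sizes = {"train": 860, "dev": 106, "test": 111}

# Total number of verses across the three splits.
total = sum(split_sizes.values())

# Fraction of the total that falls in each split.
fractions = {name: count / total for name, count in split_sizes.items()}

for name, count in split_sizes.items():
    print(f"{name}: {count} verses ({fractions[name]:.1%})")
print(f"total: {total} verses")
```

The shares work out to about 79.9% / 9.8% / 10.3%, so the 80/10/10 figure is a rounded description of the split.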