AI4Bharat / indicnlp_catalog

A collaborative catalog of NLP resources for Indic languages
https://ai4bharat.github.io/indicnlp_catalog

IIIT-D Multilingual Abusive Comment Identification #126

Open GokulNC opened 3 years ago

GokulNC commented 3 years ago

Massively multilingual abusive comment identification in code-mixed text across 13 Indian languages: Hindi, Telugu, Marathi, Tamil, Malayalam, Bengali, Kannada, Odia, Gujarati, Haryanvi, Bhojpuri, Rajasthani, and Assamese.

https://www.kaggle.com/c/iiitd-abuse-detection-challenge/data

GokulNC commented 2 years ago

Related work: Multilingual Abusive Comment Detection (MACD) at Scale for Indic Languages

Dataset: https://github.com/ShareChatAI/MACD
Languages: Hindi, Tamil, Telugu, Malayalam, and Kannada
Count: 30k samples per language on average