GokulNC opened this issue 3 years ago (status: Open)
Related work: Multilingual Abusive Comment Detection (MACD) at Scale for Indic Languages
Dataset: https://github.com/ShareChatAI/MACD
Languages: Hindi, Tamil, Telugu, Malayalam, and Kannada
Count: about 30k samples per language on average
Related work: Massively multilingual abusive comment identification across Indian languages (code-mixed text)
Dataset: https://www.kaggle.com/c/iiitd-abuse-detection-challenge/data
Languages: Hindi, Telugu, Marathi, Tamil, Malayalam, Bengali, Kannada, Odia, Gujarati, Haryanvi, Bhojpuri, Rajasthani, and Assamese