bigscience-workshop / promptsource

Toolkit for creating, sharing and using natural language prompts.
Apache License 2.0
2.64k stars, 346 forks

Add more non-English datasets: IndoNLU #784

Closed: gentaiscool closed this issue 11 months ago

gentaiscool commented 2 years ago

I would like to propose adding new Indonesian datasets from the IndoNLU benchmark (https://github.com/IndoNLP/indonlu) for multilingual evaluation. The benchmark covers several tasks, including sentiment analysis, emotion classification, and textual entailment. We could start with a single task, such as sentiment analysis.

Are we still able to add new non-English datasets? Our coverage of non-English tasks is currently low.

If that is okay, I would be very happy to assign myself to add them :)

stephenbach commented 11 months ago

Closing due to inactivity. Feel free to reopen if you want to revisit this! We currently only support multilingual prompts on the eval-hackathon branch.