SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Data quality for thai alpaca #330

Closed wannaphong closed 7 months ago

wannaphong commented 9 months ago

Hello SEACrowd! I am a native Thai speaker. I noticed that thai_alpaca was submitted to your project. I think you should remove this dataset. It does not look like human-written answers; it appears to be translated from the Alpaca dataset, and it has data quality problems caused by machine translation.

holylovenia commented 9 months ago

@wannaphong Thanks for reaching out to us about this. From what I've understood from Peerat, this dataset is basically nonsensical. We'll work out how to handle it and how to prevent this from happening again. I'll update you soon on the team's solution.

wannaphong commented 7 months ago

@holylovenia Is the dataset OK?

holylovenia commented 7 months ago

Hey @wannaphong, we have rejected this dataset from our catalogue. Thanks for the warning. 🙏 The details can be seen here.