SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for generated_reviews_enth #392

Closed: SamuelCahyawijaya closed this issue 5 months ago

SamuelCahyawijaya commented 8 months ago

Dataloader name: generated_review_enth/generated_review_enth.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?generated_review_enth

Dataset: generated_review_enth
Description: This dataset (referred to as generated_reviews_yn in scb-mt-en-th-2020) consists of English product reviews generated by CTRL, translated into Thai with the Google Translate API, and annotated by human annotators as accepted or rejected (the correct field) based on the fluency and adequacy of the translation. It can therefore be used for English-to-Thai translation quality estimation (binary label), machine translation, and sentiment analysis. For SEACrowd, we use only the data with correct = 1.
Subsets: -
Languages: tha, eng
Tasks: Machine Translation
License: Creative Commons Attribution Share Alike 4.0 (cc-by-sa-4.0)
Homepage: https://github.com/vistec-ai/generated_reviews_enth
HF URL: -
Paper URL: https://arxiv.org/pdf/2007.03541.pdf
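
For whoever picks this up, the "correct = 1" filter from the description could look roughly like the sketch below. It is only an illustration, not the final dataloader: the column names (en_segment, th_segment, correct), the sample rows, and the helper name load_accepted_pairs are assumptions made for the example, and the output keys (id, text_1, text_2, text_1_name, text_2_name) are an assumed text-to-text layout that should be checked against the actual SEACrowd schema and the real data files.

```python
import csv
import io

# Hypothetical stand-in for the raw file. The column names and rows below are
# assumptions for illustration only; check them against the real dataset.
SAMPLE_CSV = """en_segment,th_segment,correct
Great product.,<Thai translation 1>,1
Arrived broken.,<Thai translation 2>,0
Works as described.,<Thai translation 3>,1
"""


def load_accepted_pairs(csv_file):
    """Yield (key, example) only for rows annotated as accepted (correct == 1)."""
    reader = csv.DictReader(csv_file)
    for idx, row in enumerate(reader):
        # Skip translations that the annotators rejected.
        if row["correct"].strip() != "1":
            continue
        yield idx, {
            "id": str(idx),
            "text_1": row["en_segment"],  # English source
            "text_2": row["th_segment"],  # Thai translation
            "text_1_name": "eng",
            "text_2_name": "tha",
        }


pairs = list(load_accepted_pairs(io.StringIO(SAMPLE_CSV)))
for key, example in pairs:
    print(key, example["text_1"], "->", example["text_2"])
```

Keeping the original row index as the key means a filtered example can still be traced back to its row in the source file.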

khelli07 commented 8 months ago

self-assign

github-actions[bot] commented 7 months ago

Hi @, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help.

khelli07 commented 7 months ago

Working on it now