SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for Indonesia-Chinese-MTRobustEval #314

Closed SamuelCahyawijaya closed 7 months ago

SamuelCahyawijaya commented 9 months ago

Dataloader name: indonesia_chinese_mtrobusteval/indonesia_chinese_mtrobusteval.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?indonesia_chinese_mtrobusteval

Dataset: indonesia_chinese_mtrobusteval
Description: The dataset is curated to evaluate the robustness of Neural Machine Translation (NMT) to naturally occurring noise (typos, slang, code-switching, etc.). The sentences were crawled from Twitter and then pre-processed to keep those containing noise. The dataset consists of a thousand noisy sentences, each manually translated into Chinese to serve as the benchmark for evaluating NMT robustness.
Subsets: -
Languages: ind, cmn
Tasks: Machine Translation
License: MIT (mit)
Homepage: https://github.com/supryzhu/Indonesia-Chinese-MTRobustEval
HF URL: -
Paper URL: -
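
For anyone picking this up, here is a minimal sketch of how the Indonesian sentences and their Chinese references might be paired into translation records. It is not the actual loader: this issue does not describe the file layout of the upstream repository, so the helper takes lines that have already been read, and both the function name `pair_sentences` and the record fields (`id`, `text_1`, `text_2`, `text_1_name`, `text_2_name`) are assumptions modeled on a generic text-to-text schema.

```python
from typing import Dict, Iterable, Iterator


def pair_sentences(
    src_lines: Iterable[str],
    tgt_lines: Iterable[str],
    src_lang: str = "ind",
    tgt_lang: str = "cmn",
) -> Iterator[Dict[str, str]]:
    """Yield one translation record per aligned source/target line.

    Hypothetical helper: field names are illustrative assumptions,
    not confirmed by this issue.
    """
    # Strip trailing newlines and surrounding whitespace from both sides.
    src = [line.strip() for line in src_lines]
    tgt = [line.strip() for line in tgt_lines]

    # A parallel corpus must have the same number of lines on each side;
    # fail loudly instead of silently truncating with zip().
    if len(src) != len(tgt):
        raise ValueError(
            f"Line count mismatch: {len(src)} source vs {len(tgt)} target"
        )

    for idx, (source, target) in enumerate(zip(src, tgt)):
        yield {
            "id": str(idx),
            "text_1": source,
            "text_2": target,
            "text_1_name": src_lang,
            "text_2_name": tgt_lang,
        }
```

The explicit length check matters for a robustness benchmark like this one: a single dropped line would shift every later reference translation onto the wrong source sentence.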

TysonYu commented 9 months ago

self-assign

supryzhu commented 8 months ago

self-assign

SamuelCahyawijaya commented 8 months ago

Hi @supryzhu, I am unassigning your account from this issue as it is already assigned to @TysonYu. If you want to see the unassigned dataloader issues, you can check them on the GitHub project.

github-actions[bot] commented 8 months ago

Hi @, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help.