SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Closes #219 | Create dataloader for scb-mt-en-th-2020 #287

Closed by jensan-1 8 months ago

jensan-1 commented 9 months ago

Please name your PR after the issue it closes. You can use the following line: "Closes #219 " where you replace the ISSUE-NUMBER with the one corresponding to your dataset.

jensan-1 commented 9 months ago

Hello reviewers,

Apart from the PR itself, I want to let you know (as reported in the comment section of issue #219) that the DataCatalogue link cannot be opened from here. I found that this link works for the dataset instead: https://seacrowd.github.io/seacrowd-catalogue/card.html?scb-mt-en-th-2020

This raises one question: should the dataset name be scb-mt-en-th or scb-mt-en-th-2020? (UPDATE: it will be implemented with the snake_case dataloader name.) I think the dataset name used in the title, the card, and the dataloader should be unified. Please let me know which name is the correct one.

holylovenia commented 9 months ago

Hi @jen-santoso, let's stick with scb_mt_en_th. I'll run this by @SamuelCahyawijaya to see what triggers the incorrect URL being generated.
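
For reference, the snake_case form discussed in this thread can be derived mechanically from the hyphenated catalogue name. The sketch below is only an illustration: `to_dataloader_name` is a hypothetical helper, not a function from seacrowd-datahub. It does not drop the year suffix, so it reproduces the agreed `scb_mt_en_th` only when given the shorter name.

```python
import re


def to_dataloader_name(catalogue_name: str) -> str:
    """Hypothetical helper: turn a hyphenated dataset name into snake_case."""
    # Lowercase the name, then collapse every run of characters that are not
    # ASCII letters or digits into a single underscore.
    snake = re.sub(r"[^a-z0-9]+", "_", catalogue_name.lower())
    # Trim underscores left over from leading or trailing separators.
    return snake.strip("_")


print(to_dataloader_name("scb-mt-en-th"))       # scb_mt_en_th
print(to_dataloader_name("scb-mt-en-th-2020"))  # scb_mt_en_th_2020
```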