SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for IndoSMD #54

SamuelCahyawijaya closed 9 months ago

SamuelCahyawijaya commented 11 months ago

Dataloader name: indosmd/indosmd.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?indosmd

Dataset: indosmd
Description: IndoSMD is a synthetic task-oriented dialogue dataset translated from the In-Car Assistant (SMD) dataset (Eric et al., 2017) into Indonesian using a translation pipeline consisting of delexicalization, translation, and delexicalization. The dataset consists of 323 dialogues between a user and an agent in the POI Navigation, Calendar Scheduling, and Weather Information Retrieval domains. It also includes slot and dialogue-act annotations for both the user and the agent.
Subsets: -
Languages: ind
Tasks: Dialogue System
License: Creative Commons Attribution Share Alike 4.0 (cc-by-sa-4.0)
Homepage: https://github.com/dehanalkautsar/IndoToD/tree/main/IndoSMD
HF URL: -
Paper URL: https://arxiv.org/pdf/2311.00958.pdf

dehanalkautsar commented 11 months ago

self-assign

sabilmakbar commented 11 months ago

Hi @dehanalkautsar, may I know the current status of this dataloader creation? Feel free to discuss here if you have any difficulties. Thanks!

github-actions[bot] commented 10 months ago

Hi, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help.

dehanalkautsar commented 10 months ago

I’ll create all the dataloaders assigned to me this week.

holylovenia commented 10 months ago

Okay then, @dehanalkautsar. Feel free to let us know if you need any help!