SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for MTOP #81

Closed: SamuelCahyawijaya closed this issue 10 months ago

SamuelCahyawijaya commented 12 months ago

Dataloader name: mtop/mtop.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?mtop

Dataset: mtop
Description: An almost-parallel multilingual task-oriented semantic parsing dataset covering 6 languages and 11 domains. This is the first multilingual dataset which contains compositional representations that allow complex nested queries.
Subsets: Domain, Intent
Languages: tha
Tasks: Constituency Parsing, Semantic Role Labeling, Intent Classification
License: Unknown (unknown)
Homepage: https://huggingface.co/datasets/mteb/mtop_domain, https://huggingface.co/datasets/mteb/mtop_intent
HF URL: https://huggingface.co/datasets/mteb/mtop_domain, https://huggingface.co/datasets/mteb/mtop_intent
Paper URL: https://arxiv.org/abs/2008.09335

elyanah-aco commented 11 months ago

self-assign

sabilmakbar commented 11 months ago

Hi @elyanah-aco, may I know the current status of this dataloader creation? Feel free to discuss here if you have any difficulties. Thanks!

elyanah-aco commented 11 months ago

@sabilmakbar I'm still working on my other 2 dataloaders; I'll get to this one once those are done.

sabilmakbar commented 11 months ago

Cool, thanks for letting us know! Please take your time working on your other dataloaders first, and let us know if you hit any roadblocks!

elyanah-aco commented 11 months ago

Hi @sabilmakbar, what should I do if there are two subsets with different labels for the same schema (domain and intent, both under the text schema)?

sabilmakbar commented 11 months ago

You can separate them by implementing one config per subset; as a consequence, you'll have twice the usual number of configs (one set for domain and one for intent).
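
The per-subset config idea can be sketched roughly as follows. This is only an illustration, not the dataloader that was eventually written: `LoaderConfig` is a hypothetical stand-in for the datahub's own config class, and the schema names `source` and `seacrowd_text`, as well as the config naming pattern, are assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class LoaderConfig:
    """Hypothetical stand-in for the datahub's builder config class."""
    name: str
    schema: str
    subset_id: str


DATASET_NAME = "mtop"
SUBSETS = ("domain", "intent")          # the two subsets with different label sets
SCHEMAS = ("source", "seacrowd_text")   # assumed schema names


def build_configs():
    """Create one config per (subset, schema) pair, i.e. twice the usual count."""
    return [
        LoaderConfig(
            name=f"{DATASET_NAME}_{subset}_{schema}",
            schema=schema,
            subset_id=f"{DATASET_NAME}_{subset}",
        )
        for subset, schema in product(SUBSETS, SCHEMAS)
    ]


CONFIGS = build_configs()

if __name__ == "__main__":
    for config in CONFIGS:
        print(config.name, config.schema, config.subset_id)
```

With two subsets and two schemas this yields four configs, and the loader can pick the label set for the text schema by checking which subset the selected config belongs to.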