SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Closes #527 | Add Dataloader MDIA #562

Status: Closed (akhdanfadh closed this 2 months ago)

akhdanfadh commented 3 months ago

Closes #527

I implemented one config per language/subset, so config names look like mdia_ind_dialogue_source, mdia_tgl_eng_seacrowd_t2t, etc. When testing, pass mdia_<subset> to the --subset_id parameter.

Note: this dataset can be used for two tasks. Both use the same t2t schema, but with different implementations.
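A minimal sketch of the naming scheme described above, assuming each config name is built as mdia_<subset>_<schema> with the schema suffixes source and seacrowd_t2t seen in the two examples. The helper names (`subset_id`, `build_config_names`) and the subset list are hypothetical and only illustrate the pattern; they are not taken from the dataloader itself.

```python
# Hypothetical illustration of the config naming pattern; this is NOT the
# actual MDIA dataloader code, only a sketch of how the names compose.
DATASET_NAME = "mdia"

# Schema suffixes inferred from the two example config names in this PR.
SCHEMAS = ("source", "seacrowd_t2t")


def subset_id(subset: str) -> str:
    """Build the value passed to --subset_id, i.e. mdia_<subset>."""
    return f"{DATASET_NAME}_{subset}"


def build_config_names(subsets: list[str]) -> list[str]:
    """Return one config name per (subset, schema) pair."""
    return [f"{subset_id(s)}_{schema}" for s in subsets for schema in SCHEMAS]


# Subsets inferred from the examples above (assumption, not the full list).
names = build_config_names(["ind_dialogue", "tgl_eng"])
for name in names:
    print(name)
```

With this pattern, the subset ID for the Tagalog-English subset would be mdia_tgl_eng, and its t2t config would be mdia_tgl_eng_seacrowd_t2t.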


akhdanfadh commented 3 months ago

@MJonibek @danjohnvelasco Done reformatting!

holylovenia commented 2 months ago

I'm merging this since both @MJonibek and @danjohnvelasco's reviews have been addressed. Thank you so much for your work, @akhdanfadh!! 👍