IndoNLP / nusa-crowd

A collaborative project to collect datasets in Indonesian languages.

Apache License 2.0

261 stars, 61 forks

Create dataset loader for IndoCollex #108

Closed: SamuelCahyawijaya closed this issue 1 year ago

SamuelCahyawijaya commented 2 years ago

https://indonlp.github.io/nusa-catalogue/card.html?indocollex

haryoa commented 2 years ago

self-assign

bryanwilie commented 2 years ago

Hi @haryoa, are you still working on this? I will assume inactivity if there's no reply and will free the assignees. Thanks!

bryanwilie commented 2 years ago

Cleared assignees due to inactivity. Issue is still open for contribution!

fhudi commented 2 years ago

This dataset is for a morphology-related task. Based on #44, no Nusantara task supports it yet. However, the Nusantara Text Pairs Schema could support this dataset, because the dataset splits multiple inflections into separate pairs. For example, pair A is not in the dataset, while B and C are, as illustrated below:

#	Text1	Text2	Transformation
A	teman-teman	temen2	space-dash, sound-alter
B	teman-teman	teman2	space-dash
C	teman2	temen2	sound-alter
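To make the idea concrete, here is a minimal Python sketch. It assumes a text-pairs record with the fields `id`, `text_1`, `text_2` and a list-valued `label`; these field names and the `compose` helper are hypothetical illustrations, not the actual nusa-crowd schema. Only rows B and C are stored, and the sketch shows that row A can be recovered by chaining them.

```python
from typing import Any, Dict, List

# Hypothetical text-pairs record layout (assumed field names, not the real schema).
Pair = Dict[str, Any]

# Rows B and C from the table above; row A is not stored in the dataset.
pairs: List[Pair] = [
    {"id": "B", "text_1": "teman-teman", "text_2": "teman2", "label": ["space-dash"]},
    {"id": "C", "text_1": "teman2", "text_2": "temen2", "label": ["sound-alter"]},
]


def compose(first: Pair, second: Pair, new_id: str) -> Pair:
    """Chain two pairs when the target of `first` is the source of `second`."""
    if first["text_2"] != second["text_1"]:
        raise ValueError("pairs do not chain")
    return {
        "id": new_id,
        "text_1": first["text_1"],
        "text_2": second["text_2"],
        # The combined transformation is the concatenation of both label lists.
        "label": list(first["label"]) + list(second["label"]),
    }


a = compose(pairs[0], pairs[1], "A")
print(a)
```

Chaining B then C yields the teman-teman to temen2 pair labeled space-dash and sound-alter, which matches row A, so storing only the single-step pairs loses no information.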

Shall I implement this dataset with the pairs and source schemas? Tagging @holylovenia.

fhudi commented 2 years ago

self-assign

bryanwilie commented 1 year ago

Tagging @holylovenia since she might have missed this.

holylovenia commented 1 year ago

Hi @fhudi, sorry for the long wait; I missed this one before. I think your idea of using the pairs schema is great. Give me a minute to add a little tweak to the config.

holylovenia commented 1 year ago

The pairs_multi schema and Tasks.MORPHOLOGICAL_INFLECTION are ready to use, @fhudi! This decision also calls for modifications to #44 and #156. Would you mind making the corresponding changes to those dataloaders, perhaps in a new PR? That way it can count as an extra contribution toward your Hacktoberfest milestone in case you're joining. I can create an issue for this fix later. No pressure if you prefer not to, though. 😄

Thanks again for your wonderful suggestion!

PS: Also thanks to @bryanwilie for the kind reminder and help.

fhudi commented 1 year ago

@holylovenia Surebeans, I will do the modification 😄