Closed by SamuelCahyawijaya 1 year ago
Hi @haryoa, are you still working on this? I will assume inactivity if there's no reply and will free the assignees. Thanks!
Cleared assignees due to inactivity. Issue is still open for contribution!
This dataset is a morphology-related task, and based on #44, there is no supported Nusantara task for it for now.
However, the `Nusantara Text Pairs Schema` could support it, since this dataset splits multiple inflections into separate entries.
For example, `A` is not in the dataset (while `B` and `C` are), as illustrated below:
| # | Text1 | Text2 | Transformation |
|---|---|---|---|
| A | teman-teman | temen2 | space-dash, sound-alter |
| B | teman-teman | teman2 | space-dash |
| C | teman2 | temen2 | sound-alter |
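
To make the relationship between the rows concrete, here is a minimal sketch of how the combined pair `A` could be derived by chaining the single-step pairs `B` and `C`. It uses plain Python dicts only; the field names (`text_1`, `text_2`, `labels`) and the helper `compose_pairs` are assumptions for illustration, not the actual schema or dataloader code.

```python
# Rows B and C from the table above, as plain dicts.
# Field names (text_1, text_2, labels) are illustrative assumptions.
single_step_pairs = [
    {"text_1": "teman-teman", "text_2": "teman2", "labels": ["space-dash"]},
    {"text_1": "teman2", "text_2": "temen2", "labels": ["sound-alter"]},
]


def compose_pairs(pairs):
    """Chain pairs where one pair's text_2 equals another pair's text_1."""
    # Index pairs by their source text for quick lookup.
    by_source = {}
    for pair in pairs:
        by_source.setdefault(pair["text_1"], []).append(pair)

    composed = []
    for first in pairs:
        # Find every pair that starts where `first` ends.
        for second in by_source.get(first["text_2"], []):
            composed.append(
                {
                    "text_1": first["text_1"],
                    "text_2": second["text_2"],
                    "labels": first["labels"] + second["labels"],
                }
            )
    return composed


# Chaining B and C yields a single pair equivalent to row A.
derived = compose_pairs(single_step_pairs)
print(derived)
```

The sketch only chains two steps; it is meant to show why storing single-transformation pairs loses no information relative to row `A`.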
Shall I implement this dataset with the `PAIRS` and `source` schemas?
tag: @holylovenia
Tagging @holylovenia since she might have missed this
Hi @fhudi, sorry for the long wait, I missed this one before. I think your idea of using the `pairs` schema is great. Give me a minute to add a little tweak to the config.
The `pairs_multi` schema and `Tasks.MORPHOLOGICAL_INFLECTION` are ready to use, @fhudi!
Also, this change of decision calls for a modification in #44 and #156. Would you mind incorporating the proper changes into these dataloaders? Maybe in a new PR? That way it can count as an extra contribution toward your Hacktoberfest milestone in case you're joining. I can create an issue for this fix later. No pressure if you prefer not to, though. 😄
Thanks again for your wonderful suggestion!
PS: Also thanks to @bryanwilie for the kind reminder and help.
@holylovenia Surebeans, I will do the modification 😄
https://indonlp.github.io/nusa-catalogue/card.html?indocollex