Closed SamuelCahyawijaya closed 1 year ago
Hi @fhudi, are you still working on this? I will assume inactivity if there's no reply and will free the assignees. Thanks!
Hi @bryanwilie, thanks for asking. I discussed this with @gentaiscool and @afaji a long time ago, and based on that discussion it seems a new schema might be required. I thought there would be similar situations for other datasets and was expecting a process for proposing new schemas to be released, but that does not seem to be the case, as your comment suggests otherwise. Is there any way to propose a new schema, or should I just create a new schema alongside?
Noted @fhudi. Looping in @holylovenia since it's related to proposing a new schema.
Thank you for joining us by the way!
Hello @fhudi, thank you for waiting. For the `nusantara` schema, could you please use the `t2t` schema (and `Tasks.PARAPHRASING`) with the `form` as the `text1` and the `lemma` as `text2`? For the `source` schema, you can implement it according to the original dataset structure, so the features will be: `lemma`: string, `form`: string, `tag`: [string]. Please let me know if you have any questions.
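A minimal sketch of the mapping described above, using only the Python standard library rather than the project's actual dataloader code. The `SourceRecord` type, the `to_t2t` helper, and the output key names (`id`, `text_1`, `text_2`, `text_1_name`, `text_2_name`) are assumptions made for illustration; the thread itself only names `text1`, `text2`, `lemma`, `form`, and `tag`.

```python
from typing import Dict, List, TypedDict


class SourceRecord(TypedDict):
    # Source schema as described in the thread: lemma, form, list of tags.
    lemma: str
    form: str
    tag: List[str]


def to_t2t(idx: int, record: SourceRecord) -> Dict[str, str]:
    """Map one source record to a text-to-text style record.

    Hypothetical helper: the key names are assumed, not taken from the repo.
    """
    return {
        "id": str(idx),
        "text_1": record["form"],   # the inflected form goes in the first text
        "text_2": record["lemma"],  # the lemma goes in the second text
        "text_1_name": "form",
        "text_2_name": "lemma",
    }


# Example values taken from the thread: abdi + ['V', 'ACT'] -> mengabdi.
source: SourceRecord = {"lemma": "abdi", "form": "mengabdi", "tag": ["V", "ACT"]}
t2t = to_t2t(0, source)
print(t2t)
```

Note that the `tag` list has no slot in the text-to-text record, so it is dropped by this mapping.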
Hi @holylovenia, thanks for the reply.
Sorry but I don't quite get it, could you please elaborate more 🙏
The `t2t` schema does not have a `tag` field for the crucial inflection element, CMIIW. Let's take an example as follows.
Following the paraphrasing task, in this particular example, the same input text `abdi` has 2 different outputs [`abdinya`, `mengabdi`]. Is this fine?
I believe what you mentioned was specifically for the `Morphological Analysis` task minus the inflection part, which makes it a `Paraphrasing` task as a result.
And what about the `Morphological Inflection` task, i.e.: (in) `abdi` `['V', 'ACT']` → (out) `mengabdi`? Are we not going to support these morphological tasks in Nusantara?
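The concern above can be illustrated with a small standard-library sketch. Only the `abdi` + `['V', 'ACT']` → `mengabdi` triple comes from the thread; the tag shown for `abdinya` is a placeholder, not real dataset content, and the variable names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (lemma, form, tags). The tag for `abdinya` is a PLACEHOLDER for illustration.
records: List[Tuple[str, str, Tuple[str, ...]]] = [
    ("abdi", "abdinya", ("PLACEHOLDER",)),
    ("abdi", "mengabdi", ("V", "ACT")),
]

# Keyed on the lemma alone (no tag field): one input, several targets.
by_lemma: Dict[str, List[str]] = defaultdict(list)
for lemma, form, _tags in records:
    by_lemma[lemma].append(form)

# Keyed on (lemma, tags), as in morphological inflection: one target per input.
by_lemma_and_tags = {(lemma, tags): form for lemma, form, tags in records}

print(by_lemma["abdi"])
print(by_lemma_and_tags[("abdi", ("V", "ACT"))])
```

Without the tags, the pair of records is ambiguous as an input-to-output mapping; with the tags included in the input, each input selects a single form.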
Hi @fhudi, thank you for waiting and explaining. What you said is right: it is quite inaccurate to frame this morphological inflection as a paraphrasing task. However, so far there hasn't been a demand for this schema structure aside from this dataloader, so we have decided to leave the `nusantara` schema out for now. Please implement the `source` schema only. Thanks again! :smile:
@holylovenia Noted and thanks, will do so 😄 @afaji FYI, morphology-related tasks won't be implemented for now 🙏
Hi kak @fhudi, sorry, I reopened this because I missed one thing. Can you move the `unimorph_id.py` file to `nusacrowd/nusa_datasets/unimorph_id/unimorph_id.py`? The other datasets have been moved to that location too. You can raise another PR to fix it. Thank you!
Closed again since it had been resolved in https://github.com/IndoNLP/nusa-crowd/commit/3df7f5b8e89112cc17d4a7466441ac41e9b8fa87 by @holylovenia
https://indonlp.github.io/nusa-catalogue/card.html?unimorph_id