SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Closes #16 | Create dataset loader for ALT Burmese Treebank #297

MJonibek closed 8 months ago

MJonibek commented 9 months ago

Closes #16

Note: PR #295 is related to this PR and to the issue it closes.

Please name your PR after the issue it closes. You can use the following line: "Closes #ISSUE-NUMBER" where you replace the ISSUE-NUMBER with the one corresponding to your dataset.


MJonibek commented 8 months ago

Hi, I see, there is a problem with the unique ids. I will fix this issue this week.

MJonibek commented 8 months ago

@holylovenia Done. I found the cause of this bug: subnodes get their own ids when they are created, and for every new sentence the subnode ids started from 0 again. As a result, subnodes from different sentences had the same ids. I changed the subnode ids so that, instead of id 0, a subnode now gets sent_id_0. I checked manually and with test.py, and everything worked.
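For readers following the thread, the fix described above can be illustrated with a minimal sketch. This is not the code from the PR: the function name `assign_subnode_ids`, the toy input, and the exact `"{sent_id}_{index}"` format are assumptions made for illustration, based only on the description that id 0 becomes sent_id_0.

```python
from typing import Dict, List


def assign_subnode_ids(sentences: Dict[str, List[str]]) -> List[dict]:
    """Hypothetical helper: build subnodes whose ids are unique across sentences."""
    nodes = []
    for sent_id, labels in sentences.items():
        # The per-sentence index restarts at 0 for every sentence,
        # which is what produced duplicate ids in the buggy version.
        for local_idx, label in enumerate(labels):
            # Prefixing the index with the sentence id keeps ids globally unique.
            nodes.append({"id": f"{sent_id}_{local_idx}", "label": label})
    return nodes


# Toy data (assumed, not taken from the ALT Burmese Treebank).
toy_sentences = {"s1": ["NP", "VP"], "s2": ["NP", "VP", "PP"]}
nodes = assign_subnode_ids(toy_sentences)
ids = [node["id"] for node in nodes]

# No two subnodes share an id, even though both sentences start counting at 0.
assert len(ids) == len(set(ids))
```

Using the bare index alone would give both sentences a subnode with id 0, which is the collision described in the comment above.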


holylovenia commented 8 months ago

Thanks for your hard work and the bug explanation, @MJonibek. 👍 It makes a lot of sense. Everything went well during my testing and review. I'm upping the dataloader points with a +2 bonus considering the high difficulty. Thank you for tackling this dataloader.

For now, let's wait for @sabilmakbar's review.

Re-assigning the reviewer from @sabilmakbar to @SamuelCahyawijaya to give more breathing room to @sabilmakbar. 🙏

MJonibek commented 8 months ago

@SamuelCahyawijaya, I fixed this bug.