SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Closes #165 | Add BLOOM-LM dataset #294

Closed: sabilmakbar closed 7 months ago

sabilmakbar commented 9 months ago

Closes #165


sabilmakbar commented 9 months ago

Test log of Bloom LM (.log file attached): test_bloom_lm.log

jcblaisecruz02 commented 8 months ago

@holylovenia @jamesjaya Hello! I've spoken with our linguists, and they advise labeling it as psp: it is its own distinct language and is treated separately regardless of which Philippine language it is aligned with in a parallel corpus. A sign language to Cebuano dataset would therefore be psp-ceb.

holylovenia commented 8 months ago

Hi @sabilmakbar, please let us know if you need help with the dataloader or if you still have some questions.

sabilmakbar commented 7 months ago

> Hi @sabilmakbar, please let us know if you need help with the dataloader or if you still have some questions.

Hi @holylovenia, thanks for the reminder. I apologize that I haven't been able to find time to address this. I will address the suggestions from you and @jamesjaya; please expect it to be finished by tomorrow morning.

sabilmakbar commented 7 months ago

Done updating. Please take a look and let me know whether the suggested changes have been implemented.