SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Closes #339 | Update dataloader for Leipzig #483

Open TysonYu opened 7 months ago

TysonYu commented 7 months ago

Please name your PR after the issue it closes. You can use the following line: "Closes #ISSUE-NUMBER" where you replace the ISSUE-NUMBER with the one corresponding to your dataset.

holylovenia commented 6 months ago

A friendly reminder to follow up, @TysonYu @raileymontalan.

TysonYu commented 6 months ago

"Hi @TysonYu, could you please fix the folder name to leipzig_corpora (i.e. seacrowd/sea_datasets/leipzig_corpora/leipzig_corpora.py)? And provide per-language subsets. Other than that, the code LGTM. Thanks!"

Done~

holylovenia commented 5 months ago

Hi @raileymontalan and @SamuelCahyawijaya, I changed the "copora" to "corpora". Please feel free to let @TysonYu know if other changes are required.

raileymontalan commented 5 months ago

Hi @TysonYu, are you working on creating subsets per language, as per @SamuelCahyawijaya's request?

holylovenia commented 5 months ago

Hi @TysonYu, I would like to let you know that we plan to finalize the calculation of the open contributions (e.g., dataloader implementations) by 30 May, so it'd be great if we could wrap up the reviewing and merge this PR before then.

holylovenia commented 4 months ago

Hi @TysonYu, I would like to let you know that we plan to finalize the calculation of the open contributions (e.g., dataloader implementations) in 31 hours, so it'd be great if we could wrap up the reviewing and merge this PR before then.

holylovenia commented 3 months ago

Hi @TysonYu, thank you for contributing to SEACrowd! I would like to let you know that we are still looking forward to completing this PR (and dataloader issues) and maintaining SEACrowd Data Hub. We hope to enable access to as many standardized dataloaders as possible for SEA datasets. ☺️

Feel free to continue the PR whenever you're available, and if you would like to re-assign this dataloader to someone else, just let us know and we can help. 💪

Thanks again!

cc: @SamuelCahyawijaya @raileymontalan