Open TysonYu opened 7 months ago
A friendly reminder to follow up, @TysonYu @raileymontalan.
Hi @TysonYu, could you please fix the folder name to leipzig_corpora (i.e., seacrowd/sea_datasets/leipzig_corpora/leipzig_corpora.py)? Please also provide per-language subsets. Other than that, the code LGTM. Thanks!
Done~
Hi @raileymontalan and @SamuelCahyawijaya, I changed "copora" to "corpora". Please feel free to let @TysonYu know if other changes are required.
Hi @TysonYu, are you working on creating subsets per language, as per @SamuelCahyawijaya's request?
Hi @TysonYu, I would like to let you know that we plan to finalize the calculation of the open contributions (e.g., dataloader implementations) by 30 May, so it'd be great if we could wrap up the reviewing and merge this PR before then.
Hi @TysonYu, I would like to let you know that we plan to finalize the calculation of the open contributions (e.g., dataloader implementations) in 31 hours, so it'd be great if we could wrap up the reviewing and merge this PR before then.
Hi @TysonYu, thank you for contributing to SEACrowd! I would like to let you know that we are still looking forward to completing this PR (and dataloader issues) and maintaining SEACrowd Data Hub. We hope to enable access to as many standardized dataloaders as possible for SEA datasets. ☺️
Feel free to continue the PR whenever you're available, and if you would like to re-assign this dataloader to someone else, just let us know and we can help. 💪
Thanks again!
cc: @SamuelCahyawijaya @raileymontalan
Please name your PR after the issue it closes. You can use the following line: "Closes #ISSUE-NUMBER" where you replace the ISSUE-NUMBER with the one corresponding to your dataset.
Checkbox
Create the dataloader script seacrowd/sea_datasets/my_dataset/my_dataset.py (please use only lowercase and underscore for dataset naming).
Provide values for the _CITATION, _DATASETNAME, _DESCRIPTION, _HOMEPAGE, _LICENSE, _URLs, _SUPPORTED_TASKS, _SOURCE_VERSION, and _SEACROWD_VERSION variables.
Implement _info(), _split_generators() and _generate_examples() in the dataloader script.
Make sure that the BUILDER_CONFIGS class attribute is a list with at least one SEACrowdConfig for the source schema and one for a seacrowd schema.
Confirm that the dataloader script works with the datasets.load_dataset function.
Confirm that the dataloader script passes the test suite run with python -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py.
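
The per-language subsets requested in the review and the BUILDER_CONFIGS item in the checklist could be combined roughly as follows. This is only a minimal sketch, not the code in this PR: SketchConfig is a hypothetical stand-in for SEACrowdConfig so the example runs without the SEACrowd repository, and the language codes, version strings, and the seacrowd_ssp schema name are assumptions chosen for illustration.

```python
from dataclasses import dataclass


# Hypothetical stand-in for SEACrowdConfig, so the sketch runs without the
# SEACrowd repository. A real dataloader would import SEACrowdConfig instead.
@dataclass
class SketchConfig:
    name: str
    version: str
    description: str
    schema: str
    subset_id: str


_DATASETNAME = "leipzig_corpora"
_SOURCE_VERSION = "1.0.0"           # assumed value
_SEACROWD_VERSION = "1.0.0"         # assumed value
_LANGUAGES = ["ind", "vie"]         # assumed language codes, illustration only
_SEACROWD_SCHEMA = "seacrowd_ssp"   # assumed schema name


def build_configs(languages):
    """Return one source config and one seacrowd config per language."""
    configs = []
    for lang in languages:
        subset_id = f"{_DATASETNAME}_{lang}"
        configs.append(
            SketchConfig(
                name=f"{subset_id}_source",
                version=_SOURCE_VERSION,
                description=f"{_DATASETNAME} source schema for {lang}",
                schema="source",
                subset_id=subset_id,
            )
        )
        configs.append(
            SketchConfig(
                name=f"{subset_id}_{_SEACROWD_SCHEMA}",
                version=_SEACROWD_VERSION,
                description=f"{_DATASETNAME} SEACrowd schema for {lang}",
                schema=_SEACROWD_SCHEMA,
                subset_id=subset_id,
            )
        )
    return configs


# In a real dataloader this list would be a class attribute of the builder.
BUILDER_CONFIGS = build_configs(_LANGUAGES)

for config in BUILDER_CONFIGS:
    print(config.name)
```

Generating the list from the language codes keeps the source and seacrowd configs paired for every subset, which is what the checklist item on BUILDER_CONFIGS asks for.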