Closed muhammadravi251001 closed 2 months ago
After reading https://github.com/SEACrowd/seacrowd-datahub/issues/26, I realized the folder name should be `basaha_corpus` instead of `basahacorpus`, so I renamed the folder to `basaha_corpus` and deleted the `basahacorpus` folder. The relevant commits run from https://github.com/SEACrowd/seacrowd-datahub/pull/606/commits/fa9ac0b4f451e43c5a559682a4c42aa73ef47d62 to https://github.com/SEACrowd/seacrowd-datahub/pull/606/commits/80a31334784bb499925afd4d26c385030d222a18.
Earlier, in https://github.com/SEACrowd/seacrowd-datahub/pull/606/commits/f4914c9cd334ad418e86cc650751bae0038d2f01 and https://github.com/SEACrowd/seacrowd-datahub/pull/606/commits/2c6591d341482a66c66cf8ceac4f6bec21def7f7, I accidentally deleted the correct folder 😅.
@muhammadravi251001 : thank you for the contribution! LGTM!
Thanks for the merge, Sir!
Title: Add Dataloader BasahaCorpus
First line PR Message: Closes https://github.com/SEACrowd/seacrowd-datahub/issues/26
Checkbox
- Create the dataloader script `seacrowd/sea_datasets/{my_dataset}/{my_dataset}.py` (please use only lowercase and underscore for dataset folder naming, as mentioned in dataset issue) and its `__init__.py` within the `{my_dataset}` folder.
- Provide values for the `_CITATION`, `_DATASETNAME`, `_DESCRIPTION`, `_HOMEPAGE`, `_LICENSE`, `_LOCAL`, `_URLs`, `_SUPPORTED_TASKS`, `_SOURCE_VERSION`, and `_SEACROWD_VERSION` variables.
- Implement `_info()`, `_split_generators()` and `_generate_examples()` in the dataloader script.
- Make sure that the `BUILDER_CONFIGS` class attribute is a list with at least one `SEACrowdConfig` for the source schema and one for a seacrowd schema.
- Confirm the dataloader script works with the `datasets.load_dataset` function.
- Confirm that the dataloader script passes the test suite run with `python -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py` or `python -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py --subset_id {subset_name_without_source_or_seacrowd_suffix}`.
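Several of the checklist items above can be checked mechanically before opening a PR. The following is a minimal sketch of such a pre-flight check, not part of the SEACrowd tooling: `is_valid_folder_name`, `missing_items`, and `EXAMPLE_SOURCE` are hypothetical names introduced here for illustration. It uses only the Python standard library (`ast`, `re`) to inspect a dataloader script's source text, and it assumes digits are acceptable in folder names, which the checklist does not state.

```python
import ast
import re

# Names copied from the checklist above.
REQUIRED_VARIABLES = [
    "_CITATION", "_DATASETNAME", "_DESCRIPTION", "_HOMEPAGE", "_LICENSE",
    "_LOCAL", "_URLs", "_SUPPORTED_TASKS", "_SOURCE_VERSION", "_SEACROWD_VERSION",
]
REQUIRED_METHODS = ["_info", "_split_generators", "_generate_examples"]


def is_valid_folder_name(name: str) -> bool:
    """True if the name has only lowercase letters, digits, and underscores.

    Allowing digits is an assumption; the checklist only mentions
    lowercase and underscore.
    """
    return re.fullmatch(r"[a-z0-9_]+", name) is not None


def missing_items(source: str) -> dict:
    """Report which checklist names are absent from a dataloader script."""
    tree = ast.parse(source)
    assigned, methods = set(), set()
    has_builder_configs = False
    for node in tree.body:
        if isinstance(node, ast.Assign):
            # Module-level assignments such as `_CITATION = "..."`.
            for target in node.targets:
                if isinstance(target, ast.Name):
                    assigned.add(target.id)
        elif isinstance(node, ast.ClassDef):
            for item in node.body:
                if isinstance(item, ast.FunctionDef):
                    methods.add(item.name)
                elif isinstance(item, ast.Assign):
                    for target in item.targets:
                        if isinstance(target, ast.Name) and target.id == "BUILDER_CONFIGS":
                            has_builder_configs = True
    return {
        "variables": [v for v in REQUIRED_VARIABLES if v not in assigned],
        "methods": [m for m in REQUIRED_METHODS if m not in methods],
        "builder_configs": has_builder_configs,
    }


# A deliberately incomplete, made-up script used only to exercise the check.
EXAMPLE_SOURCE = '''
_CITATION = ""
_DATASETNAME = "my_dataset"
_LOCAL = False

class MyDataset:
    BUILDER_CONFIGS = []

    def _info(self):
        pass

    def _split_generators(self, dl_manager):
        pass
'''

report = missing_items(EXAMPLE_SOURCE)
print(report)
```

Note that the naming check only covers the character set: both `basaha_corpus` and `basahacorpus` pass it, so matching the name given in the dataset issue still has to be verified by hand. The sketch also does not replace running `python -m tests.test_seacrowd` on the script.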