Closed TysonYu closed 7 months ago
Hi @TysonYu, a suggestion: change the initial PR message to Closes #{ISSUE_NUMBER} so that upcoming PRs are linked to their dataloader issue (I've already done it on this one, though).
Okay, will do it for later ones.
Rather than having to write in `_split_generators` and re-read again in `_generate_examples`, why don't we pass the `all_data` list through the `gen_kwargs` in `_split_generators` and use it directly in `_generate_examples`? I think passing it this way is possible (see this SEACrowd Implementation).
Hey, I did it this way because it seems logically correct and clear. I agree that the approach you mentioned is another valid implementation, but my current approach should still be fine. I think some other dataloaders also do it this way, such as indosum.
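For reference, the `gen_kwargs` hand-off discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `SplitGenerator` here is a small stand-in that mirrors the `name` and `gen_kwargs` fields of `datasets.SplitGenerator`, and `InMemoryLoader`, `_load_all`, and the sample rows are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Tuple


@dataclass
class SplitGenerator:
    """Stand-in mirroring the name/gen_kwargs fields of datasets.SplitGenerator."""
    name: str
    gen_kwargs: Dict[str, Any] = field(default_factory=dict)


class InMemoryLoader:
    """Hypothetical loader: parse once, pass the rows along through gen_kwargs."""

    def _load_all(self) -> List[Dict[str, str]]:
        # Placeholder for the real download-and-parse step (assumed data).
        return [
            {"text": "first document", "summary": "first"},
            {"text": "second document", "summary": "second"},
            {"text": "third document", "summary": "third"},
        ]

    def _split_generators(self) -> List[SplitGenerator]:
        all_data = self._load_all()
        # Illustrative split only: the last row is held out as the test split.
        return [
            SplitGenerator(name="train", gen_kwargs={"data": all_data[:-1]}),
            SplitGenerator(name="test", gen_kwargs={"data": all_data[-1:]}),
        ]

    def _generate_examples(self, data) -> Iterator[Tuple[int, Dict[str, str]]]:
        # Nothing is written to or re-read from disk; rows arrive as an argument.
        for idx, row in enumerate(data):
            yield idx, row


loader = InMemoryLoader()
examples = {
    split.name: list(loader._generate_examples(**split.gen_kwargs))
    for split in loader._split_generators()
}
```

The write-then-re-read approach defended above differs only in that `gen_kwargs` would carry a file path instead of the rows themselves, so either variant keeps the same method signatures.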
Closes #357.
Checkbox
- Create the dataloader script `seacrowd/sea_datasets/my_dataset/my_dataset.py` (please use only lowercase and underscore for dataset naming).
- Provide values for the `_CITATION`, `_DATASETNAME`, `_DESCRIPTION`, `_HOMEPAGE`, `_LICENSE`, `_URLs`, `_SUPPORTED_TASKS`, `_SOURCE_VERSION`, and `_SEACROWD_VERSION` variables.
- Implement `_info()`, `_split_generators()` and `_generate_examples()` in dataloader script.
- Make sure that the `BUILDER_CONFIGS` class attribute is a list with at least one `SEACrowdConfig` for the source schema and one for a seacrowd schema.
- Confirm dataloader script works with `datasets.load_dataset` function.
- Confirm that the dataloader script passes the test suite run with `python -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py`.
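As a quick pre-check before running the test suite, the variable and method items in the checklist above could be verified with a small script along these lines. This is a hypothetical helper, not part of SEACrowd: `missing_items` and the sample source are assumptions made for illustration, and the required names are copied from the checklist as written.

```python
import ast

# Names copied from the checklist; the helper itself is hypothetical.
REQUIRED_VARIABLES = {
    "_CITATION", "_DATASETNAME", "_DESCRIPTION", "_HOMEPAGE", "_LICENSE",
    "_URLs", "_SUPPORTED_TASKS", "_SOURCE_VERSION", "_SEACROWD_VERSION",
}
REQUIRED_METHODS = {"_info", "_split_generators", "_generate_examples"}


def missing_items(source: str) -> set:
    """Return the checklist names that the dataloader source does not define."""
    tree = ast.parse(source)
    assigned, methods = set(), set()
    for node in tree.body:
        if isinstance(node, ast.Assign):
            # Module-level assignments such as _CITATION = "..."
            assigned.update(t.id for t in node.targets if isinstance(t, ast.Name))
        elif isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            assigned.add(node.target.id)
        elif isinstance(node, ast.ClassDef):
            # Methods defined directly on any top-level class
            methods.update(
                item.name for item in node.body if isinstance(item, ast.FunctionDef)
            )
    return (REQUIRED_VARIABLES - assigned) | (REQUIRED_METHODS - methods)


# Deliberately incomplete sample source, to show what gets reported.
sample = '''
_CITATION = ""
_DATASETNAME = "my_dataset"

class MyDataset:
    def _info(self): ...
    def _split_generators(self, dl_manager): ...
'''
missing = missing_items(sample)
```

A check like this only inspects names statically; it does not replace the `tests.test_seacrowd` run or loading the script with `datasets.load_dataset`.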