Closed akhdanfadh closed 1 month ago
Hi @patrickamadeus, I would like to let you know that we plan to finalize the calculation of the open contributions (e.g., dataloader implementations) in 31 hours, so it'd be great if we could wrap up the reviewing and merge this PR before then.
cc: @akhdanfadh
cc: @sabilmakbar
Closes #590
I implemented one config per subset. Thus, configs will look like this:
thai_elderly_speech_healthcare_source
,thai_elderly_speech_smarthome_seacrowd_sptext
, etc. When testing, passthai_elderly_speech_<subset>
to the--subset_id
parameter.Checkbox
seacrowd/sea_datasets/my_dataset/my_dataset.py
(please use only lowercase and underscore for dataset naming)._CITATION
,_DATASETNAME
,_DESCRIPTION
,_HOMEPAGE
,_LICENSE
,_URLs
,_SUPPORTED_TASKS
,_SOURCE_VERSION
, and_SEACROWD_VERSION
variables._info()
,_split_generators()
and_generate_examples()
in dataloader script.BUILDER_CONFIGS
class attribute is a list with at least oneSEACrowdConfig
for the source schema and one for a seacrowd schema.datasets.load_dataset
function.python -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py
.