Closed elyanah-aco closed 7 months ago
Hi @elyanah-aco, would you like to create another PR for the config addition for code-switching? Thanks!
For clarification: to test this dataloader with the SEACrowd test cases, we have to pass the --schema argument in addition to --subset_id.
Tested using these:
python -m tests.test_seacrowd seacrowd/sea_datasets/codeswitch_reddit/codeswitch_reddit.py --subset_id codeswitch_reddit_cs --schema TEXT_MULTI
python -m tests.test_seacrowd seacrowd/sea_datasets/codeswitch_reddit/codeswitch_reddit.py --subset_id codeswitch_reddit_eng_monolingual --schema SSP
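The two invocations above pair each subset with its schema. A minimal Python sketch that assembles the same commands, in case both need to be run in one go, could look like this. The helper name `build_test_command` is hypothetical, and only the flags already shown above are used.

```python
import shlex

DATALOADER = "seacrowd/sea_datasets/codeswitch_reddit/codeswitch_reddit.py"

# (subset_id, schema) pairs copied from the commands above.
SUBSET_SCHEMAS = [
    ("codeswitch_reddit_cs", "TEXT_MULTI"),
    ("codeswitch_reddit_eng_monolingual", "SSP"),
]


def build_test_command(subset_id: str, schema: str) -> list[str]:
    """Return the argv list for one tests.test_seacrowd run (hypothetical helper)."""
    return [
        "python", "-m", "tests.test_seacrowd", DATALOADER,
        "--subset_id", subset_id,
        "--schema", schema,
    ]


if __name__ == "__main__":
    for subset_id, schema in SUBSET_SCHEMAS:
        # Only prints the command; pass the list to subprocess.run(...)
        # from the repository root to actually execute it.
        print(shlex.join(build_test_command(subset_id, schema)))
```

Keeping the pairs in one list makes it harder to run a subset against the wrong schema.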
Closes #356.
Notes:
- CODE_SWITCHING_IDENTIFICATION is a task that uses the seacrowd_text_multi schema and takes language codes as labels.
- The subsets are cs and eng_monolingual. The cs subset uses the seacrowd_text_multi schema and eng_monolingual uses seacrowd_ssp.
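To make the two subsets concrete, here is a rough sketch of what one record in each schema could look like. The field names (`id`, `text`, `labels`) are an assumption about the seacrowd_text_multi and seacrowd_ssp schemas, and the example texts and language codes are invented for illustration, not taken from the dataset.

```python
# Illustrative records only: field names are assumed, values are made up.
cs_example = {
    "id": "0",
    "text": "Kumain ka na ba? I just got home.",
    "labels": ["tgl", "eng"],  # language codes as labels (assumed format)
}

eng_monolingual_example = {
    "id": "0",
    "text": "I just got home.",
}


def check_record(record: dict, expects_labels: bool) -> bool:
    """Return True if the record matches the assumed shape for its schema."""
    if not isinstance(record.get("id"), str):
        return False
    if not isinstance(record.get("text"), str):
        return False
    if expects_labels:
        labels = record.get("labels")
        return isinstance(labels, list) and all(isinstance(x, str) for x in labels)
    # Self-supervised pretraining records are assumed to carry no labels.
    return "labels" not in record
```

The cs record carries a list of labels because the multi-label schema allows more than one language per text, while the eng_monolingual record is plain text.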
Checkbox
- Create the dataloader script seacrowd/sea_datasets/my_dataset/my_dataset.py (please use only lowercase and underscore for dataset naming).
- Provide values for the _CITATION, _DATASETNAME, _DESCRIPTION, _HOMEPAGE, _LICENSE, _URLs, _SUPPORTED_TASKS, _SOURCE_VERSION, and _SEACROWD_VERSION variables.
- Implement _info(), _split_generators() and _generate_examples() in the dataloader script.
- Make sure that the BUILDER_CONFIGS class attribute is a list with at least one SEACrowdConfig for the source schema and one for a seacrowd schema.
- Confirm that the dataloader script works with the datasets.load_dataset function.
- Confirm that the dataloader script passes the test suite run with python -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py.
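The BUILDER_CONFIGS item in the checklist above can be illustrated with a small standalone sketch. `SEACrowdConfigSketch` is a hypothetical stand-in for SEACrowdConfig, not the real class, and the config names and the "source" schema string are assumptions about the naming convention rather than values confirmed by this PR.

```python
from dataclasses import dataclass


@dataclass
class SEACrowdConfigSketch:
    """Hypothetical stand-in for SEACrowdConfig; the field set is assumed."""
    name: str
    schema: str
    subset_id: str


# Assumed naming: one source config and one seacrowd config per subset.
BUILDER_CONFIGS = [
    SEACrowdConfigSketch("codeswitch_reddit_cs_source", "source", "codeswitch_reddit_cs"),
    SEACrowdConfigSketch(
        "codeswitch_reddit_cs_seacrowd_text_multi",
        "seacrowd_text_multi",
        "codeswitch_reddit_cs",
    ),
    SEACrowdConfigSketch(
        "codeswitch_reddit_eng_monolingual_source",
        "source",
        "codeswitch_reddit_eng_monolingual",
    ),
    SEACrowdConfigSketch(
        "codeswitch_reddit_eng_monolingual_seacrowd_ssp",
        "seacrowd_ssp",
        "codeswitch_reddit_eng_monolingual",
    ),
]


def has_required_configs(configs) -> bool:
    """Check the checklist rule: a list with a source and a seacrowd config."""
    if not isinstance(configs, list):
        return False
    schemas = [config.schema for config in configs]
    has_source = "source" in schemas
    has_seacrowd = any(schema.startswith("seacrowd_") for schema in schemas)
    return has_source and has_seacrowd
```

A check like this only mirrors the checklist wording; the test suite command listed above remains the actual validation.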