Closed by ryanignatius 2 years ago
Thank you for the dataloader, @ryanignatius!
I encountered an error when I tried to use the dataloader with configuration names in the format `talpco_{src_lang}_{tgt_lang}_{schema}`. Could you please modify line 133 to `_, lang_source, lang_target = self.config.name.replace(f"_{self.config.schema}", "").split("_")` to get rid of the error?
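For reference, here is a minimal standalone sketch of what the suggested change to line 133 does. The helper name `parse_langs`, the language codes `eng` and `ind`, and the schema names `source` and `nusantara_t2t` are assumptions for illustration only; in the dataloader the values come from `self.config.name` and `self.config.schema`.

```python
from typing import Tuple


def parse_langs(config_name: str, schema: str) -> Tuple[str, str]:
    """Extract (lang_source, lang_target) from talpco_{src_lang}_{tgt_lang}_{schema}."""
    # Strip the "_{schema}" suffix before splitting: a schema name may itself
    # contain underscores, which would otherwise yield more than three parts.
    _, lang_source, lang_target = config_name.replace(f"_{schema}", "").split("_")
    return lang_source, lang_target


# Hypothetical config names, used only to illustrate the parsing.
print(parse_langs("talpco_eng_ind_source", "source"))
print(parse_langs("talpco_eng_ind_nusantara_t2t", "nusantara_t2t"))
```

Splitting the full name directly on `"_"` would give more than three parts whenever the schema contains an underscore, so the three-way unpacking would raise a `ValueError`.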
Thanks for the feedback! I have updated the code to fix the error as suggested.
Please name your PR after the issue it closes. You can use the following line: "Closes #ISSUE-NUMBER" where you replace the ISSUE-NUMBER with the one corresponding to your dataset.
Checkbox
- Create the dataloader script `nusantara/nusa_datasets/my_dataset/my_dataset.py` (please use only lowercase and underscore for dataset naming).
- Provide values for the `_CITATION`, `_DATASETNAME`, `_DESCRIPTION`, `_HOMEPAGE`, `_LICENSE`, `_URLs`, `_SUPPORTED_TASKS`, `_SOURCE_VERSION`, and `_NUSANTARA_VERSION` variables.
- Implement `_info()`, `_split_generators()` and `_generate_examples()` in dataloader script.
- Make sure the `BUILDER_CONFIGS` class attribute is a list with at least one `NusantaraConfig` for the source schema and one for a nusantara schema.
- Confirm the dataloader script works with the `datasets.load_dataset` function.
- Confirm the dataloader script passes the test suite run with `python -m tests.test_nusantara --path=nusantara/nusa_datasets/my_dataset/my_dataset.py`.