Closed phlobo closed 1 month ago
@phlobo I cannot push mantra_gsc to the hub because of the following error:
- "language[0]" with value "en, fr, de, nl, es" is not valid. It must be an ISO 639-1, 639-2 or 639-3 code (two/three letters), or a special value like "code", "multilingual". If you want to use BCP-47 identifiers, you can specify them in language_bcp47.
Could you please open a PR to fix this?
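The error above suggests that all five codes ended up in a single language entry instead of one entry per code. A minimal sketch of that distinction follows, assuming only the rule quoted in the message (two or three letters, or a special value). The function names are hypothetical, the special-value set contains only the two values the message names, and this is not the hub's actual validator.

```python
import re

# Special values named in the error message; the hub may accept others.
SPECIAL_VALUES = {"code", "multilingual"}

# Two or three lowercase letters, as the message describes for ISO 639-1/2/3 codes.
# This checks the shape only, not whether the code is actually assigned.
ISO_639_SHAPE = re.compile(r"[a-z]{2,3}")


def split_languages(raw: str) -> list[str]:
    """Split a comma-separated string such as "en, fr, de" into separate entries."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def invalid_languages(languages: list[str]) -> list[str]:
    """Return the entries that are neither code-shaped nor a listed special value."""
    return [
        lang
        for lang in languages
        if lang not in SPECIAL_VALUES and not ISO_639_SHAPE.fullmatch(lang)
    ]


rejected = ["en, fr, de, nl, es"]      # one entry holding five codes
fixed = split_languages(rejected[0])   # five entries, one code each
```

Under these assumptions, `invalid_languages(rejected)` flags the combined entry, while `invalid_languages(fixed)` returns an empty list.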
@leonweber thank you for taking care of this, please see #923
Closes #891
Mantra GSC was moved from the original website to GitHub: https://github.com/mi-erasmusmc/Mantra-Gold-Standard-Corpus/tree/main
This PR makes the loader point to the new URL and creates an HF Hub version of the existing loader script for mantra_gsc.
If the following information is NOT present in the issue, please populate:
Checkbox
- Create the dataloader script hub/hub_repos/my_dataset/my_dataset.py (please use only lowercase and underscore for dataset naming).
- Provide values for the _CITATION, _DATASETNAME, _DESCRIPTION, _HOMEPAGE, _LICENSE, _URLs, _SUPPORTED_TASKS, _SOURCE_VERSION, and _BIGBIO_VERSION variables.
- Implement _info(), _split_generators() and _generate_examples() in dataloader script.
- Make sure that the BUILDER_CONFIGS class attribute is a list with at least one BigBioConfig for the source schema and one for a bigbio schema.
- Confirm the dataloader script works with the datasets.load_dataset function.
- Confirm that the dataloader script passes the test suite run with python -m tests.test_bigbio_hub <dataset_name> [--data_dir /path/to/local/data] --test_local.
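The BUILDER_CONFIGS requirement in the checklist (at least one config for the source schema and one for a bigbio schema) can be illustrated with a small check. This is only a sketch: `StandInConfig` is a hypothetical stand-in for BigBioConfig with just two fields, the config names are made up for illustration, and the convention that schema strings are "source" or start with "bigbio_" is an assumption rather than something stated in this thread.

```python
from dataclasses import dataclass


@dataclass
class StandInConfig:
    """Hypothetical stand-in for BigBioConfig, reduced to the fields used here."""
    name: str
    schema: str


def missing_schemas(configs):
    """Return which of the two required schema kinds has no config in the list."""
    has_source = any(c.schema == "source" for c in configs)
    # Assumption: bigbio schemas are identified by a "bigbio_" prefix.
    has_bigbio = any(c.schema.startswith("bigbio_") for c in configs)
    missing = []
    if not has_source:
        missing.append("source")
    if not has_bigbio:
        missing.append("bigbio")
    return missing


# Illustrative config names only; the loader's real names may differ.
BUILDER_CONFIGS = [
    StandInConfig(name="mantra_gsc_source", schema="source"),
    StandInConfig(name="mantra_gsc_bigbio_kb", schema="bigbio_kb"),
]
```

With both kinds present, `missing_schemas(BUILDER_CONFIGS)` returns an empty list; dropping either entry makes the function name the missing kind.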