ExpressAI / DataLab

The unified platform for data-related resources.
https://expressai.github.io/DataLab/
Apache License 2.0

dureader_search dataset is broken #401

Open neubig opened 1 year ago

neubig commented 1 year ago
>>> datalabs.load_dataset("dureader_search", "question_answering_reading_comprehension")
Couldn't find a directory or a dataset named 'dureader_search' in this version. It was picked from the master branch on github instead.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/gneubig/opt/anaconda3/envs/explainaboard_web/lib/python3.10/site-packages/datalabs/load.py", line 2144, in load_dataset
    builder_instance.download_and_prepare(
  File "/Users/gneubig/opt/anaconda3/envs/explainaboard_web/lib/python3.10/site-packages/datalabs/builder.py", line 747, in download_and_prepare
    self._download_and_prepare(
  File "/Users/gneubig/opt/anaconda3/envs/explainaboard_web/lib/python3.10/site-packages/datalabs/builder.py", line 844, in _download_and_prepare
    split_generators = self._split_generators(dl_manager, **split_generators_kwargs)
  File "/Users/gneubig/.cache/expressai/modules/datasets_modules/datalab/dureader_search/312f763b2731ab2a5f73289845a16ca51676422791f3a867a0ff3853d2dce40c/dureader_search.py", line 136, in _split_generators
    train_path = dl_manager.download_and_extract(_TRAIN_DOWNLOAD_URL)
  File "/Users/gneubig/opt/anaconda3/envs/explainaboard_web/lib/python3.10/site-packages/datalabs/utils/download_manager.py", line 322, in download_and_extract
    return self.extract(self.download(url_or_urls))
  File "/Users/gneubig/opt/anaconda3/envs/explainaboard_web/lib/python3.10/site-packages/datalabs/utils/download_manager.py", line 221, in download
    downloaded_path_or_paths = map_nested(
  File "/Users/gneubig/opt/anaconda3/envs/explainaboard_web/lib/python3.10/site-packages/datalabs/utils/py_utils.py", line 297, in map_nested
    return function(data_struct)
  File "/Users/gneubig/opt/anaconda3/envs/explainaboard_web/lib/python3.10/site-packages/datalabs/utils/download_manager.py", line 248, in _download
    return cached_path(url_or_filename, download_config=download_config)
  File "/Users/gneubig/opt/anaconda3/envs/explainaboard_web/lib/python3.10/site-packages/datalabs/utils/file_utils.py", line 344, in cached_path
    output_path = get_from_cache(
  File "/Users/gneubig/opt/anaconda3/envs/explainaboard_web/lib/python3.10/site-packages/datalabs/utils/file_utils.py", line 717, in get_from_cache
    raise FileNotFoundError(f"Couldn't find file at {url}")
FileNotFoundError: Couldn't find file at http://cdatalab1.oss-cn-beijing.aliyuncs.com/question_answering/dureader_search/train_revised.json
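
The traceback ends in `get_from_cache`, which suggests the hosted file itself is missing rather than anything in the local install. One way to check this independently of datalabs is to probe the URL directly. The following is only a minimal sketch using the Python standard library; `url_is_reachable` and its `opener` parameter are hypothetical names introduced here for illustration and are not part of the datalabs API.

```python
import urllib.error
import urllib.request


def url_is_reachable(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if a HEAD request to `url` succeeds, False otherwise.

    `opener` is a hypothetical injection point (defaults to urlopen) so the
    check can be exercised without real network access.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            # Treat any 2xx/3xx status as "the file is there".
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # HTTP errors (e.g. 403/404), DNS failures, and timeouts all land here.
        return False


# Example call (performs a real network request, so it is left commented out):
# url_is_reachable(
#     "http://cdatalab1.oss-cn-beijing.aliyuncs.com/"
#     "question_answering/dureader_search/train_revised.json"
# )
```

If this returns `False` for the URL in the error message, the problem is most likely on the hosting side, and the dataset script's `_TRAIN_DOWNLOAD_URL` would need to point at a file that still exists.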