I tried to run scripts/summarization.py, but it failed to load the data. The error is below. It looks like the checksums of the downloaded files do not match the expected values.
Traceback (most recent call last):
File "scripts/summarization.py", line 354, in <module>
main(args)
File "scripts/summarization.py", line 306, in main
model.hf_datasets = nlp.load_dataset('scientific_papers', 'arxiv')
File "/opt/conda/envs/longformer/lib/python3.7/site-packages/nlp/load.py", line 549, in load_dataset
download_config=download_config, download_mode=download_mode, ignore_verifications=ignore_verifications,
File "/opt/conda/envs/longformer/lib/python3.7/site-packages/nlp/builder.py", line 463, in download_and_prepare
dl_manager=dl_manager, verify_infos=verify_infos, **download_and_prepare_kwargs
File "/opt/conda/envs/longformer/lib/python3.7/site-packages/nlp/builder.py", line 522, in _download_and_prepare
self.info.download_checksums, dl_manager.get_recorded_sizes_checksums(), "dataset source files"
File "/opt/conda/envs/longformer/lib/python3.7/site-packages/nlp/utils/info_utils.py", line 38, in verify_checksums
raise NonMatchingChecksumError(error_msg + str(bad_urls))
nlp.utils.info_utils.NonMatchingChecksumError: Checksums didn't match for dataset source files:
['https://drive.google.com/uc?id=1b3rmCSIoh6VhD4HKWjI4HOW-cSwcwbeC&export=download', 'https://drive.google.com/uc?id=1lvsqvsFi3W-pE1SqNZI0s8NR9rC1tsja&export=download']
I then tried to skip the verification step by passing ignore_verifications=True, and got a different error:
Traceback (most recent call last):
File "/opt/conda/envs/longformer/lib/python3.7/site-packages/nlp/builder.py", line 537, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "/opt/conda/envs/longformer/lib/python3.7/site-packages/nlp/builder.py", line 810, in _prepare_split
for key, record in utils.tqdm(generator, unit=" examples", total=split_info.num_examples, leave=False):
File "/opt/conda/envs/longformer/lib/python3.7/site-packages/tqdm/std.py", line 1195, in __iter__
for obj in iterable:
File "/opt/conda/envs/longformer/lib/python3.7/site-packages/nlp/datasets/scientific_papers/9e4f2cfe3d8494e9f34a84ce49c3214605b4b52a3d8eb199104430d04c52cc12/scientific_papers.py", line 108, in _generate_examples
with open(path, encoding="utf-8") as f:
NotADirectoryError: [Errno 20] Not a directory: '/home/username/.cache/huggingface/datasets/downloads/c0deae7af7d9c87f25dfadf621f7126f708d7dcac6d353c7564883084a000076/arxiv-dataset/train.txt'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "scripts/summarization.py", line 354, in <module>
main(args)
File "scripts/summarization.py", line 306, in main
model.hf_datasets = nlp.load_dataset('scientific_papers', 'arxiv', ignore_verifications=True)
File "/opt/conda/envs/longformer/lib/python3.7/site-packages/nlp/load.py", line 549, in load_dataset
download_config=download_config, download_mode=download_mode, ignore_verifications=ignore_verifications,
File "/opt/conda/envs/longformer/lib/python3.7/site-packages/nlp/builder.py", line 463, in download_and_prepare
dl_manager=dl_manager, verify_infos=verify_infos, **download_and_prepare_kwargs
File "/opt/conda/envs/longformer/lib/python3.7/site-packages/nlp/builder.py", line 539, in _download_and_prepare
raise OSError("Cannot find data file. " + (self.manual_download_instructions or ""))
OSError: Cannot find data file.
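The NotADirectoryError suggests that some component of the cached path (possibly the hashed entry under downloads/) is a regular file rather than an extracted directory. A quick way to check what was actually downloaded is sketched below. The helper name inspect_download is hypothetical, and the idea that Google Drive may have returned an HTML page instead of the archive is only a guess, not something the traceback confirms.

```python
import zipfile
from pathlib import Path


def inspect_download(path):
    """Classify a cached download as 'missing', 'directory', 'zip', 'html', or 'other'.

    Hypothetical diagnostic helper; it is not part of the nlp library.
    """
    p = Path(path)
    if not p.exists():
        return "missing"
    if p.is_dir():
        # An extracted archive would normally show up as a directory.
        return "directory"
    if zipfile.is_zipfile(p):
        # The archive was downloaded but apparently not extracted.
        return "zip"
    # Peek at the first bytes to see whether an HTML page was saved instead
    # (assumption: a quota or confirmation page from the file host).
    with open(p, "rb") as f:
        head = f.read(512).lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "other"


if __name__ == "__main__":
    # Path taken from the NotADirectoryError above, without the trailing
    # arxiv-dataset/train.txt components.
    cached = (
        "/home/username/.cache/huggingface/datasets/downloads/"
        "c0deae7af7d9c87f25dfadf621f7126f708d7dcac6d353c7564883084a000076"
    )
    print(inspect_download(cached))
```

If this reports html or other, the checksum mismatch and the second error would probably share one cause: the file saved from the download URL is not the dataset archive.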