srulikbd closed this issue 2 years ago.
Works fine on WSL/Windows 10. It seems this error is related to datasets, not promptsource, because other people are facing similar issues: https://github.com/huggingface/datasets/issues/3269
I'm trying to view sent_comp for the current sprint, but I get the following error:
NonMatchingChecksumError: Checksums didn't match for dataset source files: ['https://github.com/google-research-datasets/sentence-compression/raw/master/data/sent-comp.train03.json.gz']
Traceback:
File "/home/srulikbd/.local/lib/python3.7/site-packages/streamlit/script_runner.py", line 338, in _run_script
    exec(code, module.__dict__)
File "/home/srulikbd/promptsource/promptsource/app.py", line 259, in <module>
    dataset = get_dataset(dataset_key, str(conf_option.name) if conf_option else None)
File "/home/srulikbd/.local/lib/python3.7/site-packages/streamlit/caching.py", line 573, in wrapped_func
    return get_or_create_cached_value()
File "/home/srulikbd/.local/lib/python3.7/site-packages/streamlit/caching.py", line 557, in get_or_create_cached_value
    return_value = func(*args, **kwargs)
File "/home/srulikbd/promptsource/promptsource/utils.py", line 49, in get_dataset
    builder_instance.download_and_prepare()
File "/home/srulikbd/.local/lib/python3.7/site-packages/datasets/builder.py", line 608, in download_and_prepare
    dl_manager=dl_manager, verify_infos=verify_infos, **download_and_prepare_kwargs
File "/home/srulikbd/.local/lib/python3.7/site-packages/datasets/builder.py", line 680, in _download_and_prepare
    self.info.download_checksums, dl_manager.get_recorded_sizes_checksums(), "dataset source files"
File "/home/srulikbd/.local/lib/python3.7/site-packages/datasets/utils/info_utils.py", line 40, in verify_checksums
    raise NonMatchingChecksumError(error_msg + str(bad_urls))
I'm running the latest promptsource main branch from source, on WSL 2, Windows 11, Python 3.7. I can view other datasets without any problems.
I suspect something went wrong during the download: the size of the downloaded file does not match its expected value. Could you try removing the cache and re-downloading?
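To illustrate the kind of check that is failing: datasets records an expected size and checksum for each source file and compares them with what was actually downloaded, reporting the URLs that differ. The sketch below is a hypothetical, stdlib-only approximation of that comparison, not the library's own code. The helper names size_and_sha256 and find_mismatches are illustrative, and it assumes SHA-256 checksums.

```python
import hashlib
from typing import Dict, List


def size_and_sha256(path: str, chunk_size: int = 1 << 20) -> Dict[str, object]:
    """Return the byte size and SHA-256 hex digest of a local file (hypothetical helper)."""
    digest = hashlib.sha256()
    num_bytes = 0
    with open(path, "rb") as f:
        # Read in chunks so large archives do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            num_bytes += len(chunk)
    return {"num_bytes": num_bytes, "checksum": digest.hexdigest()}


def find_mismatches(
    expected: Dict[str, Dict[str, object]],
    recorded: Dict[str, Dict[str, object]],
) -> List[str]:
    """Return the URLs whose recorded size/checksum differ from the expected values."""
    # Only URLs present in both records are compared in this simplified sketch.
    return [url for url in expected if url in recorded and expected[url] != recorded[url]]
```

A truncated or corrupted file changes both the size and the digest, so it shows up as a mismatch. That is why deleting the cached copy and fetching it again is the usual first step.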
I tried deleting the cache and downloading again, but the same error appears.
I just tried again and couldn't reproduce it. Could you give a little more detail about your setup?
Could you try load_dataset("sent_comp", download_mode="force_redownload")?
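If this happens often, the suggestion above could be wrapped so that a failed load is retried once with download_mode="force_redownload". The sketch below is a hypothetical helper, not part of datasets or promptsource. It takes the loader as a parameter so it runs without datasets installed, and it assumes the loader accepts a dataset name plus a download_mode keyword, as datasets.load_dataset does.

```python
from typing import Any, Callable, Tuple, Type


def load_with_redownload_fallback(
    load_fn: Callable[..., Any],
    name: str,
    retry_on: Tuple[Type[BaseException], ...],
    **kwargs: Any,
) -> Any:
    """Load `name` with `load_fn`, retrying once with a forced re-download (hypothetical helper)."""
    try:
        # First attempt: let the loader reuse whatever is already in the cache.
        return load_fn(name, **kwargs)
    except retry_on:
        # The cached files failed verification, so ignore the cache and
        # fetch the source files again. Other exception types propagate.
        kwargs["download_mode"] = "force_redownload"
        return load_fn(name, **kwargs)
```

With datasets installed, a call might look like load_with_redownload_fallback(datasets.load_dataset, "sent_comp", retry_on=(NonMatchingChecksumError,)). The module that exports NonMatchingChecksumError has changed between datasets releases (the traceback above raises it from datasets/utils/info_utils.py), so check the import path for your installed version.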
I also tried it and it worked. However, AFAIK, GitHub has had many incidents recently. I encountered two different symptoms with c4, but their root cause seems to be network problems or file corruption (git-lfs)?
OK, with @VictorSanh's suggestion it's working now! Thanks.