MD5 Checksum failure

Hi, so after resolving the encoding issue, I'm still getting a few errors with the most recent code on the following datasets:

```
Please try the following tasks later by running individual files: ['multi_news.py', 'reddit_tifu.py', 'search_qa.py', 'amazon_polarity.py', 'spider.py', 'jeopardy.py', 'gigaword.py', 'wiki_auto.py', 'wiki_bio.py', 'yahoo_answers_topics.py', 'yelp_review_full.py', 'dbpedia_14.py', 'definite_pronoun_resolution.py', 'kilt_wow.py']
```
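Re-running the listed files one at a time can be scripted. This is a minimal sketch, assuming each file is invoked as `python <file>` from the tasks directory (as in the `multi_news.py` run in this issue); `run_script` and `rerun_failed` are hypothetical helpers, not part of CrossFit.

```python
import subprocess
import sys
from typing import Callable, List, Sequence


def run_script(script: str) -> int:
    """Run one task script with the current interpreter and return its exit code."""
    return subprocess.run([sys.executable, script]).returncode


def rerun_failed(
    scripts: Sequence[str], runner: Callable[[str], int] = run_script
) -> List[str]:
    """Run each script on its own and return the ones that still exit non-zero."""
    still_failing = []
    for script in scripts:
        if runner(script) != 0:
            still_failing.append(script)
    return still_failing
```

For example, `rerun_failed(['multi_news.py', 'reddit_tifu.py'])` runs both scripts sequentially and returns whichever still fail. The `runner` parameter exists only so the loop can be exercised without network access.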

When I try to run a few of them, they output the following:

```
(crossfit) > python multi_news.py
Using custom data configuration default
Downloading and preparing dataset multi_news/default (download: 245.06 MiB, generated: 667.74 MiB, post-processed: Unknown size, total: 912.80 MiB) to /home/ABCD/.cache/huggingface/datasets/multi_news/default/1.0.0/465b14e19b4d6a55c9bb9131ca1de642175872143c9b231bee1dce789311b449...
Traceback (most recent call last):
  File "multi_news.py", line 32, in <module>
    main()
  File "multi_news.py", line 29, in main
    train, dev, test = dataset.generate_k_shot_data(k=32, seed=seed, path="../data/")
  File "/scratch/ABCD/CrossFit/tasks/fewshot_gym_dataset.py", line 79, in generate_k_shot_data
    dataset = self.load_dataset()
  File "multi_news.py", line 23, in load_dataset
    return datasets.load_dataset('multi_news')
  File "/ext3/miniconda3/envs/crossfit/lib/python3.6/site-packages/datasets/load.py", line 746, in load_dataset
    use_auth_token=use_auth_token,
  File "/ext3/miniconda3/envs/crossfit/lib/python3.6/site-packages/datasets/builder.py", line 579, in download_and_prepare
    dl_manager=dl_manager, verify_infos=verify_infos, **download_and_prepare_kwargs
  File "/ext3/miniconda3/envs/crossfit/lib/python3.6/site-packages/datasets/builder.py", line 639, in _download_and_prepare
    self.info.download_checksums, dl_manager.get_recorded_sizes_checksums(), "dataset source files"
  File "/ext3/miniconda3/envs/crossfit/lib/python3.6/site-packages/datasets/utils/info_utils.py", line 39, in verify_checksums
    raise NonMatchingChecksumError(error_msg + str(bad_urls))
datasets.utils.info_utils.NonMatchingChecksumError: Checksums didn't match for dataset source files:
['https://drive.google.com/uc?export=download&id=1vRY2wM6rlOZrf9exGTm5pXj5ExlVwJ0C']
```
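The failing check is `verify_checksums` in `datasets/utils/info_utils.py`, which compares the checksums recorded for the downloaded files against the expected ones and raises `NonMatchingChecksumError` listing the bad URLs. The sketch below is not the library's code. It is a minimal illustration, assuming MD5 (per this issue's title; the library may use a different hash) and plain dicts mapping URL to hex digest, of why any change in the downloaded bytes trips this check. `file_checksum`, `verify_recorded_checksums`, and `ChecksumMismatch` are hypothetical names.

```python
import hashlib


class ChecksumMismatch(Exception):
    """Hypothetical stand-in for datasets' NonMatchingChecksumError."""


def file_checksum(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large archives need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_recorded_checksums(expected, recorded):
    """Raise ChecksumMismatch if any URL's recorded digest differs from the expected one.

    Both arguments map URL -> hex digest (an assumed, simplified layout).
    A URL missing from `recorded` also counts as a mismatch.
    """
    bad_urls = [url for url, want in expected.items() if recorded.get(url) != want]
    if bad_urls:
        raise ChecksumMismatch(
            "Checksums didn't match for dataset source files:\n" + str(bad_urls)
        )
```

If the Drive URL returns an error page instead of the archive, the digest of what was saved is unrelated to the expected one, so the URL is reported exactly as in the traceback.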

Running a curl on the URL yields:

```html
<html lang=en><meta charset=utf-8>...<title>Error 400 (Bad Request)!!1</title>...
<main id="af-error-container" role="main">...<p><b>400.</b> <ins>That’s an error.</ins><p>The server cannot process the request because it is malformed. It should not be retried. <ins>That’s all we know.</ins></main>
```

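The README says that Google Drive enforces a daily download quota, but this response is a 400 Bad Request ("The server cannot process the request because it is malformed") rather than a quota message, so it looks like something else may be going on.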
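One way to confirm what was actually fetched is to inspect the payload: an HTML page where an archive was expected explains the checksum mismatch. A minimal sketch, assuming the error page has the same `<title>Error 400 (Bad Request)...` shape as the curl output in this issue; other Drive responses may be formatted differently, and `looks_like_html` / `google_error_status` are hypothetical helpers.

```python
import re
from typing import Optional, Tuple

# Matches titles like "<title>Error 400 (Bad Request)!!1</title>" (shape assumed from the curl output).
_ERROR_TITLE_RE = re.compile(rb"<title>\s*Error\s+(\d{3})\s*\(([^)]*)\)", re.IGNORECASE)


def looks_like_html(data: bytes) -> bool:
    """True if the payload starts like an HTML document rather than a binary archive."""
    head = data.lstrip()[:64].lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


def google_error_status(data: bytes) -> Optional[Tuple[int, str]]:
    """Return (status_code, reason) from a Google-style error page title, or None."""
    match = _ERROR_TITLE_RE.search(data)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).decode("utf-8", "replace")
```

Reading the bytes the library saved for the Drive URL and passing them to these helpers would distinguish a 400 error page from a genuine (but corrupted) archive.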

INK-USC / CrossFit, issue #7