INK-USC / CrossFit

Code for paper "CrossFit :weight_lifting:: A Few-shot Learning Challenge for Cross-task Generalization in NLP" (https://arxiv.org/abs/2104.08835)

URL invalid for some datasets: #2

Closed shmsw25 closed 3 years ago

shmsw25 commented 3 years ago

Hi, thank you for such a great paper & resources! I just wanted to report that downloading some of the datasets using the scripts in tasks/ does not work, presumably because the dataset URLs were invalidated by the original hosts. In particular, here is the list of datasets that gave errors due to invalid URLs.

cherry979988 commented 3 years ago

Thank you for raising this! Bill (@yuchenlin) and I will try to find some workaround.

cherry979988 commented 3 years ago

Hi Sewon,

I'm trying to reproduce this issue but my scripts are working as expected. Could you please provide some extra information for us? Thank you.

  1. What are the error messages you're getting?
  2. Could you double-check that your Hugging Face datasets library is at version 1.4.0, and then try the scripts again after clearing the cache?

Attaching my logs for reference.

[Screenshot of logs: Screen Shot 2021-08-23 at 11 51 56 AM]

shmsw25 commented 3 years ago

Hi @cherry979988, thank you for your help. Yes, I double-checked that the HF datasets version is 1.4.0, and the error keeps occurring after clearing the cache. Error messages are saved here.

P.S. I think that once you have downloaded the data, it is kept in the cache. Perhaps that is why you were not able to reproduce the error?

cherry979988 commented 3 years ago

Hi @shmsw25

Thank you for providing the logs. I am able to reproduce the errors.

My guess is that the dataset owners updated their files and the checksums in HF datasets have not been updated yet, so we're getting this checksum error.
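To illustrate the kind of check that would fail in this situation, here is a minimal sketch. It assumes the library compares a SHA-256 digest recorded when the dataset script was written against the digest of the freshly downloaded file. The helper names `sha256_of` and `verify` and the sample bytes are hypothetical and are not part of the HF datasets API.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def verify(recorded: str, downloaded: bytes) -> bool:
    """Compare a previously recorded checksum with the downloaded content."""
    return sha256_of(downloaded) == recorded


# Hypothetical file content at the time the checksum was recorded.
original = b"id\ttext\n1\thello\n"
recorded = sha256_of(original)

# The host later updates the file; the recorded checksum is now stale.
updated = original + b"2\tworld\n"

print(verify(recorded, original))  # unchanged file passes
print(verify(recorded, updated))   # updated file fails the check
```

Under this assumption, any change to the hosted file makes the check fail until the recorded checksum is refreshed, even when the new file is perfectly valid.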

A temporary solution is to pass ignore_verifications=True when loading datasets (e.g., dataset = load_dataset("kilt_tasks", "wow", ignore_verifications=True)). However, this will probably lead to differences in few-shot sampling. I'll discuss with Bill and see if there is a better solution...
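A rough sketch of how this workaround could be wrapped so that verification is skipped only when it actually fails. The helper `load_with_checksum_fallback` and its `loader` parameter are hypothetical. It assumes the datasets version discussed in this thread (1.4.0), where `load_dataset` accepts `ignore_verifications`, and it assumes the failure surfaces as an exception class named `NonMatchingChecksumError`.

```python
import warnings


def load_with_checksum_fallback(path, name=None, loader=None, **kwargs):
    """Try a verified load first; retry without verification on a checksum error."""
    if loader is None:
        # Imported lazily so the sketch can be exercised without the library.
        from datasets import load_dataset as loader
    try:
        return loader(path, name, **kwargs)
    except Exception as err:
        # Assumption: checksum mismatches raise NonMatchingChecksumError.
        if type(err).__name__ != "NonMatchingChecksumError":
            raise
        warnings.warn(
            f"Checksum mismatch for {path!r}; retrying with "
            "ignore_verifications=True. Few-shot samples may differ."
        )
        return loader(path, name, ignore_verifications=True, **kwargs)
```

With the library installed, `load_with_checksum_fallback("kilt_tasks", "wow")` would mirror the call above. The warning keeps the skipped verification visible, since the downloaded data may no longer match what the original few-shot splits were sampled from.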

shmsw25 commented 3 years ago

Got it, thank you for taking a look at this!

slyviacassell commented 3 years ago

@cherry979988 Would you mind sharing your cached copies of the following datasets? I cannot download them over my network.