d2l-ai / d2l-en

Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
https://D2L.ai

WikiText-2 is not a zip file #2588

Open · CharryLee0426 opened this issue 4 months ago

CharryLee0426 commented 4 months ago

When I executed the following code:

from d2l import torch as d2l

batch_size, max_len = 512, 64
train_iter, vocab = d2l.load_data_wiki(batch_size, max_len)

or, equivalently, with the MXNet backend:

from d2l import mxnet as d2l

batch_size, max_len = 512, 64
train_iter, vocab = d2l.load_data_wiki(batch_size, max_len)

I got the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/charry/miniconda3/envs/d2l/lib/python3.9/site-packages/d2l/torch.py", line 2443, in load_data_wiki
    data_dir = d2l.download_extract('wikitext-2', 'wikitext-2')
  File "/home/charry/miniconda3/envs/d2l/lib/python3.9/site-packages/d2l/torch.py", line 3247, in download_extract
    fp = zipfile.ZipFile(fname, 'r')
  File "/home/charry/miniconda3/envs/d2l/lib/python3.9/zipfile.py", line 1266, in __init__
    self._RealGetContents()
  File "/home/charry/miniconda3/envs/d2l/lib/python3.9/zipfile.py", line 1333, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

I think this is because the dataset file on the server has been corrupted or is no longer accessible. I reproduced this error with d2l 1.0.0 through 1.0.3, and it breaks anything that needs the WikiText-2 dataset.
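
To confirm what was actually downloaded, here is a quick diagnostic sketch; the path is an assumption based on d2l's default download folder '../data' and the archive name taken from the registered URL:

# Peek at the cached download to see what the server actually returned.
# The path assumes d2l's default download folder '../data'; adjust if needed.
with open('../data/wikitext-2-v1.zip', 'rb') as f:
    head = f.read(200)
# A valid zip archive starts with the magic bytes b'PK\x03\x04'; here the
# file contains an XML error body from the server instead.
print(head)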

One of my pull requests failed its checks because of this error, and I noticed that several other pull requests fixing typos failed their checks for the same reason.

I hope this error can be fixed as soon as possible.

CharryLee0426 commented 4 months ago

The wikitext-2 dataset URL returns this error:

<Error>
<Code>AccessDenied</Code>
<Message>Access Denied</Message>
<RequestId>MM9XHEKPABYT4NPW</RequestId>
<HostId>KOjOK6r2VNkvN6gS28B7s2akq8hULUJohhsiCnyrL9RMzjk3RAIvYnVZiHGd6PPVEIDnQHTijnI=</HostId>
</Error>
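
A quick way to verify this programmatically (a sketch; it reads the URL from d2l's DATA_HUB registry, so it probes whatever the installed version points at):

import requests
from d2l import torch as d2l

# Probe the download URL registered in d2l's DATA_HUB.
url = d2l.DATA_HUB['wikitext-2'][0]
r = requests.get(url)
print(r.status_code)  # 403 for the S3 AccessDenied response shown above
print(r.text[:200])   # the XML <Error> body instead of zip bytes
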
donny-nyc commented 2 months ago

Having the same issue. Is there an updated URL we can use?

MassEast commented 2 weeks ago

Same issue here. According to the book, the dataset is from

Merity, S., Xiong, C., Bradbury, J., & Socher, R. (2016). Pointer sentinel mixture models. ArXiv:1609.07843.

That paper links to http://metamind.io/research/the-wikitext-long-term-dependency-language-modeling-dataset/, which can no longer be reached. Likewise, https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-v1.zip is no longer available. Does anyone have a good mirror for this?
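
In the meantime, a minimal workaround sketch: once a mirror (or a local copy) of wikitext-2-v1.zip is found, the URL d2l uses can be overridden through its DATA_HUB registry before calling load_data_wiki. The mirror URL below is a placeholder, not a known-good link:

from d2l import torch as d2l

# Keep the SHA-1 hash that the installed d2l package registers for this
# archive and swap in a reachable mirror URL (placeholder, not a real mirror).
_, sha1_hash = d2l.DATA_HUB['wikitext-2']
d2l.DATA_HUB['wikitext-2'] = (
    'https://example.com/mirrors/wikitext-2-v1.zip', sha1_hash)

batch_size, max_len = 512, 64
train_iter, vocab = d2l.load_data_wiki(batch_size, max_len)

Alternatively, placing a wikitext-2-v1.zip whose SHA-1 matches the registered hash directly into d2l's download folder ('../data' by default) should make the download hit the cache and skip the network entirely.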