facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

OSError: Invalid data stream #1825

Closed · Edwardlzy closed this issue 4 years ago

Edwardlzy commented 4 years ago

Hello! I am new to fairseq and am trying to run the translation example. I followed this README but encountered an OSError: Invalid data stream when running en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt16.en-de', tokenizer='moses', bpe='subword_nmt').

I've tried editing the _compression.py file and moving data = self._decompressor.decompress(rawblock, size) into a try/except block, but had no luck. Any idea how to fix this?

I'm using Python 3.6/3.7 and PyTorch 1.4 on Ubuntu 18.04, with CUDA 10.1 and cuDNN 7.1. I pip-installed fairseq, sacremoses and subword_nmt; my fairseq version is 0.9.0. Any help is much appreciated!

myleott commented 4 years ago

Can you share the full stack trace? I don't think that _compression.py file is part of fairseq.

Edwardlzy commented 4 years ago

> Can you share the full stack trace? I don't think that _compression.py file is part of fairseq.

Thank you for the quick response! I've attached it below:

>>> import torch
>>> torch.hub.list('pytorch/fairseq')
Using cache found in /usr2/zhiylian/.cache/torch/hub/pytorch_fairseq_master
['bart.large', 'bart.large.cnn', 'bart.large.mnli', 'bpe', 'camembert.v0', 'conv.stories', 'conv.stories.pretrained', 'conv.wmt14.en-de', 'conv.wmt14.en-fr', 'conv.wmt17.en-de', 'data.stories', 'dynamicconv.glu.wmt14.en-fr', 'dynamicconv.glu.wmt16.en-de', 'dynamicconv.glu.wmt17.en-de', 'dynamicconv.glu.wmt17.zh-en', 'dynamicconv.no_glu.iwslt14.de-en', 'dynamicconv.no_glu.wmt16.en-de', 'lightconv.glu.wmt14.en-fr', 'lightconv.glu.wmt16.en-de', 'lightconv.glu.wmt17.en-de', 'lightconv.glu.wmt17.zh-en', 'lightconv.no_glu.iwslt14.de-en', 'lightconv.no_glu.wmt16.en-de', 'roberta.base', 'roberta.large', 'roberta.large.mnli', 'roberta.large.wsc', 'tokenizer', 'transformer.wmt14.en-fr', 'transformer.wmt16.en-de', 'transformer.wmt18.en-de', 'transformer.wmt19.de-en', 'transformer.wmt19.de-en.single_model', 'transformer.wmt19.en-de', 'transformer.wmt19.en-de.single_model', 'transformer.wmt19.en-ru', 'transformer.wmt19.en-ru.single_model', 'transformer.wmt19.ru-en', 'transformer.wmt19.ru-en.single_model', 'transformer_lm.gbw.adaptive_huge', 'transformer_lm.wiki103.adaptive', 'transformer_lm.wmt19.de', 'transformer_lm.wmt19.en', 'transformer_lm.wmt19.ru', 'xlmr.base', 'xlmr.large']
>>> en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt16.en-de', tokenizer='moses', bpe='subword_nmt')
Using cache found in /usr2/zhiylian/.cache/torch/hub/pytorch_fairseq_master
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/local/mnt2/workspace2/zliang/anaconda3/envs/tensorflow/lib/python3.7/site-packages/torch/hub.py", line 366, in load
    model = entry(*args, **kwargs)
  File "/usr2/zhiylian/.cache/torch/hub/pytorch_fairseq_master/fairseq/models/fairseq_model.py", line 218, in from_pretrained
    **kwargs,
  File "/usr2/zhiylian/.cache/torch/hub/pytorch_fairseq_master/fairseq/hub_utils.py", line 52, in from_pretrained
    model_path = file_utils.load_archive_file(model_name_or_path)
  File "/usr2/zhiylian/.cache/torch/hub/pytorch_fairseq_master/fairseq/file_utils.py", line 81, in load_archive_file
    top_dir = os.path.commonprefix(archive.getnames())
  File "/local/mnt2/workspace2/zliang/anaconda3/envs/tensorflow/lib/python3.7/tarfile.py", line 1771, in getnames
    return [tarinfo.name for tarinfo in self.getmembers()]
  File "/local/mnt2/workspace2/zliang/anaconda3/envs/tensorflow/lib/python3.7/tarfile.py", line 1763, in getmembers
    self._load()        # all members, we first have to
  File "/local/mnt2/workspace2/zliang/anaconda3/envs/tensorflow/lib/python3.7/tarfile.py", line 2350, in _load
    tarinfo = self.next()
  File "/local/mnt2/workspace2/zliang/anaconda3/envs/tensorflow/lib/python3.7/tarfile.py", line 2281, in next
    self.fileobj.seek(self.offset - 1)
  File "/local/mnt2/workspace2/zliang/anaconda3/envs/tensorflow/lib/python3.7/bz2.py", line 274, in seek
    return self._buffer.seek(offset, whence)
  File "/local/mnt2/workspace2/zliang/anaconda3/envs/tensorflow/lib/python3.7/_compression.py", line 143, in seek
    data = self.read(min(io.DEFAULT_BUFFER_SIZE, offset))
  File "/local/mnt2/workspace2/zliang/anaconda3/envs/tensorflow/lib/python3.7/_compression.py", line 103, in read
    data = self._decompressor.decompress(rawblock, size)
OSError: Invalid data stream
myleott commented 4 years ago

Hmm, yeah, that error is strange, and I'm not able to reproduce it.

Can you try adding force_reload=True and share the entire log of what you see when you run: en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt16.en-de', tokenizer='moses', bpe='subword_nmt', force_reload=True)

Edwardlzy commented 4 years ago

Seems like the error is the same.

>>> en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt16.en-de', tokenizer='moses', bpe='subword_nmt', force_reload=True)
Downloading: "https://github.com/pytorch/fairseq/archive/master.zip" to /local/mnt2/workspace2/zliang/fairseq/master.zip
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/local/mnt2/workspace2/zliang/anaconda3/envs/fairseq/lib/python3.6/site-packages/torch/hub.py", line 366, in load
    model = entry(*args, **kwargs)
  File "/local/mnt2/workspace2/zliang/fairseq/pytorch_fairseq_master/fairseq/models/fairseq_model.py", line 218, in from_pretrained
    **kwargs,
  File "/local/mnt2/workspace2/zliang/fairseq/pytorch_fairseq_master/fairseq/hub_utils.py", line 52, in from_pretrained
    model_path = file_utils.load_archive_file(model_name_or_path)
  File "/local/mnt2/workspace2/zliang/fairseq/pytorch_fairseq_master/fairseq/file_utils.py", line 81, in load_archive_file
    top_dir = os.path.commonprefix(archive.getnames())
  File "/local/mnt2/workspace2/zliang/anaconda3/envs/fairseq/lib/python3.6/tarfile.py", line 1769, in getnames
    return [tarinfo.name for tarinfo in self.getmembers()]
  File "/local/mnt2/workspace2/zliang/anaconda3/envs/fairseq/lib/python3.6/tarfile.py", line 1761, in getmembers
    self._load()        # all members, we first have to
  File "/local/mnt2/workspace2/zliang/anaconda3/envs/fairseq/lib/python3.6/tarfile.py", line 2358, in _load
    tarinfo = self.next()
  File "/local/mnt2/workspace2/zliang/anaconda3/envs/fairseq/lib/python3.6/tarfile.py", line 2289, in next
    self.fileobj.seek(self.offset - 1)
  File "/local/mnt2/workspace2/zliang/anaconda3/envs/fairseq/lib/python3.6/bz2.py", line 278, in seek
    return self._buffer.seek(offset, whence)
  File "/local/mnt2/workspace2/zliang/anaconda3/envs/fairseq/lib/python3.6/_compression.py", line 143, in seek
    data = self.read(min(io.DEFAULT_BUFFER_SIZE, offset))
  File "/local/mnt2/workspace2/zliang/anaconda3/envs/fairseq/lib/python3.6/_compression.py", line 103, in read
    data = self._decompressor.decompress(rawblock, size)
OSError: Invalid data stream
myleott commented 4 years ago

Sorry, I really don't know what's going on. You can instead download the model file directly: https://github.com/pytorch/fairseq/tree/master/examples/translation#pre-trained-models

Then untar and load with:


from fairseq.models.transformer import TransformerModel
en2de = TransformerModel.from_pretrained(
    '/path/to/model/dir',
    tokenizer='moses',
    bpe='subword_nmt'
)
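
For reference, a minimal end-to-end sketch of that manual route (the archive URL and the extracted directory name below are taken from the translation README entry for transformer.wmt16.en-de, so verify both against the README before relying on them):

import tarfile
import urllib.request

from fairseq.models.transformer import TransformerModel

# Assumed URL from the translation README for transformer.wmt16.en-de;
# double-check it there in case the file has moved.
url = 'https://dl.fbaipublicfiles.com/fairseq/models/wmt16.en-de.joined-dict.transformer.tar.bz2'
archive = 'wmt16.en-de.joined-dict.transformer.tar.bz2'

urllib.request.urlretrieve(url, archive)      # download the .tar.bz2 archive
with tarfile.open(archive, 'r:bz2') as tar:   # untar (bzip2-compressed)
    tar.extractall('.')                       # assumed to extract into wmt16.en-de.joined-dict.transformer/

en2de = TransformerModel.from_pretrained(
    'wmt16.en-de.joined-dict.transformer',    # point at the extracted model directory
    tokenizer='moses',
    bpe='subword_nmt',
)
print(en2de.translate('Hello world!'))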
Edwardlzy commented 4 years ago

This seems to work. Thank you for your help!