Closed Edwardlzy closed 4 years ago
Can you share the full stack trace? I don't think that _compression.py file is part of fairseq.
Thank you for the quick response! I've attached it below
>>> import torch
>>> torch.hub.list('pytorch/fairseq')
Using cache found in /usr2/zhiylian/.cache/torch/hub/pytorch_fairseq_master
['bart.large', 'bart.large.cnn', 'bart.large.mnli', 'bpe', 'camembert.v0', 'conv.stories', 'conv.stories.pretrained', 'conv.wmt14.en-de', 'conv.wmt14.en-fr', 'conv.wmt17.en-de', 'data.stories', 'dynamicconv.glu.wmt14.en-fr', 'dynamicconv.glu.wmt16.en-de', 'dynamicconv.glu.wmt17.en-de', 'dynamicconv.glu.wmt17.zh-en', 'dynamicconv.no_glu.iwslt14.de-en', 'dynamicconv.no_glu.wmt16.en-de', 'lightconv.glu.wmt14.en-fr', 'lightconv.glu.wmt16.en-de', 'lightconv.glu.wmt17.en-de', 'lightconv.glu.wmt17.zh-en', 'lightconv.no_glu.iwslt14.de-en', 'lightconv.no_glu.wmt16.en-de', 'roberta.base', 'roberta.large', 'roberta.large.mnli', 'roberta.large.wsc', 'tokenizer', 'transformer.wmt14.en-fr', 'transformer.wmt16.en-de', 'transformer.wmt18.en-de', 'transformer.wmt19.de-en', 'transformer.wmt19.de-en.single_model', 'transformer.wmt19.en-de', 'transformer.wmt19.en-de.single_model', 'transformer.wmt19.en-ru', 'transformer.wmt19.en-ru.single_model', 'transformer.wmt19.ru-en', 'transformer.wmt19.ru-en.single_model', 'transformer_lm.gbw.adaptive_huge', 'transformer_lm.wiki103.adaptive', 'transformer_lm.wmt19.de', 'transformer_lm.wmt19.en', 'transformer_lm.wmt19.ru', 'xlmr.base', 'xlmr.large']
>>> en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt16.en-de', tokenizer='moses', bpe='subword_nmt')
Using cache found in /usr2/zhiylian/.cache/torch/hub/pytorch_fairseq_master
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/local/mnt2/workspace2/zliang/anaconda3/envs/tensorflow/lib/python3.7/site-packages/torch/hub.py", line 366, in load
model = entry(*args, **kwargs)
File "/usr2/zhiylian/.cache/torch/hub/pytorch_fairseq_master/fairseq/models/fairseq_model.py", line 218, in from_pretrained
**kwargs,
File "/usr2/zhiylian/.cache/torch/hub/pytorch_fairseq_master/fairseq/hub_utils.py", line 52, in from_pretrained
model_path = file_utils.load_archive_file(model_name_or_path)
File "/usr2/zhiylian/.cache/torch/hub/pytorch_fairseq_master/fairseq/file_utils.py", line 81, in load_archive_file
top_dir = os.path.commonprefix(archive.getnames())
File "/local/mnt2/workspace2/zliang/anaconda3/envs/tensorflow/lib/python3.7/tarfile.py", line 1771, in getnames
return [tarinfo.name for tarinfo in self.getmembers()]
File "/local/mnt2/workspace2/zliang/anaconda3/envs/tensorflow/lib/python3.7/tarfile.py", line 1763, in getmembers
self._load() # all members, we first have to
File "/local/mnt2/workspace2/zliang/anaconda3/envs/tensorflow/lib/python3.7/tarfile.py", line 2350, in _load
tarinfo = self.next()
File "/local/mnt2/workspace2/zliang/anaconda3/envs/tensorflow/lib/python3.7/tarfile.py", line 2281, in next
self.fileobj.seek(self.offset - 1)
File "/local/mnt2/workspace2/zliang/anaconda3/envs/tensorflow/lib/python3.7/bz2.py", line 274, in seek
return self._buffer.seek(offset, whence)
File "/local/mnt2/workspace2/zliang/anaconda3/envs/tensorflow/lib/python3.7/_compression.py", line 143, in seek
data = self.read(min(io.DEFAULT_BUFFER_SIZE, offset))
File "/local/mnt2/workspace2/zliang/anaconda3/envs/tensorflow/lib/python3.7/_compression.py", line 103, in read
data = self._decompressor.decompress(rawblock, size)
OSError: Invalid data stream
Hmm, yeah, that error is strange, and I'm not able to reproduce it.
Can you try adding force_reload=True and share the entire log of what you see when you run: en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt16.en-de', tokenizer='moses', bpe='subword_nmt', force_reload=True)?
Seems like the error is the same.
>>> en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt16.en-de', tokenizer='moses', bpe='subword_nmt', force_reload=True)
Downloading: "https://github.com/pytorch/fairseq/archive/master.zip" to /local/mnt2/workspace2/zliang/fairseq/master.zip
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/local/mnt2/workspace2/zliang/anaconda3/envs/fairseq/lib/python3.6/site-packages/torch/hub.py", line 366, in load
model = entry(*args, **kwargs)
File "/local/mnt2/workspace2/zliang/fairseq/pytorch_fairseq_master/fairseq/models/fairseq_model.py", line 218, in from_pretrained
**kwargs,
File "/local/mnt2/workspace2/zliang/fairseq/pytorch_fairseq_master/fairseq/hub_utils.py", line 52, in from_pretrained
model_path = file_utils.load_archive_file(model_name_or_path)
File "/local/mnt2/workspace2/zliang/fairseq/pytorch_fairseq_master/fairseq/file_utils.py", line 81, in load_archive_file
top_dir = os.path.commonprefix(archive.getnames())
File "/local/mnt2/workspace2/zliang/anaconda3/envs/fairseq/lib/python3.6/tarfile.py", line 1769, in getnames
return [tarinfo.name for tarinfo in self.getmembers()]
File "/local/mnt2/workspace2/zliang/anaconda3/envs/fairseq/lib/python3.6/tarfile.py", line 1761, in getmembers
self._load() # all members, we first have to
File "/local/mnt2/workspace2/zliang/anaconda3/envs/fairseq/lib/python3.6/tarfile.py", line 2358, in _load
tarinfo = self.next()
File "/local/mnt2/workspace2/zliang/anaconda3/envs/fairseq/lib/python3.6/tarfile.py", line 2289, in next
self.fileobj.seek(self.offset - 1)
File "/local/mnt2/workspace2/zliang/anaconda3/envs/fairseq/lib/python3.6/bz2.py", line 278, in seek
return self._buffer.seek(offset, whence)
File "/local/mnt2/workspace2/zliang/anaconda3/envs/fairseq/lib/python3.6/_compression.py", line 143, in seek
data = self.read(min(io.DEFAULT_BUFFER_SIZE, offset))
File "/local/mnt2/workspace2/zliang/anaconda3/envs/fairseq/lib/python3.6/_compression.py", line 103, in read
data = self._decompressor.decompress(rawblock, size)
OSError: Invalid data stream
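Both tracebacks fail inside tarfile while it reads a bz2-compressed archive, at the archive.getnames() call in load_archive_file. One way to narrow this down would be to run the same kind of read against the downloaded archive on its own, outside fairseq. The following is only a minimal sketch: the helper name archive_is_readable is hypothetical, and the thread does not confirm that a damaged or incomplete download is the actual cause.

```python
import tarfile


def archive_is_readable(path):
    """Return True if the tar archive at `path` can be opened and listed.

    Hypothetical helper, not part of fairseq. It repeats the operation that
    fails in the traceback (listing all member names), which forces tarfile
    to read through the whole compressed stream.
    """
    try:
        # "r:*" lets tarfile detect the compression (gzip, bz2, xz) itself.
        with tarfile.open(path, "r:*") as archive:
            archive.getnames()
        return True
    except (tarfile.TarError, OSError, EOFError):
        # A truncated or corrupted archive typically surfaces as one of these.
        return False
```

If this returns False for the cached model archive, deleting that file and downloading it again would be a reasonable next step to try.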
Sorry, I really don't know what's going on. You can instead download the model file directly: https://github.com/pytorch/fairseq/tree/master/examples/translation#pre-trained-models
Then untar and load with:
from fairseq.models.transformer import TransformerModel
en2de = TransformerModel.from_pretrained(
    '/path/to/model/dir',
    tokenizer='moses',
    bpe='subword_nmt'
)
This seems to work. Thank you for your help!
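For anyone following the same workaround, the "untar" step can also be done from Python. This is a rough sketch rather than anything taken from fairseq: extract_model_archive is a hypothetical helper, and it assumes the downloaded archive keeps all of its files under a single top-level directory.

```python
import os
import tarfile
from pathlib import PurePosixPath


def extract_model_archive(archive_path, dest_dir):
    """Extract a downloaded model archive and return the model directory.

    Hypothetical helper. Only extract archives from a source you trust,
    since extractall() writes whatever paths the archive contains.
    """
    with tarfile.open(archive_path, "r:*") as archive:
        names = archive.getnames()
        archive.extractall(dest_dir)

    # Collect the first path component of every member (tar uses "/" paths).
    top_levels = set()
    for name in names:
        parts = PurePosixPath(name).parts
        if parts:
            top_levels.add(parts[0])

    # Assumption: one top-level directory holds the model files.
    if len(top_levels) == 1:
        return os.path.join(dest_dir, top_levels.pop())
    # Otherwise the files were extracted directly into dest_dir.
    return dest_dir
```

The returned path is what would be passed as the first argument to TransformerModel.from_pretrained in the snippet above, in place of '/path/to/model/dir'.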
Hello! I am new to fairseq and trying to run the translation example. I followed this readme but encountered an OSError: Invalid data stream when running en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt16.en-de', tokenizer='moses', bpe='subword_nmt'). I've tried editing the _compression.py file to move data = self._decompressor.decompress(rawblock, size) into a try/except block, but had no luck. Any idea how to fix it?
I used Python 3.6/3.7 and PyTorch 1.4 on Ubuntu 18.04. My CUDA version is 10.1 and cuDNN is 7.1. I pip installed fairseq, sacremoses and subword_nmt. My fairseq version is 0.9.0. Any help is much appreciated!