facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

Question about abstractive summarization #928

Closed danyang-rainbow closed 5 years ago

danyang-rainbow commented 5 years ago

I ran the following command to generate summaries:

python $FAIRSEQ/generate.py . --path cnn_dailymail.pt --remove-bpe --gen-subset test --batch-size 1 --min-len 60 --no-repeat-ngram 3 | tee cnn_dailymail.out

It failed with this exception:

Exception: Could not infer language pair, please provide it explicitly

The instructions are from https://github.com/pytorch/fairseq/tree/bi_trans_lm/examples/pretraining

It seems that LM_Data is needed, but I am only running generation here, not training.
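One possible reading of the exception is that the data directory (`.` in the command) held no binarized files from which a source/target pair could be read. The sketch below illustrates that idea only; it is not fairseq's code. The helper name `infer_language_pair` and the file naming pattern `<split>.<src>-<tgt>.<lang>.<ext>` are assumptions made for illustration.

```python
import os
from typing import Optional, Tuple


def infer_language_pair(data_dir: str) -> Optional[Tuple[str, str]]:
    """Guess (source, target) from file names in data_dir.

    Assumption: binarized files follow <split>.<src>-<tgt>.<lang>.<ext>,
    for example test.source-target.source.idx.
    """
    for filename in sorted(os.listdir(data_dir)):
        parts = filename.split(".")
        if len(parts) >= 3:
            pair = parts[1].split("-")
            # Accept only a well-formed "<src>-<tgt>" component.
            if len(pair) == 2 and all(pair):
                return pair[0], pair[1]
    # Nothing matched: the caller has to pass the pair explicitly.
    return None
```

If a helper like this returns `None` for the directory passed to generate.py, the pair cannot be guessed from the files and has to be given on the command line.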

teslacool commented 5 years ago

You should add -s source -t target to the command.

danyang-rainbow commented 5 years ago

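Hi, I added -s source -t target as you suggested, but now I get an error saying dict.source.txt cannot be found, so the dictionary seems to be missing. FileNotFoundError: [Errno 2] No such file or directory: './dict.source.txt'

However, there does not seem to be a dictionary in this repo. Where can I find it?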

teslacool commented 5 years ago

In the part "To generate using the pre-trained model", the instructions say you can download the binarized test set and dict (curl https://dl.fbaipublicfiles.com/fairseq/models/pretraining/cnn_dailymail.tar.gz | tar xvzf -).
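A quick check before running generation can show whether the dictionaries were actually extracted. This is a minimal sketch: `missing_dictionaries` is a hypothetical helper, `dict.source.txt` is taken from the error message in this thread, and `dict.target.txt` is assumed by analogy with `-t target`.

```python
import os
from typing import List


def missing_dictionaries(data_dir: str, source: str = "source",
                         target: str = "target") -> List[str]:
    """Return the expected dictionary file names that are absent from data_dir."""
    # dict.<target>.txt is an assumption; only dict.source.txt appears in the error.
    expected = [f"dict.{source}.txt", f"dict.{target}.txt"]
    return [name for name in expected
            if not os.path.isfile(os.path.join(data_dir, name))]
```

Calling `missing_dictionaries(".")` in the directory passed to generate.py should return an empty list once the download has been extracted there.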

danyang-rainbow commented 5 years ago

Actually, I had already run that curl command before trying the abstractive summarization, but I didn't get the dict file. I am trying it again. Thanks a lot.

danyang-rainbow commented 5 years ago

Because of a network failure, I didn't get the dict last time. Now I have the dict file. Thanks!
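An interrupted download like this can be caught before extraction by reading the archive through to the end. The sketch below assumes the archive was first saved to a local file (the thread pipes curl straight into tar instead); `list_archive` is a hypothetical helper name, and the file name in the usage note is likewise assumed.

```python
import tarfile
import zlib
from typing import List, Optional


def list_archive(path: str) -> Optional[List[str]]:
    """Return the member names of a .tar.gz, or None if it cannot be read fully."""
    try:
        with tarfile.open(path, "r:gz") as archive:
            # getnames() walks every header, so a truncated gzip stream
            # raises before the full list is returned.
            return archive.getnames()
    except (tarfile.TarError, EOFError, OSError, zlib.error):
        return None
```

For example, `list_archive("cnn_dailymail.tar.gz")` returning `None` would suggest downloading again before looking for the dictionary files.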