alex-berard / seq2seq

Attention-based sequence to sequence learning
Apache License 2.0

How can I use the --align option? #16

Closed. Lindsay125 closed this issue 6 years ago

Lindsay125 commented 6 years ago

I have run the --train, --decode, and --eval options successfully. However, when I run the --align option with the command below, I get an attention heatmap, but it is not the one I want: it is computed against the reference file rather than the prediction file. The --align option did not do the translation. Here is my command:

python -m translate experiments/WMT14/substr.yaml \
--align experiments/WMT14/data/bpe_en-de/test2.en \
experiments/WMT14/data/bpe_en-de/test2.de \
--checkpoints experiments/WMT14/beam10_en-de/checkpoints/best \
--output experiments/WMT14/data/bpe_en-de/test2.raw \
--beam-size 1 \
--gpu-id 0

I would like to know whether I am using the wrong command. Thank you very much!

alex-berard commented 6 years ago

Hello,

When used alone, the '--align' option displays a forced alignment. This is useful when you want to see how the alignment model behaves during training, or to use the model for sentence alignment.

If you want to display a non-forced alignment (i.e., an alignment with respect to the decoding output rather than the reference), you can use the '--decode' option along with '--align' (the latter without any argument).

For example:

./seq2seq.sh experiments/WMT14/substr.yaml \
--decode experiments/WMT14/data/bpe_en-de/test2.en --align \
--checkpoints experiments/WMT14/beam10_en-de/checkpoints/best \
--output experiments/WMT14/data/bpe_en-de/test2.raw
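
To make the difference concrete, here is a toy Python sketch. It is not code from this repository, and the function name, sentences, and weights are made up for illustration. The attention matrix has one row per target token and one column per source token. In a forced alignment the rows correspond to the reference tokens, while with '--decode' plus '--align' they correspond to the model's own output.

```python
def hard_alignment(weights, src_tokens, trg_tokens):
    """For each target token, return the source token with the highest attention.

    weights[i][j] is the attention placed on source token j while
    producing target token i (hypothetical layout for this sketch).
    """
    if len(weights) != len(trg_tokens):
        raise ValueError("expected one row of weights per target token")
    pairs = []
    for trg, row in zip(trg_tokens, weights):
        if len(row) != len(src_tokens):
            raise ValueError("expected one weight per source token in every row")
        # Index of the largest weight in this row.
        best = max(range(len(row)), key=row.__getitem__)
        pairs.append((trg, src_tokens[best]))
    return pairs


# Illustrative values only: a made-up sentence pair and made-up weights.
source = ["the", "cat", "sits"]
hypothesis = ["die", "Katze", "sitzt"]  # model output (non-forced case)
attention = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
]

for trg, src in hard_alignment(attention, source, hypothesis):
    print(f"{trg} -> {src}")
```

For a forced alignment, the same function would be called with the reference tokens in place of `hypothesis`, together with the weights obtained while feeding the reference to the decoder.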

Lindsay125 commented 6 years ago

Thank you for your response. I ran the command exactly as you described to display a non-forced alignment, but I only get a translated file and not the attention heatmap 'svg' file.

alex-berard commented 6 years ago

Is your seq2seq repository up to date? This feature is fairly new.

Lindsay125 commented 6 years ago

I updated the repository, but my old model is not compatible with it. It raises the error: InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [500,36617] rhs shape= [620,36617]. This is strange because I did not set any shape of 500. Maybe the new repository uses a different way to load the parameters and checkpoints. So I retrained a new model with the new code, and it works (both translate and align). However, the heatmap only displays the prediction sentence, without the source sentence. Here is an example: test1.heatmap.1.pdf

alex-berard commented 6 years ago

The first error is probably due to the parameter embed_proj. The default value has changed to False. You just need to add embed_proj: True to your model's configuration and it should work again.
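
As a rough way to read that error message: when a checkpoint is restored, the shape of each variable built from the current configuration (the lhs, here [500, 36617]) has to match the shape stored in the checkpoint (the rhs, here [620, 36617]). The Python sketch below only illustrates that comparison. The helper and the variable names are hypothetical and are not part of this repository or of TensorFlow.

```python
def find_shape_mismatches(model_shapes, checkpoint_shapes):
    """Return {name: (model_shape, checkpoint_shape)} for variables whose shapes differ.

    Variables missing from the checkpoint are ignored in this sketch.
    """
    mismatches = {}
    for name, model_shape in model_shapes.items():
        saved_shape = checkpoint_shapes.get(name)
        if saved_shape is not None and tuple(saved_shape) != tuple(model_shape):
            mismatches[name] = (tuple(model_shape), tuple(saved_shape))
    return mismatches


# Hypothetical variable names; the shapes are the ones from the error message.
model_shapes = {
    "decoder/output/W": (500, 36617),   # built by the new code with its defaults
    "decoder/output/b": (36617,),
}
checkpoint_shapes = {
    "decoder/output/W": (620, 36617),   # saved by the old code
    "decoder/output/b": (36617,),
}

for name, (current, saved) in find_shape_mismatches(model_shapes, checkpoint_shapes).items():
    print(f"{name}: model expects {current}, checkpoint has {saved}")
```

A mismatch like this usually means that the configuration used to build the graph differs from the one used when the checkpoint was written, which is consistent with a changed default such as embed_proj.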

The alignment problem is probably due to a bug that was fixed in a recent commit. You can pull the latest version, or replace self.binary with self.binary[0] at line 190 in translate/translation_model.py.

Lindsay125 commented 6 years ago

Thank you very much! My problems have all been solved.