Add alignment matrix learning

iPRET commented 6 months ago

Added alignment matrix learning to training. See issue #1105 for more details. Added the following command line arguments:

--alignment-matrix: file to read alignments from.
--alignment-matrix-weight: loss coefficient for the alignment matrix cross-entropy.
--attention-alignment-layer: decoder layer in which attention alignment happens.
--align-attentions: tells the model to learn alignments when this cannot be inferred from the other command line arguments.
--shift-alignments: tells data preparation to shift alignments one target token forward and the translation one target token backward.
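The sketch below shows the kind of loss that `--alignment-matrix-weight` scales. It makes several assumptions that this PR does not confirm:

- Alignments are stored one sentence per line as space-separated `source-target` index pairs (the Pharaoh format that fast_align produces).
- A target token aligned to several source tokens spreads its probability mass uniformly over them.
- The alignment term is averaged over the target positions that have an alignment.

All function names are hypothetical. Plain lists stand in for the attention probabilities taken from the layer chosen with `--attention-alignment-layer`.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # rows = target positions, columns = source positions


def parse_alignment_line(line: str) -> List[Tuple[int, int]]:
    """Parse 'src-tgt' index pairs such as '0-0 1-2' (assumed Pharaoh format)."""
    pairs = []
    for token in line.split():
        src, tgt = token.split("-")
        pairs.append((int(src), int(tgt)))
    return pairs


def alignment_matrix(pairs: List[Tuple[int, int]], src_len: int, tgt_len: int) -> Matrix:
    """Turn alignment pairs into row-normalized target distributions over source positions."""
    matrix = [[0.0] * src_len for _ in range(tgt_len)]
    for src, tgt in pairs:
        matrix[tgt][src] = 1.0
    for row in matrix:
        total = sum(row)
        if total > 0:  # unaligned target tokens keep an all-zero row
            for j in range(src_len):
                row[j] /= total
    return matrix


def alignment_cross_entropy(attention: Matrix, alignment: Matrix, eps: float = 1e-9) -> float:
    """Mean of -sum_j A[t][j] * log(P[t][j]) over target rows that have an alignment."""
    total, aligned_rows = 0.0, 0
    for att_row, ali_row in zip(attention, alignment):
        if sum(ali_row) == 0:  # skip unaligned target tokens
            continue
        total -= sum(a * math.log(p + eps) for a, p in zip(ali_row, att_row))
        aligned_rows += 1
    return total / aligned_rows if aligned_rows else 0.0


def combined_loss(translation_loss: float, attention: Matrix, alignment: Matrix,
                  alignment_weight: float) -> float:
    """Translation loss plus the alignment term scaled by a weight (assumed role of the flag)."""
    return translation_loss + alignment_weight * alignment_cross_entropy(attention, alignment)


# Usage: 2 source tokens, 3 target tokens; each attention row sums to 1.
alignment = alignment_matrix(parse_alignment_line("0-0 1-1 1-2"), src_len=2, tgt_len=3)
attention = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
print(combined_loss(2.0, attention, alignment, alignment_weight=0.5))
```

With this formulation, target tokens that have no alignment add nothing to the alignment term. A weight of 0 recovers the plain translation loss.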

Report on performance impact: Sockeye_Alignment_Matrix_Report-6.pdf

Pull Request Checklist

[x] Changes are complete (if posting work-in-progress code, prefix your pull request title with '[WIP]' until you can check this box).
[x] Unit tests pass (pytest)
[x] Were system tests modified? If so, did you run these at least 5 times to account for the variation across runs?
[x] System tests pass (pytest test/system)
[ ] Passed code style checking (./style-check.sh)
[x] You have considered writing a test
[x] Updated major/minor version in sockeye/__init__.py. Major version bump if this is a backwards incompatible change.
[x] Updated CHANGELOG.md

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

iPRET commented 6 months ago

So I ran style-check.sh (my apologies for not doing it before opening the pull request) (ᗒᗣᗕ)՞. One of the issues is a type error. I modified TransformerDecoder.decode_seq to return Tuple[Tensor, Tensor] instead of the previous Tensor. This is necessary because the decoder now has to return both the translation and the attention matrices. However, the new return type no longer matches what the abstract Decoder.decode_seq declares. I see a couple of paths forward here:

Make decode_seq return a dictionary of results where keys correspond to different tensors.
Make Decoder.decode_seq return Tuple[Tensor, Tensor]
Make a new function called decode_seq_and_attention, and modify the logic everywhere that calls decode_seq.
Perhaps something completely different?
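For illustration only, here is a sketch of a single named return type. It sits between the dictionary (first option) and changing the base signature (second option). The abstract method and the subclass declare the same return annotation, so a type checker has nothing to flag, and callers that only need the decoder output read one field. `DecoderOutput`, the `return_attention` flag and `ToyTransformerDecoder` are hypothetical names. A list stands in for the tensor type. This is not the signature Sockeye actually uses.

```python
from abc import ABC, abstractmethod
from typing import List, NamedTuple, Optional

Tensor = List[float]  # hypothetical stand-in for torch.Tensor so the sketch runs on its own


class DecoderOutput(NamedTuple):
    """Hypothetical return type shared by the abstract class and every subclass."""
    hidden: Tensor                      # what decode_seq returned before this change
    attention: Optional[Tensor] = None  # filled only when alignments are learned


class Decoder(ABC):
    @abstractmethod
    def decode_seq(self, inputs: Tensor, return_attention: bool = False) -> DecoderOutput:
        """Decode a full sequence; optionally also return attention probabilities."""


class ToyTransformerDecoder(Decoder):
    def decode_seq(self, inputs: Tensor, return_attention: bool = False) -> DecoderOutput:
        hidden = [2.0 * x for x in inputs]  # placeholder for the real decoder layers
        # Uniform placeholder attention, produced only on request
        attention = [1.0 / len(inputs)] * len(inputs) if return_attention else None
        return DecoderOutput(hidden=hidden, attention=attention)


decoder = ToyTransformerDecoder()
hidden_only = decoder.decode_seq([1.0, 2.0]).hidden
hidden, attention = decoder.decode_seq([1.0, 2.0], return_attention=True)
```

A `NamedTuple` is still a tuple, so call sites that unpack two values keep working. The field names also document what each element is, which is the main advantage over a bare `Tuple[Tensor, Tensor]`.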

Which approach do you think would be preferable?

Thanks, IP

mjdenkowski commented 6 months ago

Hi Ingus,

It looks like you've created a thorough implementation and tested it extensively. Given the size of the pull request and our current priorities, it may be some time before we get to this. We'll follow up when we start to review the code and your report.

Best, Michael

iPRET commented 6 months ago

Hello Michael.

Before you do the code review, could you please give a quick opinion on the type error so I can fix it? (It's the last problem keeping me from ticking the style-check.sh box.) ༼ つ ◕.◕ ༽つ

Thanks, IP

mjdenkowski commented 5 months ago

Hi Ingus,

Thank you for your interest in using and contributing to Sockeye!

Unfortunately, we were not able to include your new feature before the cutoff for transitioning Sockeye to maintenance mode. We appreciate the amount of work that went into developing and testing this feature and encourage you to share your extended version of Sockeye with others.

Best wishes on your future projects!

--Michael

awslabs / sockeye

Add alignment matrix learning #1108
