OpenNMT / OpenNMT-tf

Neural machine translation and sequence learning using TensorFlow
https://opennmt.net/
MIT License

Coverage Mechanism and Coverage Loss #180

Open wanghm92 opened 6 years ago

wanghm92 commented 6 years ago

May I ask if there is any plan to add the coverage attention mechanism (https://arxiv.org/pdf/1601.04811.pdf) and the coverage loss (https://arxiv.org/pdf/1704.04368.pdf) to the decoder? These could potentially help alleviate the repetition problem in generation.

Or, any hints on a quick implementation? Thanks!
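For context, the coverage loss from the second paper (See et al., 2017) is simple to state: keep a coverage vector that sums the attention distributions of all previous decoder steps, and at each step penalize the overlap between the current attention and that running sum. Below is a minimal NumPy sketch of the loss term only; the function name and shapes are illustrative, not part of any OpenNMT API.

```python
import numpy as np

def coverage_loss(attentions):
    """Coverage loss as in See et al. (2017), eq. covloss_t = sum_i min(a_t[i], c_t[i]).

    attentions: array of shape (T, S) -- one attention distribution
    over S source positions for each of T decoder steps.
    """
    coverage = np.zeros(attentions.shape[1])  # c_0 = 0 (nothing covered yet)
    loss = 0.0
    for a_t in attentions:
        # Penalize re-attending to source positions that are already covered.
        loss += np.minimum(a_t, coverage).sum()
        coverage += a_t  # coverage is the running sum of past attention
    return loss
```

Attending twice to the same position is penalized, while spreading attention over distinct positions incurs no loss, which is exactly the property that discourages repetition. In training this term is added to the NLL loss with a weighting hyperparameter.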

guillaumekln commented 6 years ago

There are no plans to add these features but contributions are welcome.

It is presently a bit complicated to customize the RNN decoder as we use the high-level tf.contrib.seq2seq APIs. We might want to revise that at some point.

kaihuchen commented 6 years ago

@wanghm92 In case you are not aware, OpenNMT-py does support a training option called "coverage_attn" which I have used to solve a problem somewhat similar to yours.

My use case is learning a strict token-by-token mapping from the source sequence to the target sequence, which does not allow for any unwanted repetition or additional/missing tokens during translation. This is hard to enforce under OpenNMT-tf, but so far OpenNMT-py seems to work well for my purposes.
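For reference, enabling the option in OpenNMT-py looks roughly like this; the data and model paths are placeholders, and `-lambda_coverage` (the weight of the coverage loss term) is taken from the OpenNMT-py training options of that era.

```shell
# Sketch: train with coverage attention and a coverage loss term in OpenNMT-py.
# Paths are placeholders for your own preprocessed data and model prefix.
python train.py -data data/demo -save_model demo-model \
    -coverage_attn -lambda_coverage 1
```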

wanghm92 commented 6 years ago

@guillaumekln @kaihuchen Thanks a lot for the replies! I came across the discussion of the "coverage_attn" option in OpenNMT-py, but I also found this note in `global_attention.py`: https://github.com/OpenNMT/OpenNMT-py/blob/fd1ec04758855008dbbf7ce1d56d16570544e616/onmt/modules/global_attention.py#L135-L142

Does that mean coverage attention is still not supported? Or, @kaihuchen, does the option indeed work in your experience? The same question was asked on the forum but has had no response yet: http://forum.opennmt.net/t/whats-the-use-of-coverage-in-the-forward-pass-for-globalattention/1651

Could you give some hints? Thanks!

kaihuchen commented 6 years ago

@wanghm92 FYI, I have been trying out the coverage_attn feature in OpenNMT-py since just yesterday. What I have observed from my experiments so far is as follows:

wanghm92 commented 6 years ago

@kaihuchen I see. I'm not sure whether the developers forgot to delete the 'not supported' note or whether the feature is still under development. I would appreciate a clarification from the developers, @guillaumekln, if possible. Thank you very much for your detailed explanations! I'll go and try out those options myself and share my observations with you later.

guillaumekln commented 6 years ago

For any questions about OpenNMT-py, please open an issue in the dedicated repository. Thanks.

tmkhalil commented 3 years ago

@guillaumekln

I see this discussion happened three years ago. Are there any plans to work on these features at the moment? Thank you!

guillaumekln commented 3 years ago

There is no plan to work on this at the moment, but I would accept a PR adding these features.