Open wanghm92 opened 6 years ago

May I ask if there is any plan to add the coverage attention mechanism (https://arxiv.org/pdf/1601.04811.pdf) and the coverage loss (https://arxiv.org/pdf/1704.04368.pdf) to the decoder? These could potentially help alleviate the repetition problem in generation. Or, any hints on a quick implementation? Thanks!
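To make the request concrete, here is a minimal PyTorch sketch of the two mechanisms from the papers linked above: a coverage-aware additive attention score in the spirit of Tu et al. (2016) and the coverage loss of See et al. (2017). All class and parameter names are illustrative, not OpenNMT code, and padding masks are omitted for brevity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoverageAttention(nn.Module):
    """Additive attention with a coverage term (Tu et al., 2016).

    score(h_i, s_t) = v^T tanh(W_h h_i + W_s s_t + w_c c_i)
    where c_i is the total attention weight source position i has
    received over all previous decoding steps.
    """

    def __init__(self, hidden_size):
        super().__init__()
        self.w_h = nn.Linear(hidden_size, hidden_size, bias=False)
        self.w_s = nn.Linear(hidden_size, hidden_size, bias=False)
        self.w_c = nn.Linear(1, hidden_size, bias=False)  # coverage feature
        self.v = nn.Linear(hidden_size, 1, bias=False)

    def forward(self, memory, query, coverage):
        # memory:   (batch, src_len, hidden)  encoder states h_i
        # query:    (batch, hidden)           decoder state s_t
        # coverage: (batch, src_len)          accumulated past attention
        features = (self.w_h(memory)
                    + self.w_s(query).unsqueeze(1)
                    + self.w_c(coverage.unsqueeze(-1)))
        scores = self.v(torch.tanh(features)).squeeze(-1)  # (batch, src_len)
        align = F.softmax(scores, dim=-1)
        # Coverage loss of See et al. (2017): penalize attending again
        # to positions that are already covered.
        cov_loss = torch.sum(torch.min(align, coverage), dim=-1).mean()
        new_coverage = coverage + align
        context = torch.bmm(align.unsqueeze(1), memory).squeeze(1)
        return context, align, new_coverage, cov_loss
```

The training objective would then be the usual NLL plus a tunable weight times cov_loss.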
There are no plans to add these features, but contributions are welcome.
It is presently a bit complicated to customize the RNN decoder because we use the high-level tf.contrib.seq2seq
APIs. We might want to revise that at some point.
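For anyone who wants to attempt it anyway, one route within the TF 1.x contrib API is a custom attention mechanism whose recurrent state accumulates the alignments. A rough sketch, with the caveat that the class name is mine and this is not a supported OpenNMT-tf API; it only tracks coverage, while feeding it into the score itself (as in Tu et al., 2016) would additionally require overriding the score computation:

```python
import tensorflow as tf

class CoverageBahdanauAttention(tf.contrib.seq2seq.BahdanauAttention):
    """BahdanauAttention variant whose recurrent state accumulates coverage.

    The stock class threads the previous step's alignments through
    `state`; this variant accumulates them instead, so downstream code
    (e.g. a coverage loss) can read a coverage vector from the
    attention state. The initial state is zeros, so accumulation
    starts from an empty coverage.
    """

    def __call__(self, query, state):
        # `state` holds the accumulated alignments from past steps.
        alignments, _ = super(CoverageBahdanauAttention, self).__call__(
            query, state)
        next_state = state + alignments  # running coverage
        return alignments, next_state
```

The mechanism should plug into tf.contrib.seq2seq.AttentionWrapper like the stock one; if I read the wrapper code right, recent TF 1.x releases then expose the accumulated coverage as the attention_state field of the wrapper state.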
@wanghm92 In case you are not aware, OpenNMT-py does support a training option called "coverage_attn" which I have used to solve a problem somewhat similar to yours.
My use case is learning a strictly token-by-token mapping from the source sequence to the target sequence, which does not allow for any unwanted repetition or additional/missing tokens during translation. This is hard to enforce under OpenNMT-tf, but so far OpenNMT-py seems to work well for my purposes.
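(For anyone trying this: the option is enabled by passing `-coverage_attn` to the training script, and OpenNMT-py also exposes a `-lambda_coverage` option to weight a coverage penalty in the loss. Flag names may vary between versions, so check `python train.py -h` against your checkout.)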
@guillaumekln @kaihuchen Thanks a lot for the replies! I came across the discussion of the "coverage_attn" option in OpenNMT-py, but I also found this note in global_attention.py: https://github.com/OpenNMT/OpenNMT-py/blob/fd1ec04758855008dbbf7ce1d56d16570544e616/onmt/modules/global_attention.py#L135-L142 Does that mean coverage attention is still not supported? Or, @kaihuchen, does the option indeed work in your experience? The same question was asked on the forum but has had no response yet: http://forum.opennmt.net/t/whats-the-use-of-coverage-in-the-forward-pass-for-globalattention/1651 Could you give some hints? Thanks!
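For orientation, what those linked lines appear to do is fold the coverage vector into the attention keys before scoring. A generic sketch of that pattern (simplified and paraphrased, not the repository's exact code):

```python
import torch
import torch.nn as nn

def apply_coverage(memory_bank, coverage, linear_cover):
    """Fold an accumulated coverage vector into the attention keys.

    memory_bank:  (batch, src_len, dim) encoder states used as keys
    coverage:     (batch, src_len)      sum of past attention weights
    linear_cover: nn.Linear(1, dim)     learned coverage projection
    """
    cover = coverage.unsqueeze(-1)                  # (batch, src_len, 1)
    memory_bank = memory_bank + linear_cover(cover)
    return torch.tanh(memory_bank)
```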
@wanghm92 FYI, I have been trying out the coverage_attn feature in OpenNMT-py since just yesterday. What I have observed from my experiments so far is as follows:
@kaihuchen I see. I'm not sure whether the developers forgot to delete the 'not supported' note or whether the feature is still under development. Would appreciate a clarification from the developers, @guillaumekln, if possible. Thank you very much for your detailed explanations! I'll go and try out those options myself and share my observations later.
For any query about OpenNMT-py, please open an issue in the dedicated repository. Thanks.
@guillaumekln
I see this discussion happened three years ago. Are there any plans to work on these features at the moment? Thank you!
There is no plan to work on this at the moment, but I would accept a PR adding these features.