OpenNMT / OpenNMT-py

Open Source Neural Machine Translation and (Large) Language Models in PyTorch
https://opennmt.net/
MIT License

Interactive (prefix constrained) translation #568

Closed nimcho closed 6 years ago

nimcho commented 6 years ago

Hey guys,

Is there anybody out there working on prefix-constrained translation?

I'm looking for an interesting school project, and I'm particularly interested in interactive translation. So if it's not yet being implemented in OpenNMT, I would gladly start working on this.

srush commented 6 years ago

I think that's a really cool idea. We would love to have it.

We currently have oracle translation (where you give the whole sentence). You could adapt that code to start with a prefix.

Even better: we have been talking about implementing general constraints. We were thinking you could pass in a regex or a finite-state automaton and constrain beam search to follow that format.
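For illustration, here is a minimal sketch of beam search constrained by a finite-state automaton, with a forced prefix as the simplest such automaton. Everything in it is an assumption made for the example, not OpenNMT-py code: `prefix_automaton`, `next_state`, `constrained_beam_search`, the `ANY` wildcard label, and the bigram `toy_model` that stands in for a real decoder step. The sketch also assumes that any state which permits the end-of-sentence token is an accepting state.

```python
import math

EOS = "</s>"
ANY = "*"  # wildcard arc label; a convention of this sketch only


def prefix_automaton(prefix):
    """Build a DFA that forces `prefix`, then allows any continuation."""
    trans = {i: {tok: i + 1} for i, tok in enumerate(prefix)}
    trans[len(prefix)] = {ANY: len(prefix)}  # final state loops on anything
    return trans


def next_state(trans, state, token):
    """Return the DFA state reached by `token`, or None if it is not allowed."""
    arcs = trans.get(state, {})
    return arcs[token] if token in arcs else arcs.get(ANY)


def constrained_beam_search(next_logprobs, trans, beam_size=2, max_len=10):
    """Beam search that prunes every hypothesis the automaton rejects."""
    beams = [(0.0, [], 0)]  # (log-prob score, tokens, DFA state)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, tokens, state in beams:
            for tok, logprob in next_logprobs(tokens).items():
                nxt = next_state(trans, state, tok)
                if nxt is None:
                    continue  # token violates the constraint
                candidates.append((score + logprob, tokens + [tok], nxt))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for cand in candidates[:beam_size]:
            # Hypotheses ending in EOS leave the beam.
            (finished if cand[1][-1] == EOS else beams).append(cand)
        if not beams:
            break
    pool = finished or beams
    return max(pool, key=lambda c: c[0])[1] if pool else []


# Hypothetical bigram "model" standing in for a real decoder step.
BIGRAMS = {
    None: {"the": 0.7, "a": 0.3},
    "the": {"cat": 0.6, "dog": 0.4},
    "a": {"dog": 0.8, "cat": 0.2},
    "cat": {"sat": 0.9, EOS: 0.1},
    "dog": {"ran": 0.9, EOS: 0.1},
    "sat": {EOS: 1.0},
    "ran": {EOS: 1.0},
}


def toy_model(tokens):
    """Return log-probabilities of the next token given the last token."""
    last = tokens[-1] if tokens else None
    return {tok: math.log(p) for tok, p in BIGRAMS[last].items()}


unconstrained = constrained_beam_search(toy_model, prefix_automaton([]))
forced = constrained_beam_search(toy_model, prefix_automaton(["a"]))
```

With an empty prefix the search is unconstrained; with the prefix `["a"]` every hypothesis that starts differently is pruned before it enters the beam. A regex could be handled the same way by compiling it to a transition table over tokens.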

ykasimov commented 6 years ago

I am trying to do interactive translation, but I am not sure how to test it properly. I need some user input that deviates from the neural net's prediction. Do you know how to simulate it?

nimcho commented 6 years ago

@srush Hmm... interesting. I'll try to make it as general as possible, but it'll be hard to cover the diversity of translation models (char-based, word-based, BPE-based, ...). I've played with plain text prefixes in Nematus, so I'll first try to migrate some models to OpenNMT.

@ykasimov Take a parallel corpus and use prefixes from it. If the model hasn't seen the sentences during training, the prefixes will likely deviate from its predictions.
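One way to turn this into a test loop, sketched under assumptions: the simulated user reads the hypothesis, keeps the part that agrees with the reference translation, and corrects the first token that deviates. The function `simulate_session` and the `translate(source, prefix)` callable are hypothetical names for this example; `toy_translate` is a stand-in that always continues the prefix with the same fixed guess.

```python
def simulate_session(translate, source, reference):
    """Count the corrections a simulated user needs to reach `reference`.

    `translate(source, prefix)` is assumed to return a token list that
    starts with `prefix` (a prefix-constrained translation).
    """
    prefix = []
    corrections = 0
    while True:
        hyp = translate(source, prefix)
        if hyp == reference:
            return corrections
        # Find the first position after the prefix where hyp deviates.
        i = len(prefix)
        while i < len(hyp) and i < len(reference) and hyp[i] == reference[i]:
            i += 1
        if i >= len(reference):
            # Reference is fully covered but hyp runs on; one last fix.
            return corrections + 1
        # The user types the correct token, extending the prefix.
        prefix = reference[: i + 1]
        corrections += 1


# Hypothetical translator: always continues the prefix with a fixed guess.
GUESS = ["the", "cat", "sat"]


def toy_translate(source, prefix):
    return prefix + GUESS[len(prefix):]


n_fixes = simulate_session(toy_translate, ["src"], ["the", "dog", "sat"])
```

Averaging the number of corrections over held-out sentence pairs gives a rough measure of how much effort the interactive mode saves the user.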

srush commented 6 years ago

OpenNMT purposely ignores the distinction between model types (char-based, word-based, etc.). It treats everything as a token and requires the user to handle this part of the setup. So if you wanted a prefix, it would be expressed in whatever token encoding the model uses.
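A small sketch of what that means in practice: the same plain-text prefix becomes a different token-level constraint depending on how the training data was tokenized. The two tokenizers below are toy stand-ins written for this example (the `<space>` marker is an arbitrary choice), not OpenNMT preprocessing tools.

```python
def word_tokens(text):
    """Word-level encoding: split on whitespace."""
    return text.split()


def char_tokens(text):
    """Character-level encoding with an explicit space marker."""
    return ["<space>" if ch == " " else ch for ch in text]


user_prefix = "the ca"  # the user stopped typing mid-word

word_prefix = word_tokens(user_prefix)
char_prefix = char_tokens(user_prefix)
```

Note that under the word-level encoding the trailing `ca` is treated as a complete token, so a prefix that ends mid-word probably needs special handling by whoever prepares the input.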

vince62s commented 6 years ago

@nimcho Not sure whether you implemented this; feel free to open a PR. Closing this issue for now, reopen if needed.