Unbabel / OpenKiwi

Open-Source Machine Translation Quality Estimation in PyTorch
https://unbabel.github.io/OpenKiwi/
GNU Affero General Public License v3.0

Is it possible to train with just src, mt, ter? #46

Closed jimbogill closed 4 years ago

jimbogill commented 4 years ago

Hi, thanks for making OpenKiwi available! Can any of the OpenKiwi models be trained when the only data available is [source sentence], [machine translation sentence], [TER score]? As far as I can tell, all the examples need more data, for example the tags. But maybe I missed something. Thanks, James

captainvera commented 4 years ago

Hi James!

This was not something we originally considered, which is why all of the examples assume you have more data. However, we did get some requests for it and added that functionality, so you can train with just src, mt and TER.

To do so you need to specify the following in your config file (besides the rest of the parameters):

sentence-level: True
predict-gaps: False
predict-target: False
predict-source: False

Keep in mind that there are some config options based on word-level tags that might not work when training just for sentence-level.
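
A quick sanity check for those four flags could look like the sketch below. It is only an illustration and not part of OpenKiwi: `parse_flat_config` and `wrong_flags` are hypothetical helpers, and the parser assumes a flat config with one `key: value` pair per line (no nesting, no lists).

```python
# The four flags come from the reply above; the helpers are illustrative only.
SENTENCE_ONLY_FLAGS = {
    "sentence-level": True,
    "predict-gaps": False,
    "predict-target": False,
    "predict-source": False,
}


def parse_flat_config(text):
    """Parse flat `key: value` lines, skipping blank lines and comments."""
    config = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        value = value.strip()
        if value.lower() in ("true", "false"):
            config[key.strip()] = value.lower() == "true"
        else:
            config[key.strip()] = value
    return config


def wrong_flags(config):
    """Return the flags that are missing or differ from the suggested values."""
    return sorted(
        key
        for key, expected in SENTENCE_ONLY_FLAGS.items()
        if config.get(key) is not expected
    )


if __name__ == "__main__":
    example = """
    sentence-level: True
    predict-gaps: False
    predict-target: true   # still enabled, so it gets reported
    predict-source: False
    """
    print(wrong_flags(parse_flat_config(example)))
```

An empty list means all four flags match the values suggested above; any name in the list is either missing from the config or set to something else.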

Let us know if you find any errors while training only with sentences! Miguel

captainvera commented 4 years ago

Closing this since there have been no updates, feel free to re-open if you have further questions!

jimbogill commented 4 years ago

Thanks Miguel. I successfully trained a predictor-estimator model on WMT data following your advice. I did this using a modified version of the config file in the experiments directory. In case it's useful, here are the modifications I made:

https://github.com/Unbabel/OpenKiwi/blob/715eba797ffbe461559002fc2927eb9cb416d7af/experiments/train_estimator.yaml#L32 Changed to true

https://github.com/Unbabel/OpenKiwi/blob/715eba797ffbe461559002fc2927eb9cb416d7af/experiments/train_estimator.yaml#L46 Changed to false

https://github.com/Unbabel/OpenKiwi/blob/715eba797ffbe461559002fc2927eb9cb416d7af/experiments/train_estimator.yaml#L108 Changed to train-sentence-scores

https://github.com/Unbabel/OpenKiwi/blob/715eba797ffbe461559002fc2927eb9cb416d7af/experiments/train_estimator.yaml#L114 Changed to valid-sentence-scores
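
For anyone preparing their own data for the `train-sentence-scores` and `valid-sentence-scores` options: below is a minimal sketch, assuming the data is laid out as in the WMT QE releases, i.e. three line-aligned files with one source sentence, one MT sentence, and one score per line. That layout is an assumption here, as are the file extensions and the helper names `write_parallel_files` and `check_aligned`; none of this is OpenKiwi code.

```python
from pathlib import Path


def write_parallel_files(triples, out_dir, prefix="train"):
    """Write (src, mt, ter) triples to three line-aligned files.

    The .src/.mt/.ter extensions are hypothetical; use whatever paths
    your config file points at.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    rows = list(triples)
    for src, mt, ter in rows:
        # A line break inside a sentence would break the alignment.
        if any(ch in text for text in (src, mt) for ch in "\r\n"):
            raise ValueError("sentences must not contain line breaks")
        float(ter)  # raises ValueError if the score is not numeric
    paths = {ext: out_dir / f"{prefix}.{ext}" for ext in ("src", "mt", "ter")}
    columns = {
        "src": [src for src, _, _ in rows],
        "mt": [mt for _, mt, _ in rows],
        "ter": [str(float(ter)) for _, _, ter in rows],
    }
    for ext, lines in columns.items():
        with open(paths[ext], "w", encoding="utf-8", newline="\n") as handle:
            handle.writelines(f"{line}\n" for line in lines)
    return paths


def count_lines(path):
    """Count the lines in a UTF-8 text file."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def check_aligned(paths):
    """Return the shared line count, or raise if the files differ in length."""
    counts = {name: count_lines(path) for name, path in paths.items()}
    if len(set(counts.values())) != 1:
        raise ValueError(f"line counts differ: {counts}")
    return next(iter(counts.values()))
```

Running `check_aligned` on the train and validation files before training is a cheap way to catch a missing or extra line, which would otherwise pair sentences with the wrong scores.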