jimbogill closed this issue 4 years ago
Hi James!
Although we never originally considered this use case, we received some requests for it and added the functionality. Because it was added later, all of the examples still assume you have more data. However, you can train with just src, mt and TER.
To do so you need to specify the following in your config file (besides the rest of the parameters):
sentence-level: True
predict-gaps: False
predict-target: False
predict-source: False
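If you would rather patch an existing config file than edit it by hand, something like the following might work. This is only a rough sketch: the helper `apply_overrides` is hypothetical and not part of OpenKiwi, and it assumes the config is a flat list of top-level `key: value` lines. Only the four option names and values come from the lines above.

```python
import re

# The four options quoted above; everything else in this sketch is illustrative.
SENTENCE_ONLY_OVERRIDES = {
    "sentence-level": True,
    "predict-gaps": False,
    "predict-target": False,
    "predict-source": False,
}


def apply_overrides(config_text, overrides):
    """Return config_text with each top-level `key: value` line set to its override.

    Keys that do not appear in the text are appended at the end.
    Assumption: the config is flat, with one `key: value` pair per line.
    """
    lines = config_text.splitlines()
    seen = set()
    for i, line in enumerate(lines):
        match = re.match(r"^([A-Za-z0-9_-]+)\s*:", line)
        if match and match.group(1) in overrides:
            key = match.group(1)
            # Replaces the whole line, so any trailing comment on it is dropped.
            lines[i] = f"{key}: {overrides[key]}"
            seen.add(key)
    for key, value in overrides.items():
        if key not in seen:
            lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"
```

You would read your config file into a string, pass it through `apply_overrides(text, SENTENCE_ONLY_OVERRIDES)`, and write the result back. The rest of your parameters are left untouched.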
Keep in mind that some config options rely on word-level tags and might not work when training only at the sentence level.
Let us know if you find any errors while training only with sentences! Miguel
Closing this since there have been no updates, feel free to re-open if you have further questions!
Thanks Miguel. I successfully trained a predictor-estimator model on WMT data following your advice. I did this using a modified version of the config file in the experiments directory. In case it's useful, here are the modifications I made:
https://github.com/Unbabel/OpenKiwi/blob/715eba797ffbe461559002fc2927eb9cb416d7af/experiments/train_estimator.yaml#L32 Changed to true
https://github.com/Unbabel/OpenKiwi/blob/715eba797ffbe461559002fc2927eb9cb416d7af/experiments/train_estimator.yaml#L46 Changed to false
https://github.com/Unbabel/OpenKiwi/blob/715eba797ffbe461559002fc2927eb9cb416d7af/experiments/train_estimator.yaml#L108 Changed to train-sentence-scores
https://github.com/Unbabel/OpenKiwi/blob/715eba797ffbe461559002fc2927eb9cb416d7af/experiments/train_estimator.yaml#L114 Changed to valid-sentence-scores
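For anyone starting from their own (source, MT, TER) triples rather than the WMT files, here is a rough sketch of how the data could be split into line-aligned files. It assumes the layout used by the WMT data, with one sentence per line and one score per line in a separate file. The function name `write_parallel_files` and the file extensions are my own choices, not anything OpenKiwi requires.

```python
from pathlib import Path


def write_parallel_files(examples, out_dir, prefix="train"):
    """Write (source, translation, score) triples as three line-aligned files.

    Assumption: one sentence per line, with the score on the matching line
    of a separate file. The file names here are illustrative only.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {
        "src": out_dir / f"{prefix}.src",
        "mt": out_dir / f"{prefix}.mt",
        "scores": out_dir / f"{prefix}.ter",
    }
    columns = {"src": [], "mt": [], "scores": []}
    for src, mt, score in examples:
        # A newline inside a sentence would break the line alignment.
        if "\n" in src or "\n" in mt:
            raise ValueError("sentences must not contain newlines")
        columns["src"].append(src.strip())
        columns["mt"].append(mt.strip())
        columns["scores"].append(f"{float(score):.6f}")
    for name, path in paths.items():
        text = "".join(line + "\n" for line in columns[name])
        path.write_text(text, encoding="utf-8")
    return paths
```

The scores file written this way would be the one the `train-sentence-scores` option points to, with a second call using `prefix="valid"` producing the file for `valid-sentence-scores`.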
Hi, Thanks for making OpenKiwi available! Can any of the OpenKiwi models be trained when the only data available is [source sentence], [machine translation sentence], [TER score]? As far as I can tell, all the examples need more data, for example the tags. But maybe I missed something. Thanks, James