Helsinki-NLP / Opus-MT

Open neural machine translation models and web services
MIT License
592 stars 71 forks source link

Bad translations using marian-decoder #29

Open koren-v opened 3 years ago

koren-v commented 3 years ago

Hi, I've loaded the models from the following directory: https://github.com/Helsinki-NLP/OPUS-MT-train/tree/master/models/ru-en When I tried some of them I often get translation like: "▁Y O O O O O O O O O O O O O O O O O O O O" or "I 'm b@@ m@@ m@@ m@@ m@@ m@@ m@@ m@@ m@@ m@@ m@@ m@@ m@@ m@@ m@@ m@@ m@@ m@@ m@@" Then I tried to load the model from the Hugging Face site: but get pretty similar outputs while using Hugging Face framework gives good translations. Probably something wrong with config. I launch it using the Marian library. For example:

 echo "привет" | ./marian-decoder -c /path/to/opus_models/opus-2019-12-05-ru-en/decoder.yml

So what can be wrong?

Probably I somehow should do preprocessing and postprocessing?

jorgtied commented 3 years ago

You need to preprocess the string first using the provided sentence piece model of the source language. Our models don;t support internal sentence piece segmentation. This needs to be done before piping input to the decoder.