amesval opened this issue 2 years ago
Ah, I forgot to include the vocab files that are mentioned in the decoder.yml. Thanks for pointing me in that direction. The *.vocab.yml file is not the correct one here, as this model comes with separate vocabularies for the source and target language; look into the decoder.yml file to see that this is the case. The *.vocab files mentioned there are missing from the release. However, you can use the SentencePiece model files (*.spm) directly. Edit the decoder.yml file to look like this:
relative-paths: true
models:
- opusTCv20210807+bt.spm32k-spm32k.transformer-align.model1.npz.best-perplexity.npz
vocabs:
- source.spm
- target.spm
beam-size: 6
normalize: 1
word-penalty: 0
mini-batch: 1
maxi-batch: 1
maxi-batch-sort: src
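If you prefer to apply the edit above from the command line rather than in an editor, a minimal sketch (the model and vocab filenames are the ones from this thread; adjust them to whatever your downloaded release actually contains):

```shell
# Write the edited decoder.yml in the model directory.
# Filenames below come from the eng-kor Tatoeba-Challenge release discussed here.
cat > decoder.yml <<'EOF'
relative-paths: true
models:
  - opusTCv20210807+bt.spm32k-spm32k.transformer-align.model1.npz.best-perplexity.npz
vocabs:
  - source.spm
  - target.spm
beam-size: 6
normalize: 1
word-penalty: 0
mini-batch: 1
maxi-batch: 1
maxi-batch-sort: src
EOF

# Sanity check: both vocab entries should be present.
grep -c '\.spm$' decoder.yml   # prints 2
```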
Then run something like this:
echo "This is a test." | ./preprocess.sh eng source.spm | marian-decoder -c decoder.yml
Does that work for you?
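For context, the decoder output at this point is still SentencePiece pieces, which is what postprocess.sh undoes. Conceptually the detokenization just deletes the piece-internal spaces and turns the U+2581 word-boundary marker back into a space; a minimal shell sketch of that step (the real postprocess.sh in the release may do more, e.g. handle the target language argument):

```shell
# Conceptual SentencePiece detokenization:
# 1. remove the spaces between pieces,
# 2. replace the word-boundary marker '▁' (U+2581) with a space,
# 3. trim the leading space.
echo '▁This ▁is ▁a ▁test .' | sed 's/ //g; s/▁/ /g; s/^ //'
# prints: This is a test.
```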
Hi everyone. I would like to run inference and replicate the reported BLEU score for the English-to-Korean translation model (https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/eng-kor). I downloaded the files there and installed marian-nmt on Ubuntu 20.04.3, including protobuf so that SentencePiece can be used, as described at https://marian-nmt.github.io/docs/ . I ran preprocess.sh, fed its output to marian-decoder to get the translations, and finally ran postprocess.sh. The results were unexpected; in fact, there were no Korean characters in the output at all.
Am I doing something wrong?