Prediction errors using pre-trained eng-kor model

Helsinki-NLP / Tatoeba-Challenge

Other

801 stars 91 forks source link

Ah, I forgot to include the vocab files that are mentioned in the decoder.yml. Thanks for pointing me into that direction. The *.vocab.yml file is not the correct one here as this model comes with separate vocabularies for source and target language. Look into the decoder.yml file to see that this is the case. But the *.vocab files mentioned there are missing. However, you can use the spm-files directly. Edit the decoder.yml file to look like this:

relative-paths: true
models:
  - opusTCv20210807+bt.spm32k-spm32k.transformer-align.model1.npz.best-perplexity.npz
vocabs:
  - source.spm
  - target.spm
beam-size: 6
normalize: 1
word-penalty: 0
mini-batch: 1
maxi-batch: 1
maxi-batch-sort: src

And then run something like that:

echo "This is a test." | ./preprocess.sh eng source.spm | marian-decoder -c decoder.yml

Does that work for you?

Helsinki-NLP / Tatoeba-Challenge

Prediction errors using pre-trained eng-kor model #21