Way to measure model accuracy

raju249 commented 7 years ago

Hi team, apart from the perplexity score, is there any other way to measure model performance within OpenNMT? For example, is there a way to calculate the BLEU score of a model built with OpenNMT?

guillaumekln commented 7 years ago

Hi,

You can just run the translation with a trained model and use the benchmark/3rdParty/multi-bleu.perl script to compute the BLEU score.

raju249 commented 7 years ago

Is there a link to documentation showing how to use that script?

guillaumekln commented 7 years ago

It is the standard BLEU script, so:

benchmark/3rdParty/multi-bleu.perl gold.txt < pred.txt

raju249 commented 7 years ago

perl benchmark/3rdParty/multi-bleu.perl refer.txt < pred.txt 
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = "hi.UTF-8",
    LC_ALL = (unset),
    LC_CTYPE = "UTF-8",
    LANG = "hi_IN.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("hi_IN.UTF-8").
Use of uninitialized value in division (/) at benchmark/3rdParty/multi-bleu.perl line 129, <STDIN> line 1.
Use of uninitialized value in division (/) at benchmark/3rdParty/multi-bleu.perl line 129, <STDIN> line 1.
Use of uninitialized value in division (/) at benchmark/3rdParty/multi-bleu.perl line 129, <STDIN> line 1.
Use of uninitialized value in division (/) at benchmark/3rdParty/multi-bleu.perl line 129, <STDIN> line 1.
Use of uninitialized value $CORRECT[1] in multiplication (*) at benchmark/3rdParty/multi-bleu.perl line 134, <STDIN> line 1.
Use of uninitialized value $CORRECT[2] in multiplication (*) at benchmark/3rdParty/multi-bleu.perl line 134, <STDIN> line 1.
Use of uninitialized value $CORRECT[3] in multiplication (*) at benchmark/3rdParty/multi-bleu.perl line 134, <STDIN> line 1.
Use of uninitialized value $CORRECT[4] in multiplication (*) at benchmark/3rdParty/multi-bleu.perl line 134, <STDIN> line 1.
BLEU = 0.00, 0.0/0.0/0.0/0.0 (BP=1.000, ration=1.500)

@guillaumekln This is what I get when I run it. Is this expected?

raju249 commented 7 years ago

I think gold.txt is the reference file, which is assumed to contain the correct translations, and pred.txt is assumed to contain the text translated by the model. Am I correct, @guillaumekln? Correct me if I am wrong. Thanks.
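
A quick way to confirm that the two files play the roles described here is to compare them line by line. The following is only a sketch, assuming both files are plain UTF-8 text with one whitespace-tokenized sentence per line, in the same order. `read_lines` and `check_alignment` are hypothetical helper names, not part of OpenNMT.

```python
from collections import Counter


def read_lines(path):
    """Read a text file into a list of sentences, one per line."""
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()


def check_alignment(ref_lines, pred_lines):
    """Return (same_length, unigram_overlap) for two lists of sentences.

    ref_lines:  reference (gold) translations, one sentence per item
    pred_lines: model output, one sentence per item, in the same order
    """
    same_length = len(ref_lines) == len(pred_lines)
    overlap = 0
    for ref, pred in zip(ref_lines, pred_lines):
        ref_counts = Counter(ref.split())
        pred_counts = Counter(pred.split())
        # Counter intersection keeps the minimum count of each shared token.
        overlap += sum((ref_counts & pred_counts).values())
    return same_length, overlap


# In-memory example; with real files one would pass
# read_lines("gold.txt") and read_lines("pred.txt") instead.
print(check_alignment(["the cat sat"], ["the cat ran"]))
```

An overlap of zero would mean that no predicted token appears in its reference line, which usually points to swapped files, different tokenization, or an encoding mismatch rather than to a bad model.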

guillaumekln commented 7 years ago

Maybe you'll need to use the original script: https://raw.githubusercontent.com/moses-smt/mosesdecoder/master/scripts/generic/multi-bleu.perl

@jsenellart-systran Could you check the custom BLEU script?

raju249 commented 7 years ago

OK. For now I am using the NLTK implementation of the BLEU score.

jsenellart commented 7 years ago

Hello, normally the implementation is right. The "Use of uninitialized value in division" warnings mean that there are no matching 1-grams, 2-grams, 3-grams, or 4-grams, which is odd. Can you share your output and reference files?
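
To make the n-gram matching concrete, here is a minimal sketch of corpus-level BLEU in Python. It assumes a single reference per sentence and whitespace tokenization, and it simply returns 0 when any n-gram order has no match. That zero handling is a simplification chosen for this sketch. `ngrams` and `corpus_bleu` are illustrative names; this is not the code of multi-bleu.perl and its scores may differ from that script.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU (0 to 100), one reference per hypothesis."""
    correct = [0] * max_n  # clipped n-gram matches per order
    total = [0] * max_n    # n-grams produced by the model per order
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tok, hyp_tok = ref.split(), hyp.split()
        ref_len += len(ref_tok)
        hyp_len += len(hyp_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            correct[n - 1] += sum((hyp_counts & ref_counts).values())
            total[n - 1] += sum(hyp_counts.values())
    # No match at some order: the geometric mean of precisions is zero.
    if hyp_len == 0 or min(correct) == 0:
        return 0.0
    log_precision = sum(math.log(c / t) for c, t in zip(correct, total)) / max_n
    # Brevity penalty only applies when the output is shorter than the reference.
    bp = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on a mat"]))
```

When the prediction shares no token with the reference, every match count stays at zero and the score is 0, which is the situation the warnings above describe.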

OpenNMT / OpenNMT

Way to measure model accuracy #290