Add Model Evaluation Process

jm-glowienke commented 3 years ago

A first model has been trained, but it cannot be evaluated on the test set(s) yet.

What needs to be done?

SPARQL decoder

[x] Check whether SPARQL decoder is working
[x] Adapt SPARQL decoder if necessary
[x] Find a workaround for putting names in quotation marks and matching them case-sensitively (possibly by changing the database)
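
To make the decoder check concrete, here is a minimal sketch of what such a decoding step might look like. The token names (`brack_open`, `brack_close`, `sep_dot`, the `var_` prefix) are assumptions borrowed from typical neural SPARQL encodings and are not confirmed by this issue; the decoder in this repository may use a different vocabulary.

```python
# Hypothetical token table: these names are assumptions for illustration,
# not the vocabulary actually used by the decoder in this repository.
TOKEN_TO_SPARQL = {
    "brack_open": "{",
    "brack_close": "}",
    "sep_dot": ".",
}


def decode_sparql(encoded: str) -> str:
    """Map an encoded, whitespace-tokenised query back to SPARQL syntax."""
    decoded = []
    for token in encoded.split():
        if token in TOKEN_TO_SPARQL:
            decoded.append(TOKEN_TO_SPARQL[token])
        elif token.startswith("var_"):
            # Assumed convention: "var_x" encodes the SPARQL variable "?x".
            decoded.append("?" + token[len("var_"):])
        else:
            # Unknown tokens are passed through unchanged.
            decoded.append(token)
    return " ".join(decoded)
```

Running it on `"select var_x where brack_open var_x a Thing brack_close"` gives a quick way to see whether the round trip produces a syntactically plausible query.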

Test sets

[x] Implement script to evaluate fairseq model on test sets
[x] Also implement possibility to judge translation quality by hand (interactive script)
[ ] ~~Generate several test sets and evaluate them~~ -> move to new feature issue

Evaluation metrics

[x] Add metric to judge quality of translation using BLEU or accuracy
[x] Add a result based metric on database result of query (judging translation query indirectly)
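
The two kinds of metric listed above could be sketched roughly as follows. This is only an illustration, not the script used in this repository: `exact_match_accuracy` compares query strings directly, while `result_match_rate` compares what the queries return. The `run_query` callable is a hypothetical stand-in for whatever executes a query against the database and returns its rows.

```python
def _normalise(query: str) -> str:
    """Collapse whitespace so formatting differences do not count as errors."""
    return " ".join(query.split())


def exact_match_accuracy(hypotheses, references):
    """Fraction of predicted queries that equal the reference query."""
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must have the same length")
    if not references:
        return 0.0
    hits = sum(_normalise(h) == _normalise(r) for h, r in zip(hypotheses, references))
    return hits / len(references)


def result_match_rate(hypotheses, references, run_query):
    """Fraction of predicted queries whose result set equals the reference's.

    run_query is a hypothetical callable: it takes a query string and
    returns an iterable of rows. A predicted query that fails to execute
    counts as a miss.
    """
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must have the same length")
    if not references:
        return 0.0
    hits = 0
    for hyp, ref in zip(hypotheses, references):
        try:
            predicted = frozenset(tuple(row) for row in run_query(hyp))
        except Exception:
            continue  # invalid or unexecutable prediction
        expected = frozenset(tuple(row) for row in run_query(ref))
        hits += predicted == expected
    return hits / len(references)
```

Comparing result sets rather than strings means two differently written queries that return the same rows both count as correct, which is the point of judging the translation indirectly.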

jm-glowienke commented 3 years ago

fairseq-train --eval-bleu \
    --eval-bleu-args '{"beam": 5, "max_len_a": 1.2, "max_len_b": 10}' \
    --eval-bleu-detok moses \
    --eval-bleu-remove-bpe \
    --eval-bleu-print-samples \
    --best-checkpoint-metric bleu --maximize-best-checkpoint-metric

jm-glowienke commented 3 years ago

Quotation marks:

[x] Fix typo in object name
[x] Check why the numbers of left and right quotation marks are not equal
[x] Add replacement without spaces

EDIT 14-04: This has been fixed by stripping the quotes that come with the names.
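
For reference, a minimal sketch of the two checks involved: stripping quotes that arrive attached to a name, and verifying that left and right quotation-mark tokens are balanced in an encoded query. The token names `quot_mark_l` and `quot_mark_r` are hypothetical placeholders; the encoding used in this repository may name them differently.

```python
def strip_name_quotes(name: str) -> str:
    """Remove surrounding whitespace and any leading/trailing quote characters."""
    return name.strip().strip("\"'")


def quote_tokens_balanced(encoded: str,
                          left: str = "quot_mark_l",
                          right: str = "quot_mark_r") -> bool:
    """Return True if the encoded query has as many left as right quote tokens.

    The default token names are assumptions for illustration only.
    """
    tokens = encoded.split()
    return tokens.count(left) == tokens.count(right)
```

Running `quote_tokens_balanced` over every line of the generated data is a cheap way to spot the lines where the counts diverge.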

DeNederlandscheBank / nqm #14