Probably @vitaka is the most suitable person to answer this.
Hi Philipp,
Thanks for reporting this.
The exact WMT 2018 results are difficult to reproduce with the current version of Bicleaner, for two main reasons:
Concerning the differences with the provided model (which I guess is the latest one released), the only difference that comes to mind is that the probabilistic dictionaries were extracted directly from OPUS. Prompsit's people can provide more detailed information here, since I did not (completely) take part in the last release.
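As a rough illustration (not the actual release pipeline), probabilistic dictionaries can be derived from word-aligned parallel data along these lines. The fast_align-style `i-j` link format, the p(src | tgt) direction, and the pruning threshold are assumptions on my part, and the on-disk format Bicleaner expects may differ:

```python
from collections import Counter, defaultdict

def build_probabilistic_dictionary(src_path, tgt_path, align_path, min_prob=1e-4):
    """Estimate p(src_word | tgt_word) from fast_align-style 'i-j' alignment links."""
    pair_counts = Counter()   # (tgt_word, src_word) -> aligned-pair count
    tgt_counts = Counter()    # tgt_word -> total aligned occurrences
    with open(src_path) as fs, open(tgt_path) as ft, open(align_path) as fa:
        for src_line, tgt_line, align_line in zip(fs, ft, fa):
            src, tgt = src_line.split(), tgt_line.split()
            for link in align_line.split():
                i, j = map(int, link.split("-"))   # i: source index, j: target index
                pair_counts[(tgt[j], src[i])] += 1
                tgt_counts[tgt[j]] += 1
    dictionary = defaultdict(dict)
    for (t, s), count in pair_counts.items():
        prob = count / tgt_counts[t]
        if prob >= min_prob:   # prune rare translation candidates
            dictionary[t][s] = prob
    return dictionary
```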
I trained a few models for en-de and tested them on the WMT 2018 de-en shared task setup; a sketch of the subsampling step follows the table.
BLEU-c (cased BLEU) for SMT systems, by subsampled corpus size (words):

| Model | 100M | 10M | 1M |
|---|---:|---:|---:|
| PROMPT-LM submission | 31.1 | 25.4 | |
| JHU Zipporah submission | 30.2 | 26.3 | |
| provided model | 29.6 | 23.3 | 18.8 |
| nc | 27.1 | 26.3 | 22.5 |
| wmt | 28.0 | 26.7 | 22.3 |
| wmt-cc | 30.7 | 26.3 | 21.3 |
| wmt bad-paracrawl | 30.0 | 27.5 | 21.2 |
| wmt-cc bad-paracrawl | 30.6 | 27.4 | 21.2 |
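For clarity on the columns: a rough sketch of the subsampling step, assuming sentence pairs are ranked by classifier score and kept until a fixed word budget is reached. The side on which words are counted and the data format are assumptions about the setup:

```python
def subsample_by_score(scored_pairs, word_budget):
    """scored_pairs: iterable of (score, src, tgt) tuples.
    Keep the highest-scoring pairs until the word budget is reached."""
    selected, words = [], 0
    for _score, src, tgt in sorted(scored_pairs, key=lambda p: -p[0]):
        selected.append((src, tgt))
        words += len(tgt.split())   # budget counted on one side (assumption)
        if words >= word_budget:
            break
    return selected

# Subsets matching the three column sizes above:
# for budget in (100_000_000, 10_000_000, 1_000_000):
#     subset = subsample_by_score(pairs, budget)
```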
I will also run NMT models but these may take a while.
I generally get good numbers but not with the provided model.
If you have any advice on how to train this differently, please let me know.
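On the bad-paracrawl variants: a simplified sketch of one way to assemble classifier training data that uses real noisy pairs as negatives rather than synthetic ones. The sampling size, label encoding, and gzip TSV format here are placeholder choices, not Bicleaner's actual training format:

```python
import gzip
import random

def write_training_set(clean_pairs, noisy_pairs, out_path, n_per_class=100_000):
    """clean_pairs / noisy_pairs: lists of (src, tgt) sentence pairs."""
    pos = random.sample(clean_pairs, min(n_per_class, len(clean_pairs)))
    neg = random.sample(noisy_pairs, min(n_per_class, len(noisy_pairs)))
    rows = [(s, t, 1) for s, t in pos] + [(s, t, 0) for s, t in neg]
    random.shuffle(rows)
    with gzip.open(out_path, "wt", encoding="utf-8") as f:
        for src, tgt, label in rows:
            f.write(f"{src}\t{tgt}\t{label}\n")
```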