Probably @vitaka is the most suitable person to answer this.
Hi Philipp,
Thanks for reporting this.
The exact WMT 2018 results are difficult to reproduce with the current version of Bicleaner, for two main reasons:
Concerning the differences with the provided model (which I guess is the latest one released), the only difference that comes to mind is that the probabilistic dictionaries were extracted directly from OPUS. Prompsit's people can provide more detailed information here, since I did not (completely) take part in the last release.
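As a rough illustration (not the actual release pipeline), probabilistic dictionaries can be derived from word-aligned parallel data along these lines. The fast_align-style `i-j` link format, the p(src | tgt) direction, and the pruning threshold are assumptions on my part, and the on-disk format Bicleaner expects may differ:

```python
from collections import Counter, defaultdict

def build_probabilistic_dictionary(src_path, tgt_path, align_path, min_prob=1e-4):
    """Estimate p(src_word | tgt_word) from fast_align-style 'i-j' alignment links."""
    pair_counts = Counter()   # (tgt_word, src_word) -> aligned-pair count
    tgt_counts = Counter()    # tgt_word -> total aligned occurrences
    with open(src_path) as fs, open(tgt_path) as ft, open(align_path) as fa:
        for src_line, tgt_line, align_line in zip(fs, ft, fa):
            src, tgt = src_line.split(), tgt_line.split()
            for link in align_line.split():
                i, j = map(int, link.split("-"))   # i: source index, j: target index
                pair_counts[(tgt[j], src[i])] += 1
                tgt_counts[tgt[j]] += 1
    dictionary = defaultdict(dict)
    for (t, s), count in pair_counts.items():
        prob = count / tgt_counts[t]
        if prob >= min_prob:   # prune rare translation candidates
            dictionary[t][s] = prob
    return dictionary
```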
I trained a few models for en-de and tested them on the WMT 2018 de-en shared task setup; a sketch of the subsampling step follows the table.
BLEU-c (cased BLEU) for SMT systems, by subsampled corpus size (words):

| Model | 100M | 10M | 1M |
|---|---:|---:|---:|
| PROMPT-LM submission | 31.1 | 25.4 | |
| JHU Zipporah submission | 30.2 | 26.3 | |
| provided model | 29.6 | 23.3 | 18.8 |
| nc | 27.1 | 26.3 | 22.5 |
| wmt | 28.0 | 26.7 | 22.3 |
| wmt-cc | 30.7 | 26.3 | 21.3 |
| wmt bad-paracrawl | 30.0 | 27.5 | 21.2 |
| wmt-cc bad-paracrawl | 30.6 | 27.4 | 21.2 |
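For clarity on the columns: a rough sketch of the subsampling step, assuming sentence pairs are ranked by classifier score and kept until a fixed word budget is reached. The side on which words are counted and the data format are assumptions about the setup:

```python
def subsample_by_score(scored_pairs, word_budget):
    """scored_pairs: iterable of (score, src, tgt) tuples.
    Keep the highest-scoring pairs until the word budget is reached."""
    selected, words = [], 0
    for _score, src, tgt in sorted(scored_pairs, key=lambda p: -p[0]):
        selected.append((src, tgt))
        words += len(tgt.split())   # budget counted on one side (assumption)
        if words >= word_budget:
            break
    return selected

# Subsets matching the three column sizes above:
# for budget in (100_000_000, 10_000_000, 1_000_000):
#     subset = subsample_by_score(pairs, budget)
```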
I will also run NMT models but these may take a while.
I generally get good numbers but not with the provided model.
If you have any advice on how to train this differently, please let me know.
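On the bad-paracrawl variants: a simplified sketch of one way to assemble classifier training data that uses real noisy pairs as negatives rather than synthetic ones. The sampling size, label encoding, and gzip TSV format here are placeholder choices, not Bicleaner's actual training format:

```python
import gzip
import random

def write_training_set(clean_pairs, noisy_pairs, out_path, n_per_class=100_000):
    """clean_pairs / noisy_pairs: lists of (src, tgt) sentence pairs."""
    pos = random.sample(clean_pairs, min(n_per_class, len(clean_pairs)))
    neg = random.sample(noisy_pairs, min(n_per_class, len(noisy_pairs)))
    rows = [(s, t, 1) for s, t in pos] + [(s, t, 0) for s, t in neg]
    random.shuffle(rows)
    with gzip.open(out_path, "wt", encoding="utf-8") as f:
        for src, tgt, label in rows:
            f.write(f"{src}\t{tgt}\t{label}\n")
```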