THUNLP-MT / THUMT

An open-source neural machine translation toolkit developed by Tsinghua Natural Language Processing Group
BSD 3-Clause "New" or "Revised" License

Has anybody trained the Transformer model on WMT14 en-de and tested on newstest2014? #71

Closed · minorfox closed this issue 5 years ago

minorfox commented 5 years ago

The model has now run for 55000 steps and the BLEU score is still only 0.12xxx. Is that correct?

batch_size=6250, update_cycle=4 (all other settings default)

Glaceon31 commented 5 years ago

That does not look correct; the BLEU score should be above 20 by that point. Please check your data and settings.

BTW, we recommend using this parameter set for the WMT14 en-de task: shared_embedding_and_softmax_weights=true,layer_preprocess=layer_norm,layer_postprocess=none,attention_dropout=0.1,relu_dropout=0.1,adam_beta2=0.98

This setting yields a BLEU score of 27.03 after 200000 training steps (averaging the last 5 checkpoints).
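For readers unfamiliar with the format: the parameter set above is a single comma-separated string of key=value overrides (typically passed via the trainer's --parameters flag). The parsing below is only an illustration of the format, not THUMT's actual code:

```python
# Illustrative sketch of the comma-separated override string above
# (parsing logic is mine, not THUMT's actual implementation).
overrides = ("shared_embedding_and_softmax_weights=true,"
             "layer_preprocess=layer_norm,layer_postprocess=none,"
             "attention_dropout=0.1,relu_dropout=0.1,adam_beta2=0.98")

params = dict(item.split("=") for item in overrides.split(","))
print(params["adam_beta2"])  # -> '0.98'
```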

minorfox commented 5 years ago

@Glaceon31 Last night I rebuilt the corpus according to the manual, with all settings at their defaults (WMT17 de-en for training, newstest2014 for testing). Here is the log:

2019-08-20 02:50:11.065016: BLEU at step 5000: 0.158979
2019-08-20 05:24:28.222200: BLEU at step 10000: 0.189248
2019-08-20 07:58:33.829838: BLEU at step 15000: 0.195997
2019-08-20 10:32:39.163866: BLEU at step 20000: 0.197948
2019-08-20 13:06:27.287002: BLEU at step 25000: 0.202929
2019-08-20 15:37:04.897173: BLEU at step 30000: 0.207372
2019-08-20 18:07:53.231772: BLEU at step 35000: 0.209581
2019-08-20 20:38:38.589960: BLEU at step 40000: 0.207665
2019-08-20 23:09:15.135549: BLEU at step 45000: 0.208846

It seems, emmm....., still incorrect.

Maybe it is also a Python 3.x problem (the "@@ " replacement or something similar). I think I should print the translator outputs; I'll do that tomorrow. Thank you very much!!

minorfox commented 5 years ago

@Glaceon31 In hook.py, lines 172-173, you strip the "@@" from the model output; why don't you do the same for the reference?

I added the replace op to the reference and got a score of 0.32, versus 0.21 with the unreplaced reference. This 0.32 also matches the score computed from the translator.py outputs after the sed 's blablabla' op.

So, is it possible that you forgot to apply the replace op to the reference?
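For context, a minimal Python equivalent of the restore op under discussion: hook.py applies this kind of replace to the model output, and the sed command does the same on plain-text files. The function name is mine, not THUMT's:

```python
import re

def restore_bpe(tokens):
    """Join BPE pieces: a trailing '@@' marks a word continued by the next token."""
    return re.sub(r"@@ ", "", " ".join(tokens)).split()

print(restore_bpe(["nick@@", "named"]))  # -> ['nicknamed']
```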

GrittyChen commented 5 years ago

@minorfox We do not apply the replace operation to the reference, on the premise that the reference has not been through BPE. We don't think it makes sense to apply BPE to the reference. Thanks very much!

minorfox commented 5 years ago

@Glaceon31 But the code uses decoded_refs to compute the BLEU score, and decoded_refs comes from feats["references"], which is the BPE'd corpus. I also printed decoded_refs:

[['the', 'formerly', 'super', 'secre@@', 'tive', 'N@@', 'SA', ',', 'once', 'nick@@', 'named', 'No', 'Su@@', 'ch', 'Agency', ',', 'has', 'found', 'itself', 'in', 'very', 'public', 'light', ',', 'and', 'am@@', 'id', 'vi@@', 'cious', 'criticism', ',', 'in', 'past', 'months', 'following', 'a', 'stream', 'of', 're@@', 'vel@@', 'ations', 'about', 'is', 'vast', 'foreign', 'and', 'domestic', 'surveillance', 'programs', '-', 'collectively', 'the', 'product', 'of', 'secret', 'N@@', 'SA', 'files', 'stol@@', 'en', 'from', 'the', 'agency', 'and', 'le@@', 'aked', 'by', 'di@@', 'sen@@', 'chan@@', 'ted', 'former', 'N@@', 'SA', 'contrac@@', 'tor', 'Ed@@', 'ward', 'Snow@@', 'den', '.']]

It is BPE'd.
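For illustration, applying the same "@@ " restore to the first tokens printed above recovers the plain words (token list truncated):

```python
# Restoring the BPE-split reference printed above (truncated for brevity).
tokens = ['the', 'formerly', 'super', 'secre@@', 'tive', 'N@@', 'SA', ',',
          'once', 'nick@@', 'named', 'No', 'Su@@', 'ch', 'Agency']
print(" ".join(tokens).replace("@@ ", "").split())
# -> ['the', 'formerly', 'super', 'secretive', 'NSA', ',', 'once',
#     'nicknamed', 'No', 'Such', 'Agency']
```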

GrittyChen commented 5 years ago

@minorfox You do not need to BPE the reference file for validation.

minorfox commented 5 years ago

@Glaceon31 I see what you mean. When I used translator.py, I scored against the un-BPE'd reference, but hook.py was using the BPE'd reference. That is why I got the incorrect scores in the log before.

Glaceon31 commented 5 years ago

You should use the un-BPE'd reference file for validation, because our code automatically un-BPEs the hypothesis. You can refer to the newest version of the user manual.
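To make the resolution concrete, here is a minimal sketch of the validation setup described above. File names are placeholders, and the restore mirrors the replace that hook.py applies to the decoder output; it is not THUMT's actual code:

```python
# Sketch: keep the reference file un-BPE'd; the toolkit restores the
# hypothesis before scoring, so both sides end up as plain tokens.
def restore(tokens):
    return " ".join(tokens).replace("@@ ", "").split()

with open("newstest2014.tok.de") as f:        # plain, un-BPE'd reference (placeholder name)
    references = [line.split() for line in f]

with open("decoded.bpe.de") as f:             # BPE-split decoder output (placeholder name)
    hypotheses = [restore(line.split()) for line in f]

# BLEU is then computed on hypotheses vs references, both in plain tokens.
```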