PaddlePaddle / models

Officially maintained, supported by PaddlePaddle, including CV, NLP, Speech, Rec, TS, big models and so on.
Apache License 2.0

Transformer Pre-trained weight #1456

Open hshen14 opened 5 years ago

hshen14 commented 5 years ago

I would like to reproduce the BLEU score for the Transformer model. May I know whether there are pre-trained weights available for sharing? Thanks. @Superjomn @panyx0718 @luotao1

https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleNLP/neural_machine_translation/transformer/README_cn.md

panyx0718 commented 5 years ago

@guoshengCS @kuke

guoshengCS commented 5 years ago

We have just uploaded the pre-trained weights and the data used for the Transformer base model. Please download http://transformer-model-data.bj.bcebos.com/wmt16_ende_data_bpe_clean.tar.gz for the data and http://transformer-model-data.bj.bcebos.com/iter_100000.infer.model.tar.gz for the pre-trained weights.

hshen14 commented 5 years ago

Thanks @guoshengCS. I tried inference on CPU and the speed seems very slow. I have some questions:

  1. Are there any instructions for measuring performance? Do you have the total inference time on CPU?
  2. Some sentences look abnormal during inference, containing @ markers and garbled characters. Is that okay?
  3. Could you share the BLEU score corresponding to the model you provided?
  4. Could you share gen_data/mosesdecoder/scripts/generic/multi-bleu.perl?

Er konnte nicht wieder@@ beleb@@ t werden , und er starb am nächsten Morgen . Den@@ g besch@@ wert sich schlie[garbled character]lich (schließlich) , dass sein Kopf verletzt dann un@@ bewusst .

guoshengCS commented 5 years ago

  1. I haven't tried training or inference on CPU, so I'm afraid I can't give a performance benchmark on CPU, and I'm not aware of any published Transformer performance numbers on CPU.
  2. The @ markers are introduced by BPE (byte-pair encoding, which is used in the paper) and can be removed with sed -r 's/(@@ )|(@@ ?$)//g'. Your examples match my results, so it appears to be working correctly:
    [paddle@yq01-gpu-v110-255-100-01 transformer_1.1]$ grep 'Er konnte nicht wieder\|Den@@ g besch@@ wert sich' ~/guosheng/transformer_test_1.1/models/fluid/neural_machine_translation/transformer/results_2016/predict_iter100000.txt
    Den@@ g besch@@ wert sich schließlich , dass sein Kopf verletzt dann un@@ bewusst .
    Er konnte nicht wieder@@ beleb@@ t werden , und er starb am nächsten Morgen .
  3. Testing with multi-bleu.perl and the pre-trained weights, the BLEU score is 33.64.
  4. You can get multi-bleu.perl by following https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleNLP/neural_machine_translation/transformer/gen_data.sh#L110

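For anyone post-processing predictions in Python rather than with sed, the same BPE-marker removal can be sketched as below; the regex mirrors the sed expression above, but the function name is mine, not from the repo:

```python
import re

# Same pattern as: sed -r 's/(@@ )|(@@ ?$)//g'
_BPE_MARKER = re.compile(r"(@@ )|(@@ ?$)")

def remove_bpe(line: str) -> str:
    """Strip BPE continuation markers ('@@ ') to restore whole words."""
    return _BPE_MARKER.sub("", line)

print(remove_bpe("Er konnte nicht wieder@@ beleb@@ t werden ."))
# -> Er konnte nicht wiederbelebt werden .
```

After de-BPE'ing the output, the usual Moses invocation for scoring is `perl multi-bleu.perl reference.de < predictions.txt`.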
hshen14 commented 5 years ago

Thanks @guoshengCS. Reproduced.