bytedance / neurst

Neural end-to-end Speech Translation Toolkit

Reproducing speech-to-text translation result on MuST-C En-De and how to get the model output #40

Closed bzhangGo closed 2 years ago

bzhangGo commented 2 years ago

Hi all,

I'm trying to reproduce the speech-to-text translation result on MuST-C En-De (22.8 SacreBLEU) as reported in the paper.

I used the provided st_specaug checkpoint and got the following results (the command is given below):

python3 -m neurst.cli.run_exp \
    --config_paths must-c/asr_st/de/st_prediction_args.yml \
    --model_dir st_specaug

Results:

I1125 12:34:33.630551 140283317171008 sequence_generator.py:140] Generation elapsed: 267.03s
I1125 12:34:33.636475 140283317171008 dataset_utils.py:307] Loading TF Records from:
I1125 12:34:33.641129 140283317171008 dataset_utils.py:311]    b'must-c//devtest/dev.en-de.tfrecords-00000-of-00001'
I1125 12:34:38.733426 140283317171008 sequence_generator.py:166] Evaluation Result (dev):
I1125 12:34:38.733613 140283317171008 sequence_generator.py:170]    sacre_bleu=21.59
I1125 12:34:38.733672 140283317171008 sequence_generator.py:170]    tok_bleu=21.60
I1125 12:34:38.733724 140283317171008 sequence_generator.py:170]    detok_bleu=21.59
I1125 12:34:38.733772 140283317171008 sequence_generator.py:170]    uncased_sacre_bleu=22.40
I1125 12:34:38.733818 140283317171008 sequence_generator.py:170]    uncased_tok_bleu=22.21
I1125 12:34:38.733864 140283317171008 sequence_generator.py:170]    uncased_detok_bleu=22.40
I1125 12:34:38.735978 140283317171008 dataset_utils.py:307] Loading TF Records from:
I1125 12:34:38.739267 140283317171008 dataset_utils.py:311]    b'must-c//devtest/tst-COMMON.en-de.tfrecords-00000-of-00001'
I1125 12:34:46.309522 140283317171008 sequence_generator.py:166] Evaluation Result (tst-COM):
I1125 12:34:46.309664 140283317171008 sequence_generator.py:170]    sacre_bleu=22.34
I1125 12:34:46.309723 140283317171008 sequence_generator.py:170]    tok_bleu=22.32
I1125 12:34:46.309772 140283317171008 sequence_generator.py:170]    detok_bleu=22.34
I1125 12:34:46.309818 140283317171008 sequence_generator.py:170]    uncased_sacre_bleu=23.04
I1125 12:34:46.309862 140283317171008 sequence_generator.py:170]    uncased_tok_bleu=22.93
I1125 12:34:46.309904 140283317171008 sequence_generator.py:170]    uncased_detok_bleu=23.04
I1125 12:34:46.310050 140283317171008 sequence_generator.py:166] Evaluation Result (on average by weights {'dev': 0.5, 'tst-COM': 0.5}):
I1125 12:34:46.310102 140283317171008 sequence_generator.py:170]    sacre_bleu=21.96
I1125 12:34:46.310146 140283317171008 sequence_generator.py:170]    tok_bleu=21.96
I1125 12:34:46.310188 140283317171008 sequence_generator.py:170]    detok_bleu=21.96
I1125 12:34:46.310238 140283317171008 sequence_generator.py:170]    uncased_sacre_bleu=22.72
I1125 12:34:46.310281 140283317171008 sequence_generator.py:170]    uncased_tok_bleu=22.57
I1125 12:34:46.310323 140283317171008 sequence_generator.py:170]    uncased_detok_bleu=22.72
I1125 12:34:55.999752 140283317171008 sequence_generator.py:166] Evaluation Result (mixed of dev,tst-COM):
I1125 12:34:56.000025 140283317171008 sequence_generator.py:170]    sacre_bleu=22.06
I1125 12:34:56.000084 140283317171008 sequence_generator.py:170]    tok_bleu=22.06
I1125 12:34:56.000133 140283317171008 sequence_generator.py:170]    detok_bleu=22.06
I1125 12:34:56.000186 140283317171008 sequence_generator.py:170]    uncased_sacre_bleu=22.80
I1125 12:34:56.000232 140283317171008 sequence_generator.py:170]    uncased_tok_bleu=22.69
I1125 12:34:56.000276 140283317171008 sequence_generator.py:170]    uncased_detok_bleu=22.80
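For context on the log above: the "on average by weights" block appears to be a weighted mean of the per-dataset scores, while "mixed of dev,tst-COM" is presumably a single score over the concatenated dev and tst-COMMON sets, which is why the two differ. A minimal sketch of the weighted-average reading (an assumption based on the printed weights, not the NeurST source):

```python
# Weighted mean of the per-dataset sacre_bleu scores from the log.
# The weights and scores are copied from the log output above.
weights = {"dev": 0.5, "tst-COM": 0.5}
sacre_bleu = {"dev": 21.59, "tst-COM": 22.34}

avg = sum(weights[name] * sacre_bleu[name] for name in weights)
print(f"weighted sacre_bleu = {avg:.2f}")  # 21.96, matching the logged average
```

The uncased scores check out the same way: 0.5 * 22.40 + 0.5 * 23.04 = 22.72, as logged.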

I find that 22.8 corresponds to the mixed result rather than to tst-COM. Is there anything I missed during evaluation, or am I misreading the results? Also, the above command did not produce the translation output itself. Could you please explain how to obtain the model's translation output, so that translation analysis is possible?

thanks, Biao

zhaocq-nlp commented 2 years ago

@bzhangGo I checked the checkpoint and found it was not the correct one. You can now fetch the right model via http://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/neurst/speech_to_text/mustc/de/st_specaug2.tgz.

To get the translation output, add the following configuration to st_prediction_args.yml:

output_file:
  dev: ./dev.hypo.txt
  tst-COM: ./tst.hypo.txt
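With that config, each line of ./tst.hypo.txt should hold one hypothesis in test-set order, so it can be paired line-by-line with a reference file for analysis. A minimal triage sketch (the file names and the reference file are illustrative; the overlap metric is a crude stand-in for sentence-level BLEU, not what NeurST computes):

```python
from collections import Counter

def token_overlap(hyp: str, ref: str) -> float:
    """Fraction of reference tokens also produced by the hypothesis
    (a rough proxy for translation quality, just for sorting)."""
    hyp_counts, ref_counts = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((hyp_counts & ref_counts).values())
    return overlap / max(sum(ref_counts.values()), 1)

# Stand-ins for iterating over tst.hypo.txt and a matching reference file.
hypos = ["Das ist ein Test .", "Hallo Welt ."]
refs  = ["Das ist ein Test .", "Hallo , schöne Welt ."]

# Surface the worst-matching sentence pairs first for manual inspection.
ranked = sorted(zip(hypos, refs), key=lambda pair: token_overlap(*pair))
for hyp, ref in ranked:
    print(f"{token_overlap(hyp, ref):.2f}  hyp: {hyp}  |  ref: {ref}")
```

For corpus-level scores, the hypothesis file can also be fed to the standalone sacrebleu tool against the MuST-C references.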
bzhangGo commented 2 years ago

Great, this is very helpful! Thanks!