BytedanceSpeech / seed-tts-eval


There is no numerical normalization module in cal_wer.sh, which results in higher WER. #12

Open LwLiu-2012 opened 3 months ago

LwLiu-2012 commented 3 months ago

When I ran cal_wer.sh, I found that it has no numerical normalization module, so number words in the target text are counted as errors when Whisper transcribes them as digits:

Target text: "The primary coil has fifty turns." -> Whisper-large transcription: "The primary coil has 50 turns."
Target text: "My grandmother has Type One diabetes." -> Whisper-large transcription: "My grandmother has type 1 diabetes."
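
To make the effect concrete, here is a minimal sketch that scores the two pairs above with and without number normalization. It is not the code in cal_wer.sh: the `tokenize`, `normalize_numbers`, and `wer` helpers and the tiny `NUMBER_WORDS` table are hypothetical, and lowercasing plus punctuation stripping is assumed so that only the number mismatch remains.

```python
import re

# Hypothetical lookup table: covers only the number words in the two examples.
NUMBER_WORDS = {"one": "1", "fifty": "50"}


def tokenize(text):
    """Lowercase, strip punctuation, and split on whitespace (assumed preprocessing)."""
    return re.sub(r"[^\w\s]", "", text.lower()).split()


def normalize_numbers(tokens):
    """Map spelled-out numbers to digits using the illustrative table above."""
    return [NUMBER_WORDS.get(tok, tok) for tok in tokens]


def wer(ref, hyp):
    """Word error rate: word-level Levenshtein distance divided by reference length."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


pairs = [
    ("The primary coil has fifty turns.", "The primary coil has 50 turns."),
    ("My grandmother has Type One diabetes.", "My grandmother has type 1 diabetes."),
]

for target, transcript in pairs:
    ref, hyp = tokenize(target), tokenize(transcript)
    raw = wer(ref, hyp)
    normalized = wer(normalize_numbers(ref), normalize_numbers(hyp))
    print(f"raw WER = {raw:.3f}, normalized WER = {normalized:.3f}")
```

Under these assumptions each pair has one substitution in six reference words (WER of about 0.167) before normalization and none after. A real normalizer would need to handle far more cases (multi-word numbers, ordinals, "one" used as a pronoun) than this lookup table does.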

Were the WER results in your SeedTTS paper also obtained this way?

faceless-rex commented 3 months ago

@LwLiu-2012 Yes, the results reported in our paper were obtained with exactly the released code.

LwLiu-2012 commented 3 months ago

Got it. Thanks for your response.