NVIDIA / OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
https://nvidia.github.io/OpenSeq2Seq
Apache License 2.0
1.54k stars, 369 forks

WER is very high for telephone audios #365

Open sunnyly2016 opened 5 years ago

sunnyly2016 commented 5 years ago

WER is very high for phone recordings. Could you please help us improve the accuracy of speech-to-text (S2T)?

borisgin commented 5 years ago

What dataset do you use?

sunnyly2016 commented 5 years ago

Hi Boris,

I use phone call recordings from speakers with a New Zealand accent, and the recordings are sampled at 8000 Hz. To improve performance, I used the original language model mentioned in the repo. It increases the accuracy a little, but it takes ages to process a single sample.

Cheers, Sunny

GabrielLin commented 5 years ago

What is the value of your WER? Do you train your own model with your data?

sunnyly2016 commented 5 years ago

Hi Gabriel,

My WER is around 40%, which is strangely high. Towards the end of the transcript, there are no spaces between words, and inference takes more than half an hour.

Thanks a lot
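
For reference, WER is usually computed as the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. The following is a minimal sketch of that calculation, assuming lowercase normalization and whitespace tokenization; the function name `wer` is illustrative and is not part of OpenSeq2Seq.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    # Assumption: lowercase + whitespace split is an adequate normalization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Words merged without spaces count as one substitution plus deletions.
    print(wer("the cat sat on the mat", "the cat satonthemat"))
```

Under this definition, a run of words emitted without spaces is scored as one substitution plus a deletion for every other word in the run, so missing spaces at the end of a transcript can raise the WER considerably on their own.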

aayushkubb commented 4 years ago

I don't think you will get good results without fine-tuning your model.