IBM / Train-Custom-Speech-Model

Create a custom Watson Speech to Text model using specialized domain data
https://developer.ibm.com/patterns/customize-and-continuously-train-your-own-watson-speech-service/
Apache License 2.0
59 stars 42 forks

Audio preprocessing #76

Open predestination opened 5 years ago

predestination commented 5 years ago

Hey, what audio preprocessing steps can be used to improve transcript quality? Is there a Python library for denoising or audio enhancement that doesn't use deep learning? (Deep learning takes a lot of time even for a small audio clip.)

tonanhngo commented 5 years ago

Hi, if you expect most of your input to be noisy or unique in certain ways (such as speaker accent or background noise), then it's better to train the custom acoustic model with this type of audio. The IBM Debater uses this approach and was able to reduce the error rate to ~5%. If you only have a few audio clips and want to do noise reduction, I did a quick search and saw a few options:

predestination commented 5 years ago

Thank you for the reply. I tried noisereduce and logmmse earlier, but they didn't improve the transcript quality. I will check scipy.signal.
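
For anyone following up on the scipy.signal suggestion, here is a minimal sketch of a preprocessing pass that uses no deep learning: a zero-phase Butterworth band-pass filter followed by peak normalization. The function names (`bandpass_speech`, `peak_normalize`), the 300-3400 Hz band, the filter order, and the synthetic test signal are all assumptions chosen for illustration, not something recommended in this thread, and there is no guarantee that this improves transcript quality for a given recording.

```python
import numpy as np
from scipy import signal


def bandpass_speech(samples, sample_rate, low_hz=300.0, high_hz=3400.0, order=4):
    """Attenuate content outside an assumed speech band (hypothetical defaults)."""
    # Second-order sections are numerically more stable than (b, a) coefficients.
    sos = signal.butter(order, [low_hz, high_hz], btype="bandpass",
                        fs=sample_rate, output="sos")
    # Forward-backward filtering avoids introducing phase distortion.
    return signal.sosfiltfilt(sos, samples)


def peak_normalize(samples, peak=0.9):
    """Scale the clip so its largest absolute sample equals `peak`."""
    max_abs = np.max(np.abs(samples))
    if max_abs == 0:
        return samples  # silent clip: nothing to scale
    return samples * (peak / max_abs)


if __name__ == "__main__":
    # Synthetic stand-in for a recording: a 1 kHz tone plus 50 Hz mains hum.
    sample_rate = 16000
    t = np.arange(sample_rate) / sample_rate
    noisy = np.sin(2 * np.pi * 1000 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)
    cleaned = peak_normalize(bandpass_speech(noisy, sample_rate))
    print(cleaned.shape, float(np.max(np.abs(cleaned))))
```

A fixed band-pass only removes noise that lies outside the pass band, such as low-frequency hum or high-frequency hiss. Noise that overlaps the speech band passes through unchanged, so this is a cheap first step rather than a replacement for training the acoustic model on representative audio.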