NervanaSystems / deepspeech

DeepSpeech neon implementation
Apache License 2.0

Audio level normalization #52

Closed bigdatabaracus closed 6 years ago

bigdatabaracus commented 7 years ago

Hi, I’m using a custom dataset to train the Deep Speech 2 model in neon. The dataset includes audio clips recorded at various audio levels. I’m wondering whether normalizing the audio clips in some way, prior to the audio feature extraction steps performed by aeon, would have a positive impact on the performance of the model.

The Deep Speech 2 paper reports that the inputs to the network are log-spectrograms of power-normalized audio clips. Does the aeon pipeline also perform this power normalization by default? If not, can you point me in the right direction on how to implement this kind of normalization?

Thanks a lot!

tyler-nervana commented 7 years ago

Aeon does peak normalization on each audio file prior to feature extraction, so you should be good to go.

If you want to do power normalization you can just normalize the standard deviation of each audio waveform to some common value. If you do this, though, you have to watch out for clipping of the audio when it gets written to disk. Aeon doesn't support this yet, so you'd have to do this beforehand.
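The power normalization described above could be done offline along these lines. This is only a minimal sketch, not anything aeon provides: the function names, the `target_rms` value of 0.05, and the assumption that samples are floats in the nominal range [-1, 1] are all illustrative choices. It assumes a non-empty array of samples.

```python
import numpy as np

def power_normalize(samples, target_rms=0.05):
    """Scale a float waveform so its standard deviation equals target_rms.

    target_rms is an arbitrary illustrative value, not one taken from
    the Deep Speech 2 paper or from aeon.
    """
    samples = np.asarray(samples, dtype=np.float64)
    samples = samples - samples.mean()  # remove DC offset so std equals RMS
    rms = samples.std()
    if rms == 0.0:
        return samples  # silent clip: nothing to scale
    return samples * (target_rms / rms)

def to_int16(samples):
    """Convert a float waveform to 16-bit PCM without hard clipping.

    If the normalized peak exceeds full scale, the gain is backed off for
    this clip, so it ends up below the target level but undistorted.
    """
    samples = np.asarray(samples, dtype=np.float64)
    peak = np.abs(samples).max()
    if peak > 1.0:
        samples = samples / peak
    return np.round(samples * 32767.0).astype(np.int16)
```

Backing off the gain is one way to handle the clipping concern; another would be to pick a target level low enough that no clip in the dataset exceeds full scale.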

bigdatabaracus commented 7 years ago

Excellent, thank you very much! Could you link me to the bit of code where the peak normalization happens in aeon? Is the normalized peak value set to 0 dBFS, or if not, what is it set to, and can it be tweaked?

Thanks a lot for the quick answer!

tyler-nervana commented 7 years ago

Good questions, and unfortunately I misspoke. Aeon does not do any peak normalization. I apologize for that. The most common approach I've seen is to peak normalize to 0 dBFS. You should be able to do this easily using the sox command-line utility (sox infile outfile gain -n seems to work for me).
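For anyone who would rather do the peak normalization in Python than with sox, a rough equivalent might look like the sketch below. The function name and the `target_dbfs` parameter are hypothetical, and it assumes a non-empty array of float samples in which full scale corresponds to an absolute value of 1.0.

```python
import numpy as np

def peak_normalize(samples, target_dbfs=0.0):
    """Scale a float waveform so its largest absolute sample hits target_dbfs.

    0 dBFS corresponds to a peak of 1.0; -6 dBFS is roughly 0.5.
    """
    samples = np.asarray(samples, dtype=np.float64)
    peak = np.abs(samples).max()
    if peak == 0.0:
        return samples  # silent clip: nothing to scale
    target_peak = 10.0 ** (target_dbfs / 20.0)  # dBFS to linear amplitude
    return samples * (target_peak / peak)
```

Passing a negative `target_dbfs` (for example -1.0) leaves a little headroom below full scale.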

bigdatabaracus commented 7 years ago

Hey, no worries! This is good to know. Thanks a lot for the clear answer on power normalization earlier!

tyler-nervana commented 6 years ago

Closing for now. Let us know if you have any more questions.