drethage / speech-denoising-wavenet

A neural network for end-to-end speech denoising
MIT License

How to predict not a single value but a distribution as in original Wavenet? #14

Closed kfmn closed 6 years ago

kfmn commented 6 years ago

I would like to predict a distribution rather than a single value per sample, as the original WaveNet does. What should I change in the source code? (I have seen there is some preparation for this in util.py, where sound is converted from linear to mu-law and back...) Help please.
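For reference, the linear-to-mu-law conversion mentioned above is the standard companding step the original WaveNet uses before quantizing audio into discrete bins. A minimal numpy sketch (function names here are illustrative, not the ones in util.py):

```python
import numpy as np

def mu_law_encode(x, quantization_levels=256):
    """Compress a [-1, 1] signal with mu-law companding and quantize it
    into integer bins (mu = quantization_levels - 1, as in WaveNet)."""
    mu = quantization_levels - 1
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    # Map [-1, 1] -> integer bins {0, ..., mu}; +0.5 rounds to nearest
    return ((compressed + 1) / 2 * mu + 0.5).astype(np.int64)

def mu_law_decode(indices, quantization_levels=256):
    """Invert the quantization and the companding."""
    mu = quantization_levels - 1
    compressed = 2 * indices.astype(np.float64) / mu - 1
    return np.sign(compressed) * ((1 + mu) ** np.abs(compressed) - 1) / mu

signal = np.linspace(-1.0, 1.0, 5)
codes = mu_law_encode(signal)          # integer class indices
reconstructed = mu_law_decode(codes)   # close to the original signal
```

These integer bin indices are what a softmax output layer would be trained to predict, one categorical distribution per time sample.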

drethage commented 6 years ago

You'll want to change the last layer(s) to produce a vector per output sample instead of just a scalar and train with a softmax loss.

However, note: initial experiments showed that for the speech denoising task, modeling the output distribution and training with a softmax loss resulted in distributions with high variance, suggesting low confidence in the predictions.

kfmn commented 6 years ago

Yes, I understand that I should replace the last layer(s) with a softmax and probably replace (or perhaps complement) the L1/L2 loss with a cross-entropy loss. I intend to treat the expectation of this distribution as the scalar prediction, so I can keep the L1/L2 loss. And I need the variance of this distribution to assess the uncertainty of the denoising result... So indeed, I don't need the distribution itself, only its first and second moments.
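Since the softmax output is a categorical distribution over a fixed set of quantization bins, both moments can be read off in closed form, with no sampling. A small sketch (the bin amplitudes and probabilities below are toy values, not from the model):

```python
import numpy as np

def distribution_moments(probs, bin_values):
    """Mean and variance of a categorical distribution over amplitude bins.

    probs:      [..., num_bins] softmax output, summing to 1 on the last axis
    bin_values: [num_bins] amplitude each quantization bin represents
    """
    mean = np.sum(probs * bin_values, axis=-1)
    second_moment = np.sum(probs * bin_values ** 2, axis=-1)
    variance = second_moment - mean ** 2
    return mean, variance

# Toy check: 4 bins at amplitudes [-1, -0.5, 0.5, 1]
bin_values = np.array([-1.0, -0.5, 0.5, 1.0])
probs = np.array([0.1, 0.2, 0.4, 0.3])
mean, var = distribution_moments(probs, bin_values)  # mean=0.3, var=0.46
```

The mean serves as the scalar denoised prediction and the variance as the per-sample uncertainty estimate described above.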

But unfortunately I am not very familiar with Keras, so I am unsure where exactly this softmax layer should be inserted. Could you advise?

drethage commented 6 years ago

The softmax layer is always the last layer in the network. Instead of producing an output of shape [# time samples, 1], you'll predict an output of shape [# time samples, # quantization levels], feed it through a softmax transformation, and train with a cross-entropy loss.
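A numpy sketch of the math involved, with hypothetical shapes (in Keras terms, this corresponds to giving the final `Conv1D` layer `quantization_levels` filters with `activation='softmax'` and compiling with the `sparse_categorical_crossentropy` loss, where the targets are the mu-law bin indices of the clean signal):

```python
import numpy as np

rng = np.random.default_rng(0)
time_steps, quantization_levels = 4, 256

# What the reshaped last layer would emit: one logit per quantization
# level, per output time sample -> shape [time, levels]
logits = rng.normal(size=(time_steps, quantization_levels))

# Softmax over the quantization axis: one distribution per time sample
exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs = exp / exp.sum(axis=-1, keepdims=True)

# Cross-entropy against integer targets (the clean sample's bin index):
# average negative log-probability assigned to the correct bin
targets = rng.integers(0, quantization_levels, size=time_steps)
loss = -np.mean(np.log(probs[np.arange(time_steps), targets]))
```

This is only the forward computation; in practice Keras handles the loss and its gradient once the final layer and loss function are set as described.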