Closed: kfmn closed this issue 6 years ago
You'll want to change the last layer(s) to produce a vector per output sample instead of just a scalar, and train with a softmax loss.
However, note that in initial experiments on the speech denoising task, modeling the output distribution and training with a softmax loss produced distributions with high variance, suggesting low confidence in the predictions.
Yes, I understand that I should replace the last layer(s) with a softmax and probably replace (or maybe complement) the L1/L2 loss with a cross-entropy loss. I intend to take the expectation of this distribution as the scalar prediction, so I can keep the L1/L2 loss, and I need the variance of the distribution to assess the uncertainty of the denoising result. So, indeed, I don't need the distribution itself, only its first and second moments.
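For what it's worth, once you have the softmax output, the two moments you describe are cheap to compute. A minimal numpy sketch (the function name `distribution_moments` and the mapping of bins back to [-1, 1] are my own assumptions, not part of this repo):

```python
import numpy as np

def distribution_moments(probs, levels):
    """Expectation and variance of a categorical distribution.

    probs:  array of shape [..., n_levels], each row summing to 1 (softmax output)
    levels: array of shape [n_levels] giving the scalar value of each quantization bin
    """
    probs = np.asarray(probs, dtype=np.float64)
    levels = np.asarray(levels, dtype=np.float64)
    mean = np.sum(probs * levels, axis=-1)         # E[X]
    second = np.sum(probs * levels ** 2, axis=-1)  # E[X^2]
    var = second - mean ** 2                       # Var[X] = E[X^2] - (E[X])^2
    return mean, var

# Hypothetical example: 256 bins mapped uniformly onto [-1, 1]
levels = np.linspace(-1.0, 1.0, 256)
probs = np.full(256, 1.0 / 256)  # uniform distribution -> maximal uncertainty
mean, var = distribution_moments(probs, levels)
```

A uniform distribution gives a mean near zero and a large variance; a sharply peaked one gives a variance near zero, which is exactly the confidence signal you're after.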
But unfortunately I am not very familiar with Keras, so I am in doubt about where exactly this softmax layer should be inserted. Could you advise?
The softmax layer is always the last layer in the network. Instead of producing a [# time samples, 1] shape output, you'll predict a [# time samples, # quantization levels] shape output and then feed that through a softmax transformation, followed by cross-entropy loss.
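To make the shapes concrete, here is a framework-free numpy sketch of that final step: logits of shape [# time samples, # quantization levels] passed through a softmax, then scored with cross-entropy against integer bin targets (the helper names and the toy shapes are my own, not from the repo's Keras code):

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, targets):
    """Mean cross-entropy; targets are integer bin indices, one per time sample."""
    n = probs.shape[0]
    return -np.mean(np.log(probs[np.arange(n), targets] + 1e-12))

# Hypothetical shapes: 4 time samples, 8 quantization levels
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 8))   # [# time samples, # quantization levels]
probs = softmax(logits)            # rows sum to 1
targets = np.array([1, 0, 7, 3])   # true bin index per time sample
loss = cross_entropy(probs, targets)
```

In Keras terms, the network body produces the logits and the softmax plus cross-entropy take the place of the scalar regression head and L1/L2 loss.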
I would like to predict a distribution rather than a single value per sample, as the original WaveNet does. What should I change in the source code? (I have seen there is some preparation for this in util.py, where sound is converted from linear to u-law and back.) Help, please.
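For reference, the standard u-law companding that such a conversion typically implements looks like the following (this is a generic sketch of the transform, not the repo's actual util.py code; function names are my own):

```python
import numpy as np

MU = 255  # standard u-law constant for 8-bit (256-level) audio

def linear_to_ulaw(x, mu=MU):
    """Compress audio in [-1, 1] with u-law and quantize to mu+1 integer bins."""
    x = np.clip(x, -1.0, 1.0)
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return np.round((compressed + 1.0) / 2.0 * mu).astype(np.int64)  # bins 0..mu

def ulaw_to_linear(bins, mu=MU):
    """Invert the quantization and companding back to [-1, 1]."""
    compressed = 2.0 * bins.astype(np.float64) / mu - 1.0
    return np.sign(compressed) * np.expm1(np.abs(compressed) * np.log1p(mu)) / mu

x = np.array([-1.0, -0.1, 0.0, 0.1, 1.0])
bins = linear_to_ulaw(x)      # integer class indices for the softmax targets
recon = ulaw_to_linear(bins)  # approximate reconstruction of x
```

The integer bins from `linear_to_ulaw` would serve as the class targets for the cross-entropy loss, and the inverse mapping converts a predicted bin (or expectation over bins) back to a waveform value.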