arrigonialberto86 / deepar

Tensorflow implementation of Amazon DeepAR
MIT License

Hello, I have some questions about the loss function of the Gaussian distribution #6

Open SPOREIII opened 4 years ago

SPOREIII commented 4 years ago

The loss function of the Gaussian distribution given in the code is as follows:

```python
tf.reduce_mean(0.5 * tf.math.log(sigma)
               + 0.5 * tf.math.truediv(tf.math.square(y_true - y_pred), sigma)) + 1e-6 + 6
```

I think it can be expressed as the following mathematical formula:

$$\frac{1}{N}\sum\left(\frac{1}{2}\log\sigma + \frac{1}{2}\cdot\frac{(z-\mu)^2}{\sigma}\right) + 1\times 10^{-6} + 6$$

But starting from the formula in the original paper (DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks), I derived the following result:

$$-\frac{1}{N}\sum\log\left(\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{(z-\mu)^2}{2\sigma^2}}\right) = \frac{1}{N}\sum\left(\log(\sqrt{2\pi}\,\sigma) + \frac{(z-\mu)^2}{2\sigma^2}\right) = \frac{1}{N}\sum\left(\log\sqrt{2\pi} + \log\sigma + \frac{(z-\mu)^2}{2\sigma^2}\right)$$

I don't know where I went wrong, and I cannot get the loss function into the form given in the code.
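For what it's worth, here is a minimal pure-Python sketch (not the repository's code) comparing the two expressions. One possible reading, which is only an assumption, is that the `sigma` in the code denotes the variance $\sigma^2$ rather than the standard deviation; under that reading the two expressions differ only by the constant $\frac{1}{2}\log 2\pi$ (and the stray `+ 1e-6 + 6`):

```python
import math
import random

def nll_from_paper(zs, mu, sigma):
    # Paper-style Gaussian NLL, sigma = standard deviation:
    # (1/N) * sum( log(sqrt(2*pi)) + log(sigma) + (z - mu)^2 / (2*sigma^2) )
    return sum(math.log(math.sqrt(2 * math.pi)) + math.log(sigma)
               + (z - mu) ** 2 / (2 * sigma ** 2) for z in zs) / len(zs)

def nll_from_code(zs, mu, var):
    # The repository's expression, reading its `sigma` as the variance:
    # (1/N) * sum( 0.5*log(var) + 0.5*(z - mu)^2 / var )
    return sum(0.5 * math.log(var) + 0.5 * (z - mu) ** 2 / var
               for z in zs) / len(zs)

random.seed(0)
zs = [random.gauss(0.0, 1.0) for _ in range(100)]
mu, sigma = 0.0, 1.5

# With var = sigma**2, the gap between the two is the constant 0.5*log(2*pi):
diff = nll_from_paper(zs, mu, sigma) - nll_from_code(zs, mu, sigma ** 2)
print(abs(diff - 0.5 * math.log(2 * math.pi)) < 1e-9)  # → True
```

Since the constant does not depend on the network outputs, it would not affect the gradients, only the reported loss value.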

benman1 commented 3 years ago

@SPOREIII I've changed this to your formula. Please check.

arrigonialberto86 commented 3 years ago

Thanks for the question. This is a very old repo, but @benman1 did a great job cleaning it up recently. Regarding the loss function: the one previously used was of course wrong, and needed to be corrected along the lines of what you report above. As a further discussion topic, we may consider using the new probabilistic layers introduced with TensorFlow Probability, where a simple lambda function directly references the negative log-likelihood of a Gaussian distribution: `negloglik = lambda y, rv_y: -rv_y.log_prob(y)`. In that case we would not even need to express the likelihood explicitly.
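To illustrate the pattern without pulling in TensorFlow Probability, here is a minimal stand-in: a hypothetical `Normal` class exposing a `log_prob` method in the style of TFP distributions, so the whole loss reduces to the `negloglik` lambda quoted above:

```python
import math

class Normal:
    # Hypothetical stand-in for a TFP-style distribution object; only the
    # `log_prob` interface matters for the negloglik-lambda pattern.
    def __init__(self, loc, scale):
        self.loc, self.scale = loc, scale

    def log_prob(self, y):
        # log N(y | loc, scale) = -log(scale*sqrt(2*pi)) - (y - loc)^2 / (2*scale^2)
        return (-math.log(self.scale * math.sqrt(2 * math.pi))
                - (y - self.loc) ** 2 / (2 * self.scale ** 2))

# The loss is just the negative log-likelihood of the predicted distribution:
negloglik = lambda y, rv_y: -rv_y.log_prob(y)

rv = Normal(loc=0.0, scale=1.0)
print(round(negloglik(0.0, rv), 5))  # → 0.91894, i.e. 0.5*log(2*pi)
```

With the real library, `rv_y` would be produced by a probabilistic output layer, and the constant terms are handled for free since `log_prob` is the exact log-density.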