Closed — binshengliu closed this issue 4 years ago
Hi, thanks for your question! Here are my answers.
Note that log(sigma) can be negative during training, so I think it may cause different performance.
Good luck!
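To make the positivity point concrete, here is a minimal sketch (not code from this repo) of the common log-variance parameterization: the learnable quantity is log(sigma^2), which may freely go negative while sigma^2 = exp(log_var) stays strictly positive. The function name and arguments are illustrative, not the repo's API.

```python
import math

def uncertainty_weighted(loss, log_var):
    """Weight a task loss by a learned uncertainty factor.

    Parameterizing the factor as log_var = log(sigma^2) keeps
    sigma^2 = exp(log_var) strictly positive, even when log_var
    itself becomes negative during training.
    """
    precision = math.exp(-log_var)     # 1 / sigma^2, always > 0
    return precision * loss + log_var  # + log_var penalizes large sigma

# A negative log_var does not break the weighting:
print(uncertainty_weighted(loss=2.0, log_var=-1.0))  # still well-defined and finite
```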
Thanks for the answers. I forgot to mention that when I did the comparison, I fixed the negative-log problem in the same way as your implementation, so the only difference is using sigma^2 or sigma. The final difference is not very large. I was just wondering whether there was any particular consideration behind the implementation.
Thanks for your reply, now I understand it in more detail. I think there is no essential difference between sigma and sigma^2. Both can be regarded as a learnable factor (just make sure it does not become negative during training). The final effect of the two implementations may depend on the optimization behavior of the deep learning framework, that is, on which setting is easier to optimize.
I see. Thanks again for your thoughts!
Hi, thanks for sharing the code. It's really helpful to me. I have two questions.
1. In another implementation, sigma^2 is used as the parameter. I tried learning both sigma and sigma^2. They show close but different performance. Do you think the implementation difference may have a significant impact?
2. I'm using hinge loss with uncertainty. For some batches, the loss value may be zero. When the loss is zero, the params have a chance to become zero or very small. Do you have suggestions on this?
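The collapse described in the second question can be illustrated with a toy example (an assumption-laden sketch, not a fix from this repo): with a log-parameterized factor s = log(sigma) and objective loss * exp(-2s) + s, a zero hinge loss leaves only the regularizer term s, whose constant gradient drives s down and sigma = exp(s) toward zero.

```python
import math

# Toy gradient descent when the batch loss is 0: the objective reduces
# to the regularizer s alone, with constant gradient 1, so s drifts
# down linearly and sigma = exp(s) decays toward zero.
s, lr = 0.0, 0.1          # start at sigma = exp(0) = 1.0
for _ in range(50):
    grad = 1.0            # d/ds [0 * exp(-2s) + s] = 1
    s -= lr * grad
print(math.exp(s))        # sigma has shrunk from 1.0 to roughly 0.007
```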