Open supersodic opened 1 year ago
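l.delta holds the gradient of the loss that is passed back to the previous layers, not the loss itself. Please check this file, which explains how to calculate the gradient of the binary cross-entropy loss with a logistic activation: https://www.ics.uci.edu/~pjsadows/notes.pdf With a logistic output, the derivative of the cross-entropy with respect to the pre-activation input simplifies to output minus truth. Also, please note that we flip the sign, giving truth minus output, so that following l.delta minimizes the loss.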
Hello,
I have a question regarding the loss function in dc_layer.c. Why do you use l.delta[i] = (l.d_truth[i] - l.output[i]) / size; instead of l.delta[i] = (l.d_truth[i] * log(l.output[i])) + ((1 - l.d_truth[i]) * log(1 - l.output[i]))? Is this an approximation, or what is the reasoning behind this choice of l.delta for the domain classifier?
Thank you in advance for your answer!