Thanks for checking my work! I think you're right that the Kronecker delta term is redundant (since it is implied by the partial derivative). I still included it, though, because I wanted to make it extra apparent that the partial derivative step is what removes the summation, zeroing out all the terms where the index k is not equal to j. Does that make sense?
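For anyone following along, here is a minimal sketch of the step in question, using Nielsen-style notation (the layer superscripts and the weight/activation symbols below are my own shorthand, not copied from the PDF): differentiating the weighted sum with respect to a single activation produces a Kronecker delta, and summing over that delta is exactly what collapses the sum to the single surviving term.

```latex
% Weighted input to neuron k in layer l+1 (notation assumed, not from the PDF):
%   z_k^{l+1} = \sum_i w_{ki}^{l+1} a_i^l + b_k^{l+1}
%
% Differentiating with respect to a_j^l: only the i = j term survives,
% which is what the Kronecker delta \delta_{ij} encodes.
\frac{\partial z_k^{l+1}}{\partial a_j^l}
  = \sum_i w_{ki}^{l+1} \frac{\partial a_i^l}{\partial a_j^l}
  = \sum_i w_{ki}^{l+1} \, \delta_{ij}
  = w_{kj}^{l+1}
```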
Hey @chrismbryant, thanks for replying. Yes, that makes complete sense. I'm planning to write a blog post that will include a detailed derivation of the backpropagation algorithm, and your work (which I found to be one of the best explanations) will be cited. It would be great if you could share your LaTeX code, since that would ease my work. :D
Oh cool, thanks! Could you give me a link to your blog post when you're done? I'd like to see how you explain the derivation. Unfortunately, I don't have the LaTeX code, because I actually wrote the entire document in Microsoft Word (its equation editor does have syntax similar to LaTeX, though). Also, I should mention that I based my second derivation (2017_09_27, the one I think is better) on the second chapter of Michael Nielsen's book: http://neuralnetworksanddeeplearning.com/chap2.html. If you haven't given that one a look yet, consider checking it out.
Sure, thanks! I'll share the link to my blog post once I'm done.
In 2017_06_21 Backpropagation Derivation (for Coursera ML).pdf, on page 30, the last term of the 3rd step, δ_{jk}, seems to be redundant, if I'm not mistaken?