I think there is a problem with the formulas for the derivatives of W and b in this tutorial. Shouldn't the gradient of W in layer l come from the error in layer l and the activation in layer l-1? The formula instead suggests that the gradient of W in layer l comes from the error in layer l+1 and the activation in layer l.
I think the right one should look like this
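To spell out what I mean, here it is in my own notation, which may not match the tutorial's indexing. I am assuming that $W^l$ maps the activation $a^{l-1}$ to the pre-activation $z^l = W^l a^{l-1} + b^l$, that $C$ is the cost, and that $\delta^l = \partial C / \partial z^l$ is the error in layer l:

```latex
\frac{\partial C}{\partial W^l} = \delta^l \, (a^{l-1})^{T}
```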
The same goes for b. Or maybe I have misunderstood something; if so, please point it out. Thanks!
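For what it's worth, the indexing can be checked numerically. Below is a minimal sketch, not the tutorial's code: it assumes sigmoid activations, a squared-error loss, and the convention that the weights of layer l map the activation of layer l-1 to layer l. All names (`forward`, `backprop`, `numerical_grad`) are my own. It compares the gradient built from the error in layer l and the activation in layer l-1 against finite differences.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, x):
    # params[k] = (W, b) of layer k+1; acts[k] is the activation of layer k
    acts = [x]
    for W, b in params:
        acts.append(sigmoid(W @ acts[-1] + b))
    return acts

def loss(params, x, y):
    return 0.5 * np.sum((forward(params, x)[-1] - y) ** 2)

def backprop(params, x, y):
    acts = forward(params, x)
    grads = [None] * len(params)
    # Error at the output layer: (a - y) * sigmoid'(z), with sigmoid'(z) = a * (1 - a)
    delta = (acts[-1] - y) * acts[-1] * (1 - acts[-1])
    for k in range(len(params) - 1, -1, -1):
        # delta belongs to layer k+1; acts[k] is the activation of the layer before it
        grads[k] = (np.outer(delta, acts[k]), delta.copy())
        if k > 0:
            # Propagate the error one layer back through the weights of layer k+1
            delta = (params[k][0].T @ delta) * acts[k] * (1 - acts[k])
    return grads

def numerical_grad(params, x, y, arr, eps=1e-6):
    # Central finite differences with respect to every entry of arr (modified in place, then restored)
    g = np.zeros_like(arr)
    for idx in np.ndindex(*arr.shape):
        old = arr[idx]
        arr[idx] = old + eps
        up = loss(params, x, y)
        arr[idx] = old - eps
        down = loss(params, x, y)
        arr[idx] = old
        g[idx] = (up - down) / (2 * eps)
    return g

rng = np.random.default_rng(0)
sizes = [3, 4, 2]  # input, hidden, output (arbitrary sizes for the check)
params = [(rng.standard_normal((n_out, n_in)), rng.standard_normal(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]
x = rng.standard_normal(sizes[0])
y = rng.standard_normal(sizes[-1])

grads = backprop(params, x, y)
max_err_W = max(np.max(np.abs(gW - numerical_grad(params, x, y, W)))
                for (W, _), (gW, _) in zip(params, grads))
max_err_b = max(np.max(np.abs(gb - numerical_grad(params, x, y, b)))
                for (_, b), (_, gb) in zip(params, grads))
print("max abs error, W:", max_err_W)
print("max abs error, b:", max_err_b)
```

If the two errors are close to zero, the analytic gradients agree with the numerical ones under this convention. Note that if the tutorial instead labels the weights between layer l and layer l+1 as the W of layer l, the same quantity would be written with the error in layer l+1 and the activation in layer l, so the difference may only be one of indexing.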