Closed DOUS3L closed 3 years ago
@DOUS3L I'm using this code for a class exercise and came across your comment. Would you mind clarifying where you see the code in the book dividing by `sigmoid_prime(zs[-1])`?
The code in chapter 2 does that. However, network2.py comes from this section in chapter 3.
The original Python 2 code from the author is also available here.
The code looks like this:
```python
# backward pass
delta = (self.cost).delta(zs[-1], activations[-1], y)
nabla_b[-1] = delta
nabla_w[-1] = np.dot(delta, activations[-2].transpose())
```
This was done because in network.py the loss function is quadratic, while in network2.py it's the cross-entropy (log) function. There is an explanation in chapter 3 about choosing the learning rate value that points that out:
> As we saw earlier, the gradient terms for the quadratic cost have an extra σ′ = σ(1−σ) term in them.
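The cancellation is easy to check numerically. The sketch below (my own illustration, not code from the repo) computes the output-layer delta for a sigmoid neuron with cross-entropy cost three ways: the closed form `a - y` used by network2.py, the full chain rule `(dC/da) * σ′(z)`, and a finite-difference estimate of `dC/dz`. All three agree, which is why the σ′ factor is absent from network2.py's backward pass:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def cross_entropy(a, y):
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

z, y = 0.7, 1.0
a = sigmoid(z)

# Closed form used by network2.py's CrossEntropyCost: sigma'(z) cancels.
delta_analytic = a - y

# Chain rule written out: dC/da * da/dz; the sigma terms cancel algebraically.
dC_da = (a - y) / (a * (1 - a))
delta_chain = dC_da * sigmoid_prime(z)

# Independent numerical check: finite differences on C(sigmoid(z)).
eps = 1e-6
delta_numeric = (cross_entropy(sigmoid(z + eps), y)
                 - cross_entropy(sigmoid(z - eps), y)) / (2 * eps)

assert abs(delta_analytic - delta_chain) < 1e-9
assert abs(delta_analytic - delta_numeric) < 1e-6
```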
The cost can be chosen in the main test.py script that runs the network learning algorithm.
Hence, network2.py is a generic implementation that should not assume any specific loss function.
https://github.com/MichalDanielDobrzanski/DeepLearningPython35/blob/ea229ac6234b7f3373f351f0b18616ca47edb8a1/network2.py#L253
Here it should be `delta = self.cost_derivative(activations[-1], y) * sigmoid_prime(zs[-1])`; the code from the book does that.