marxav opened this issue 4 years ago (status: Open)
Not necessarily @marxav. If the derivative of the cost function with respect to the activation of the output layer already takes `m` into account, i.e. 1) d(cost_fn)/d(activation) = (1/m) * ((1-y)/(1-a) - y/a), then there is no need to divide the parameter gradients by `m` again: the 1/m factor introduced in 1) propagates through backpropagation to all the parameters and gradients.
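A minimal sketch of this equivalence (the variable names are illustrative, not taken from the repo): for logistic regression with binary cross-entropy, folding 1/m into d(cost)/d(activation) yields exactly the same parameter gradients as accumulating un-normalised gradients and dividing by m at the end.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 8
X = rng.normal(size=(m, 3))                      # m examples, 3 features
y = rng.integers(0, 2, size=(m, 1)).astype(float)
W = rng.normal(size=(3, 1))
b = np.zeros((1, 1))

a = 1.0 / (1.0 + np.exp(-(X @ W + b)))           # sigmoid activation

# Option 1: put 1/m in the cost derivative and let it propagate.
dJ_da = (1.0 / m) * ((1 - y) / (1 - a) - y / a)
da_dz = a * (1 - a)                               # sigmoid derivative
dz = dJ_da * da_dz                                # simplifies to (a - y) / m
dW1 = X.T @ dz
db1 = dz.sum(axis=0, keepdims=True)

# Option 2: keep raw gradients, divide by m only at the end.
dz2 = a - y
dW2 = (X.T @ dz2) / m
db2 = dz2.sum(axis=0, keepdims=True) / m

print(np.allclose(dW1, dW2), np.allclose(db1, db2))  # → True True
```

Algebraically, dJ_da * da_dz = (1/m) * ((1-y)*a - y*(1-a)) = (a - y)/m, which is why the two options agree to floating-point precision.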
Thank you for this wonderful example, which helped me understand the gradient descent implementation. I just noticed a minor mistake:
should be:
In addition:
should also be:
Otherwise, the code will not work if one wants to extend it, for instance to implement a regression use case instead of a classification use case (i.e. "none" instead of "softmax" in the final layer, plus short-circuiting the final activation function in the code).
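A sketch of what such an extension could look like (the function and option names here are assumptions, not the repo's actual API): a final layer whose activation is "softmax" for classification or "none" for regression, where the backward pass short-circuits the activation derivative in the "none" case.

```python
import numpy as np

def forward_final(z, activation="softmax"):
    """Forward pass through a hypothetical final layer."""
    if activation == "softmax":
        e = np.exp(z - z.max(axis=1, keepdims=True))  # numerically stable softmax
        return e / e.sum(axis=1, keepdims=True)
    elif activation == "none":
        return z                                      # identity: raw outputs for regression
    raise ValueError(f"unknown activation: {activation}")

def backward_final(a, y):
    """Gradient of the cost w.r.t. the pre-activation z.

    For softmax + cross-entropy, and for identity + mean-squared error
    J = (1/(2m)) * sum((a - y)**2), the gradient collapses to (a - y)/m
    in both cases, so the "none" path simply skips the softmax derivative.
    """
    m = y.shape[0]
    return (a - y) / m

z = np.array([[1.0, 2.0], [0.5, -0.5]])
y = np.array([[0.0, 1.0], [1.0, 0.0]])
a_reg = forward_final(z, activation="none")           # regression: a == z
print(np.allclose(backward_final(a_reg, y), (z - y) / 2))  # → True
```

The point of the sketch is that the same (a - y)/m gradient applies to both heads, so only the forward activation needs to branch.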