This commit corrects:
Expression 5.10, which needs an upper index $m$ on $x_i$.
A couple of occurrences of $F(x, \Theta)$, changed to $F(x^m, \Theta)$ (these are somewhat optional).
It also adds an equation explaining that computing the gradient over the whole dataset simply implies computing the average of the gradients for the individual examples.
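
As a rough sketch of the relation the added equation expresses (the symbols $\mathcal{L}$ for the dataset loss, $\mathcal{L}^m$ for the loss on the $m$-th training point, and $M$ for the number of training points are assumed here for illustration and may not match the book's notation), if the dataset loss is the mean of the per-example losses, then by linearity of the gradient:

```latex
\mathcal{L}(W) = \frac{1}{M} \sum_{m=1}^{M} \mathcal{L}^{m}(W)
\quad\Longrightarrow\quad
\nabla_{W}\, \mathcal{L}(W) = \frac{1}{M} \sum_{m=1}^{M} \nabla_{W}\, \mathcal{L}^{m}(W)
```

This is why working with the gradient for a single training point loses no generality: the full gradient is recovered by averaging.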
Some comments
We just said:
*To simplify notation, and without loss of generality, we will work with the classification cost of an individual example*
Then we say that we compute the gradient of the loss (over the whole dataset) with respect to W.
We should instead say the gradient of the loss (over the m-th training point).
Even though expression 5.9 is completely correct without the upper index $m$, I think we should include it (it is also correct with the index, and it keeps the notation consistent).