or-toledano closed 4 years ago
Hi @or-toledano,
Thanks for the PR!
I see what you're saying here, but this doesn't seem like an error to me -- the delta here is meant to be read not as a scalar value or step size but rather as a "change of" operator. From Wikipedia:
In other words, $\delta W$ is to be read as "the amount by which the $W$ matrix changes." Going to close this because IMO leaving it this way is cleaner than introducing $\delta$ as a step size and $W'$ as some randomly sampled noise, but if you'd be willing to contribute a note clarifying the existing notation that'd for sure be appreciated.
Thanks. I guess I was more familiar with $\delta$ as a scalar and $\Delta$ as "change of", but now I will pay more attention to context.
The first W is random, and the second W is random and independent of the first. That's why I propose denoting the second W as W', to distinguish it from the first. You can also see that the code is correct: it computes `np.random.randn(10, 3073) * step_size` independently of the first W.
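For anyone reading the thread later, here is a minimal sketch of the two independent draws under discussion. Only the shape `(10, 3073)` and the name `step_size` come from the comment above. The seed, the `0.001` initial scale, the value of `step_size`, and the names `delta_W` and `W_try` are assumptions chosen for illustration, not the repository's actual code.

```python
import numpy as np

# Assumed seed and scales, for illustration only.
rng = np.random.default_rng(0)
step_size = 1e-4

# First draw: the current weights W.
W = rng.standard_normal((10, 3073)) * 0.001

# Second draw: sampled independently of W. Under the "change of" reading
# this is delta W; under the proposed notation it would be step_size * W'.
delta_W = rng.standard_normal((10, 3073)) * step_size

# Candidate weights after applying the perturbation.
W_try = W + delta_W
```

Both readings describe the same computation. They differ only in whether the perturbation gets its own symbol or is written as a change in $W$.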