cs231n / cs231n.github.io

Public facing notes page
MIT License

Update optimization-1.md #232

Closed: or-toledano closed this pull request 4 years ago

or-toledano commented 4 years ago

The first W is random, and the second W is also random and independent of the first W. That's why I propose denoting the second W as W', to distinguish it from the first W. The code is consistent with this reading: it computes np.random.randn(10, 3073) * step_size independently of the first W.
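To make the point about independence concrete, here is a minimal sketch of the random local search step that the quoted snippet comes from (as I understand the notes). It assumes a hypothetical quadratic `toy_loss` in place of the notes' actual classifier loss; the names `toy_loss`, `target`, and `random_local_search` are illustrative, not from the notes. It uses NumPy's `Generator.standard_normal` instead of `np.random.randn` for reproducibility; the two draw from the same standard normal distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy loss (assumption): squared distance to a fixed target matrix,
# standing in for the SVM/softmax loss used in the notes.
target = rng.standard_normal((10, 3073))

def toy_loss(W):
    return float(np.sum((W - target) ** 2))

def random_local_search(W, loss_fn, step_size=1e-4, num_iters=100, rng=None):
    """Try W + delta_W each iteration and keep it only if the loss decreases."""
    rng = np.random.default_rng() if rng is None else rng
    best_loss = loss_fn(W)
    for _ in range(num_iters):
        # delta_W is drawn fresh, independently of W: it is the proposed change to W
        delta_W = rng.standard_normal(W.shape) * step_size
        W_try = W + delta_W
        loss_try = loss_fn(W_try)
        if loss_try < best_loss:
            W, best_loss = W_try, loss_try
    return W, best_loss

W0 = rng.standard_normal((10, 3073)) * 0.001
W_best, loss_best = random_local_search(W0, toy_loss, step_size=1e-3, num_iters=50, rng=rng)
```

Since a candidate is accepted only when its loss is strictly lower, `loss_best` can never exceed the starting loss `toy_loss(W0)`.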

brentyi commented 4 years ago

Hi @or-toledano,

Thanks for the PR!

I see what you're saying here, but this doesn't seem like an error to me: the delta here is meant to be read not as a scalar value or step size but as a "change of" operator. From Wikipedia: [image]

In other words, $\delta W$ is to be read as "the amount by which the $W$ matrix changes." Going to close this because IMO leaving it this way is cleaner than introducing $\delta$ as a step size and $W'$ as some randomly sampled noise, but if you'd be willing to contribute a note clarifying the existing notation, that'd for sure be appreciated.
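The "change of" reading can be checked numerically. The sketch below assumes the same setup as the notes' snippet (a 10 x 3073 weight matrix and a scalar `step_size`); the variable names are illustrative. Under this reading, $\delta W$ is a matrix with the same shape as $W$, equal to the difference between the new and old weights, while `step_size` remains a separate scalar that only scales the perturbation.

```python
import numpy as np

rng = np.random.default_rng(1)

step_size = 1e-3                                     # scalar: scales the perturbation
W = rng.standard_normal((10, 3073)) * 0.001          # current weights
delta_W = rng.standard_normal(W.shape) * step_size   # "delta W": a matrix-valued change
W_new = W + delta_W

# Read as a "change of" operator, delta W is just W_new - W
change = W_new - W
```

Here `change` matches `delta_W` up to floating-point rounding, and it has the shape of `W`, not the shape of a scalar.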

or-toledano commented 4 years ago

Thanks. I guess I was more familiar with $\delta$ as a scalar and $\Delta$ as "change of", but now I will pay more attention to context.