Closed: alogblog closed this issue 5 years ago
Hi @alogblog, thanks for your question. The original is correct: `axis` should be 0, not 1, so the mean and standard deviation are computed per feature, across all training instances. To understand why feature standardization is important when using gradient descent, please read Chapter 4 on gradient descent; there is a section that explains why features should be scaled.
Moreover, at test time we might get a single instance at a time: it would not make sense to standardize a single instance using its own mean and standard deviation (with one instance, each feature's standard deviation is zero). Instead, we use the training set's mean and standard deviation to scale new instances at test time. Therefore, during training we should do the same: use the full training set's mean and standard deviation, computed per feature.
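For concreteness, here is a minimal NumPy sketch of both points; the array values are made up for illustration:

```python
import numpy as np

# Toy training set: 3 instances (rows), 2 features (columns).
X_train = np.array([[1.0, 100.0],
                    [2.0, 200.0],
                    [3.0, 300.0]])

# axis=0 computes the statistics per feature, across instances,
# which is what standardization requires.
mean = X_train.mean(axis=0)   # one mean per feature
std = X_train.std(axis=0)     # one std per feature

X_train_scaled = (X_train - mean) / std

# At test time, reuse the *training* statistics: never a new
# instance's own mean/std (for a single instance, std would be 0).
X_new = np.array([[1.5, 250.0]])
X_new_scaled = (X_new - mean) / std
```

With `axis=1` the statistics would instead be computed across the features of each instance, mixing values on completely different scales, which is not what we want.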
Hope this helps, Aurélien
Closing stale issues. Please reopen if you are still experiencing a problem.
[Original vs. new code snippets from the question: not preserved in this thread]
Am I wrong? If mine isn't wrong, is there a reason you did it that way that I couldn't catch?
Thanks.