Open asbisen opened 7 years ago
Hi! All good feedback. The `FeatureNormalizer` is quite old and a little outdated. I will rewrite it at some point. I think it would be a good idea to give consistent results with either Scikit-learn or Caret (R package), neither of which I checked when I wrote this.
The row vs. column thing has to do with Julia's column-major array memory order, but after a rewrite it will be possible to choose the observation dimension, similar to how `LossFunctions` allows it.
`FeatureNormalizer` transforms the matrix `X` using `(X - μ) / σ`, which translates to `StandardScaler` in Scikit-Learn, whereas the `Normalize` method in Scikit-Learn scales the data to a unit norm. I was wondering if we should rename `FeatureNormalizer` to `FeatureStandardizer` or something to that effect.

Also, is there a reason for `FeatureNormalizer` to expect the matrix such that the features are represented in rows and not columns?
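To make the naming question concrete, here is a small sketch (in Python with NumPy, since the thread compares against Scikit-Learn; the toy matrix is made up) of the difference between standardization, which is what `(X - μ) / σ` does, and unit-norm scaling:

```python
import numpy as np

# Hypothetical 3x2 data matrix: rows are observations, columns are features.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# Standardization: (X - mean) / std per feature column,
# analogous to sklearn's StandardScaler (which uses the population std).
mu = X.mean(axis=0)
sigma = X.std(axis=0)  # ddof=0, like StandardScaler
X_std = (X - mu) / sigma

# Unit-norm scaling, analogous to sklearn's Normalizer / preprocessing.normalize:
# each row is divided by its L2 norm, so every row ends up with norm 1.
norms = np.linalg.norm(X, axis=1, keepdims=True)
X_unit = X / norms

print(X_std.mean(axis=0))                 # each column now has mean ~0
print(np.linalg.norm(X_unit, axis=1))     # each row now has norm 1
```

The two operations are quite different, which is why the rename to something like `FeatureStandardizer` would reduce confusion for people coming from Scikit-Learn.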
And for the last issue, I don't know which way is correct, Scikit or `MLDataUtils`, but there is a slight inconsistency between how `StandardScaler` in Scikit calculates the standard deviation vs. `MLDataUtils`: Scikit scales the sum of squared deviations by `n`, while `MLDataUtils` scales it by `n - 1` (Bessel's correction) when calculating the standard deviation.

Reference: Scikit-Learn Standardize
Reference: Scikit-Learn Normalize
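The `n` vs. `n - 1` discrepancy above can be checked numerically; a minimal sketch with a made-up vector (NumPy's `ddof` parameter switches between the two conventions):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = len(x)
ss = ((x - x.mean()) ** 2).sum()   # sum of squared deviations

# Scikit-Learn's StandardScaler divides by n (population std).
std_n = np.sqrt(ss / n)            # same as np.std(x, ddof=0) -> 2.0 here

# Dividing by n - 1 gives the sample std (Bessel's correction),
# the convention MLDataUtils reportedly uses.
std_n1 = np.sqrt(ss / (n - 1))     # same as np.std(x, ddof=1)

print(std_n, std_n1)
```

For small `n` the two values differ noticeably, so standardized output from the two libraries won't match exactly unless the convention is aligned.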