JuliaML / MLDataUtils.jl

Utility package for generating, loading, splitting, and processing Machine Learning datasets
http://mldatautilsjl.readthedocs.io/

Naming Convention for FeatureNormalizer #27

Open asbisen opened 7 years ago

asbisen commented 7 years ago

FeatureNormalizer transforms the matrix X using (X - μ)/σ, which corresponds to StandardScaler in Scikit-Learn, whereas the Normalizer in Scikit-Learn scales each sample to unit norm. I was wondering if we should rename FeatureNormalizer to FeatureStandardizer or something to that effect.
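To make the distinction concrete, here is a small NumPy sketch (with made-up data) of the two operations being contrasted: standardization centers and scales each feature, while normalization in the Scikit-Learn sense rescales each sample to unit norm.

```python
import numpy as np

# Hypothetical 3x2 data matrix: rows are observations, columns are features.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# Standardization, i.e. (X - mu) / sigma per feature,
# like StandardScaler (and FeatureNormalizer):
standardized = (X - X.mean(axis=0)) / X.std(axis=0)

# Normalization in the Scikit-Learn Normalizer sense:
# scale each sample (row) to unit L2 norm.
normalized = X / np.linalg.norm(X, axis=1, keepdims=True)

print(standardized.mean(axis=0))           # each feature now has mean ~0
print(np.linalg.norm(normalized, axis=1))  # each row now has norm ~1
```

The two produce very different matrices, which is why reusing the word "normalize" for both invites confusion.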

Also, is there a reason FeatureNormalizer expects the matrix with features represented as rows rather than columns?

And for the last issue, I don't know whether Scikit-Learn or MLDataUtils has the correct behavior, but there is a slight inconsistency in how the two compute the standard deviation: Scikit-Learn's StandardScaler divides the sum of squared deviations by n, while we divide it by n-1.
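The n vs n-1 difference is the population versus sample standard deviation, which NumPy exposes via the `ddof` argument. A quick illustration (with made-up data):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
n = len(x)

# Dividing the sum of squared deviations by n (ddof=0) gives the
# population standard deviation, which is what StandardScaler uses.
pop_std = np.std(x, ddof=0)

# Dividing by n-1 (ddof=1) gives the sample standard deviation,
# which is the convention described for MLDataUtils above.
sample_std = np.std(x, ddof=1)

# The two differ by a constant factor of sqrt(n / (n - 1)).
print(pop_std, sample_std)
```

For small n the factor sqrt(n/(n-1)) is noticeable, so standardized outputs from the two libraries will not match exactly even on identical data.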

Reference: Scikit-Learn Standardize
Reference: Scikit-Learn Normalize

Evizero commented 7 years ago

Hi! All good feedback. The FeatureNormalizer is quite old and a little outdated; I will rewrite it at some point. I think it would be a good idea to give results consistent with either Scikit-learn or Caret (the R package), neither of which I checked when I wrote this.

The row vs column thing has to do with Julia's column-major memory layout, but after a rewrite it will be possible to choose the observation dimension, similar to how LossFunctions allows it.