JuliaML / MLDataUtils.jl

Utility package for generating, loading, splitting, and processing Machine Learning datasets
http://mldatautilsjl.readthedocs.io/

rescale! and center! along obsdim #30

Closed: abieler closed this 7 years ago

abieler commented 7 years ago

It might be useful to have the feature-scaling functions (rescale! and center!) work along either dimension. The tests currently fail, but I wanted to check with you first whether you like the general idea.
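Here is a minimal sketch of the proposed behaviour. It is written in Python because the thread itself contains no code, and it is not taken from MLDataUtils. The names are hypothetical: `center_` and `rescale_` stand in for `center!` and `rescale!`. A matrix is represented as a list of rows, and `obsdim` is a 0-based axis that says whether observations are rows (0) or columns (1). The sketch assumes that "rescale" means standardizing each feature to mean 0 and standard deviation 1, using the population standard deviation (an arbitrary choice here). It defaults to `obsdim=1` so that a matrix is laid out as features by observations.

```python
from statistics import fmean, pstdev


def _features(X, obsdim):
    """Split X (a list of rows) into one list of values per feature."""
    if obsdim == 0:
        # Each row is an observation, so each column is a feature.
        return [list(col) for col in zip(*X)]
    if obsdim == 1:
        # Each column is an observation, so each row is a feature.
        return [list(row) for row in X]
    raise ValueError("obsdim must be 0 (rows) or 1 (columns)")


def _apply(X, obsdim, fn):
    """Replace every X[i][j] in place with fn(value, feature_index)."""
    for i, row in enumerate(X):
        for j, value in enumerate(row):
            k = j if obsdim == 0 else i  # which feature this entry belongs to
            row[j] = fn(value, k)


def center_(X, obsdim=1):
    """Hypothetical center!: subtract each feature's mean in place; return the means."""
    mu = [fmean(f) for f in _features(X, obsdim)]
    _apply(X, obsdim, lambda v, k: v - mu[k])
    return mu


def rescale_(X, obsdim=1):
    """Hypothetical rescale!: standardize each feature in place to mean 0, std 1."""
    feats = _features(X, obsdim)
    mu = [fmean(f) for f in feats]
    # A constant feature has std 0; divide by 1 instead to avoid ZeroDivisionError.
    sigma = [pstdev(f) or 1.0 for f in feats]
    _apply(X, obsdim, lambda v, k: (v - mu[k]) / sigma[k])
    return mu, sigma
```

For example, with `obsdim=1` the matrix `[[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]` holds two features observed three times. `center_` subtracts the feature means 2.0 and 6.0 and leaves `[[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0]]`. Passing `obsdim=0` gives the same result for the transposed matrix.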

Evizero commented 7 years ago

Hi! Really cool that you are doing this. I agree that we should allow the obsdim to be specified. In fact, we already have a "system" for this that we use in MLDataPatterns.jl (that package will become the new back-end of MLDataUtils for all data subsetting, k-folds, etc.).

It would be cool if you could adapt the code to that "system". I describe the general approach here: https://github.com/joshday/OnlineStats.jl/issues/40#issuecomment-290209593. For most code we allow arrays of any order, but just supporting vectors and matrices would already be a big improvement.

edit: ObsDim is defined in LearnBase.jl here: https://github.com/JuliaML/LearnBase.jl/blob/master/src/LearnBase.jl#L318-L395
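In Julia, the "system" mentioned above lets methods dispatch on singleton types such as `ObsDim.First` and `ObsDim.Last`. The type of an argument then selects the observation dimension. Python has no multiple dispatch, but `functools.singledispatch` offers a rough single-argument analogue.

The sketch below is hypothetical and written in Python only for illustration. The marker classes and `resolve_axis` are invented names, not LearnBase's API, and axes are 0-based. Accepting a plain integer is a convenience added only for this sketch.

```python
from functools import singledispatch


class ObsDimFirst:
    """Hypothetical marker: observations run along the first axis."""


class ObsDimLast:
    """Hypothetical marker: observations run along the last axis."""


class ObsDimConstant:
    """Hypothetical marker: observations run along a fixed 0-based axis."""

    def __init__(self, dim):
        self.dim = dim


@singledispatch
def resolve_axis(obsdim, ndim):
    """Map an obsdim marker to a concrete 0-based axis of an ndim-dimensional array."""
    raise TypeError(f"unsupported obsdim: {obsdim!r}")


@resolve_axis.register
def _(obsdim: ObsDimFirst, ndim):
    return 0


@resolve_axis.register
def _(obsdim: ObsDimLast, ndim):
    return ndim - 1


@resolve_axis.register
def _(obsdim: ObsDimConstant, ndim):
    if not 0 <= obsdim.dim < ndim:
        raise ValueError(f"axis {obsdim.dim} is out of range for {ndim} dimensions")
    return obsdim.dim


@resolve_axis.register
def _(obsdim: int, ndim):
    # Convenience for this sketch only: a plain integer means a constant axis.
    return resolve_axis(ObsDimConstant(obsdim), ndim)
```

For instance, `resolve_axis(ObsDimLast(), 2)` returns 1, so a matrix is treated as features by observations. One difference from Julia: the compiler can often resolve dispatch on singleton types ahead of time, whereas `singledispatch` picks the implementation at run time on every call.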

abieler commented 7 years ago

Cool. I'll definitely try to adapt to that scheme!

abieler commented 7 years ago

Something like this?

abieler commented 7 years ago

Thanks for the comments. I'll add some tests later on.

Evizero commented 7 years ago

This is optional, but if you feel up to it, it would be cool to update the corresponding section in the documentation: https://raw.githubusercontent.com/JuliaML/MLDataUtils.jl/master/docs/data/feature.rst

Evizero commented 7 years ago

lgtm. Ready to merge when you are

Evizero commented 7 years ago

thanks!

abieler commented 7 years ago

Thanks for all the comments! I also learned about singleton types in the process. :) How do you feel about support for DataFrames and DataTables? Is it worth looking at, or do you want to keep this to arrays only?

Evizero commented 7 years ago

> how do you feel about support for dataframes and datatables

I will add a DataFrames dependency in the next update (see dev branch), so I am open to the idea.