JuliaML / MLDataUtils.jl

Utility package for generating, loading, splitting, and processing Machine Learning datasets
http://mldatautilsjl.readthedocs.io/

Data Access Pattern in 0.5 #16

Closed by Evizero 7 years ago

Evizero commented 7 years ago

Implementation or continuation of #13, #14, and #15.

The code and tests are now in place and ready for review by anyone interested.

I will leave this open at least until I have written basic readthedocs documentation, at which point I will merge unless something comes up.

coveralls commented 7 years ago

Coverage decreased (-2.8%) to 94.724% when pulling b2a04f9955ab6ba3a54412f0c5bad181a04d90ea on dev into d5503fccb665b3ad37b588536896d203695ae3c8 on master.

Evizero commented 7 years ago

@oxinabox I updated oversample and undersample to reflect my plan for labeled data. See this commit: https://github.com/JuliaML/MLDataUtils.jl/pull/16/commits/1461624da1c70cd4e9b875e3d9722719916c0d08

This introduces a new function target([f], data) which returns the, well, target(s) of some given data object.

My changes are not set in stone. Discussion is very welcome.
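For readers skimming the thread, here is a minimal sketch of the idea behind target([f], data). It is not the package's implementation: the name sketch_target is hypothetical, and it assumes that labeled data is passed as a tuple whose last element holds the targets, and that the optional f is simply applied to the data object to pick the targets out.

```julia
# Hypothetical stand-in for illustration only; not the code from this PR.

# Assumption: labeled data is a tuple and its last element holds the targets.
sketch_target(data::Tuple) = last(data)

# Assumption: an optional function `f` extracts the targets from the data object.
sketch_target(f, data) = f(data)

X = rand(2, 6)                          # 6 observations with 2 features each
y = ["a", "b", "a", "a", "b", "a"]      # one label per observation

labels_default = sketch_target((X, y))             # returns y
labels_custom  = sketch_target(d -> d[2], (X, y))  # also returns y
```

With something like this in place, resampling functions such as oversample and undersample only need to ask the data object for its targets rather than take the labels as a separate argument.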

oxinabox commented 7 years ago

Woot!

Evizero commented 7 years ago

Don't get too excited! The code is functional but the documentation is still unfinished. I am preparing to split the package apart as discussed in #29.

oxinabox commented 7 years ago

It is merged. The docs are pretty, if not 100% complete, and it has a very intuitive design. Please tag at least a v0.1.0 here so I can list it as a dependency, even if you break it apart in #29.

Evizero commented 7 years ago

Yes, OK, fair enough. It doesn't hurt to have it functional here first before splitting it apart.

I'll tag it before the end of this week. Might as well finish rudimentary documentation for all new aspects here.

Evizero commented 7 years ago

I just tagged. It may take a bit to get into METADATA, as the diff is a bit much to look through.

In general you may be better off depending directly on MLDataPattern, unless you require DataFrames.