Evizero closed this 7 years ago.
@oxinabox I updated `oversample` and `undersample` to reflect my plan for labeled data. See this commit: https://github.com/JuliaML/MLDataUtils.jl/pull/16/commits/1461624da1c70cd4e9b875e3d9722719916c0d08
This introduces a new function `target([f], data)`, which returns the, well, target(s) of some given data object. By default, `target(data)` is the identity function: if you pass it a single array, it assumes that this array is the target array.
The optional `f` is a function that specifies how the targets are extracted from `data`. By default it is the identity function. This parameter is useful for custom types that contain the targets in some form (such as DataFrames; see the docstring in the commit).
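To make the two call forms concrete, here is a minimal Python sketch of the semantics described above. It is not the package's code (MLDataUtils.jl is written in Julia); the `target(*args)` signature and the `identity` helper are illustrative assumptions that mimic Julia's optional leading argument in `target([f], data)`.

```python
from typing import Any


def identity(x: Any) -> Any:
    """Default extractor: return the data unchanged."""
    return x


def target(*args: Any) -> Any:
    """Hypothetical Python analogue of Julia's `target([f], data)`.

    Accepts target(data) or target(f, data); f defaults to identity,
    so a plain array passed alone is treated as the targets themselves.
    """
    if len(args) == 1:
        f, data = identity, args[0]
    elif len(args) == 2:
        f, data = args
    else:
        raise TypeError("expected target(data) or target(f, data)")
    return f(data)


# A plain list is returned as-is (identity default).
labels = target(["cat", "dog", "cat"])

# A table-like custom container (a dict standing in for a DataFrame)
# needs an explicit extractor that pulls out the label column.
table = {"x": [0.1, 0.5, 0.9], "y": ["cat", "dog", "cat"]}
column_labels = target(lambda d: d["y"], table)
```

Passing the extractor as a function keeps `target` itself ignorant of any particular container type; each custom type only needs to supply its own `f`.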
If `data` is a `Tuple`, then the convention is that the last element of the tuple contains the targets, and thus the function `target` is applied to `data[end]`. This arbitrary convention is imposed by this package. We already treat tuples specially as a "group-observations-together" type, so we may as well go a step further.
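The tuple rule can be sketched the same way. This is again a hypothetical Python illustration, not the Julia implementation: Python's `data[-1]` stands in for Julia's `data[end]`, and the helper names are assumptions. As the comment describes, `target` is re-applied to the last element, so the optional extractor `f` still takes effect.

```python
from typing import Any


def identity(x: Any) -> Any:
    """Default extractor: return the data unchanged."""
    return x


def target(*args: Any) -> Any:
    """Hypothetical analogue of `target([f], data)` with the tuple convention."""
    if len(args) == 1:
        f, data = identity, args[0]
    elif len(args) == 2:
        f, data = args
    else:
        raise TypeError("expected target(data) or target(f, data)")
    # Tuples group observations together; by convention the LAST element
    # holds the targets, so recurse on data[-1] (Julia: data[end]).
    if isinstance(data, tuple):
        return target(f, data[-1])
    return f(data)


X = [[1.0, 2.0], [3.0, 4.0]]  # features
y = ["a", "b"]                # targets
```

With this rule, `(X, y)` and `(X1, X2, y)` both resolve to `y`, and `target(len, (X, y))` applies `len` to `y` rather than to the whole tuple.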
My changes are not set in stone. Discussion is very welcome.
Woot!
Don't get too excited! The code is functional, but the documentation is still unfinished. I am preparing to split the package apart, as discussed in #29.
It is merged. The docs are pretty, if not 100% complete, and it has a very intuitive design. Please tag at least a v0.1.0 here so I can list it as a dependency, even if you break it apart in #29.
Yes, OK, fair enough. It doesn't hurt to have it functional here before splitting it apart.
I'll tag it before the end of this week. I might as well finish rudimentary documentation for all the new features here.
I just tagged it. It may take a while to get into METADATA, as the diff is a lot to look through.
In general, you may be better off depending directly on MLDataPattern, unless you require DataFrames.
Implementation or continuation of #13, #14, and #15.
The code and tests are now in place and ready for review by anyone interested.
I will leave this open at least until I have written basic Read the Docs documentation, at which point I will merge unless something comes up.