JuliaML / MLDataUtils.jl

Utility package for generating, loading, splitting, and processing Machine Learning datasets
http://mldatautilsjl.readthedocs.io/

Should allow multifactor labels to be used in sampling #28

Closed (oxinabox closed 7 years ago)

oxinabox commented 7 years ago

In particular, this is required to make practical use of targetfun.

Added more tests to that effect.

Evizero commented 7 years ago

I am unsure about this. It looks to me as if this change simply hides an implicit use of obsview. I tend to think we should require the user to be explicit about it and instead call:

# EDIT: nope
oversample(obsview(src); targetfun=x->x[1]>x[2])

The implicit obsview seems scary because then the result looks very different from the input.

Edit: I misread the code.

Evizero commented 7 years ago

Or alternatively:

julia> oversample(src; targetfun=x->x[1,:].>x[2,:])
2×2578 SubArray{Int64,2,Array{Int64,2},Tuple{Colon,Array{Int64,1}},false}:
 1  3  4  4  4  2  2  2  2  2  4  3  …  2  3  4  3  1  3  3  4  4  4  2  4
 3  1  3  3  2  2  4  3  3  3  3  4     4  3  2  4  4  2  3  3  2  3  2  2
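
The idea of deriving a label per observation from a multi-row array can be sketched in plain Julia, without MLDataUtils. This is only an illustration and not the package's implementation: the helper name oversample_columns is hypothetical, and it assumes that observations are columns and that targetfun receives one column at a time, as in the x->x[1]>x[2] call above.

```julia
using Random

# Hypothetical helper (not part of MLDataUtils): resample the columns of `X`
# so that every label produced by `targetfun` occurs as often as the most
# frequent label.
function oversample_columns(X::AbstractMatrix; targetfun, rng = Random.default_rng())
    # Group column indices by the label that targetfun assigns to each column.
    groups = Dict{Any,Vector{Int}}()
    for j in axes(X, 2)
        label = targetfun(view(X, :, j))
        push!(get!(groups, label, Int[]), j)
    end

    maxcount = maximum(length, values(groups))

    indices = Int[]
    for idx in values(groups)
        # Keep every original observation of this label ...
        append!(indices, idx)
        # ... and draw extra ones with replacement until the label reaches maxcount.
        append!(indices, rand(rng, idx, maxcount - length(idx)))
    end

    shuffle!(rng, indices)
    # Return a view, so the result has the same row layout as the input.
    return view(X, :, indices)
end

# Five observations with two factors each; the label is whether row 1 > row 2.
src = [1 3 4 2 2;
       3 1 3 4 3]
balanced = oversample_columns(src; targetfun = x -> x[1] > x[2])
```

Here three columns have the label false and two have the label true, so balanced holds six columns, three per label. The result is a view into src with the same two rows, which is the property that makes the output look like the input.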

Evizero commented 7 years ago

Took me a while but now I think I understand what is going on and I think I like it. This could be a really good change.

I have to think about it a little to wrap my head around it. For example,

Evizero commented 7 years ago

Let's go with your change. I'll experiment with it locally.