JuliaStats / MLBase.jl

A set of functions to support the development of machine learning algorithms
MIT License

Stratified Sampling #3

Closed: svs14 closed this issue 10 years ago

svs14 commented 10 years ago

Great work with MLBase, it has proven very helpful!

Are there any thoughts on implementing stratified sampling in the future?

lendle commented 10 years ago

I have also been thinking about this.

Of the three cross-validation schemes currently implemented, stratification would be useful for KFold and RandomSub.

Paralleling scikit-learn's StratifiedKFold and StratifiedShuffleSplit, we could have StratifiedKFold(strata, k) and StratifiedRandomSub(strata, sn, k), where the unique values of strata indicate the stratum of each observation.

If no one objects to that API, I'll work on an implementation.
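
To make the proposed StratifiedKFold(strata, k) semantics concrete, here is a minimal sketch in plain Python (the language of the scikit-learn classes mentioned above). It is not the MLBase implementation: the function name stratified_kfold, the seed parameter, and the round-robin dealing strategy are all assumptions made for illustration. It only covers the k-fold case, not StratifiedRandomSub.

```python
import random
from collections import defaultdict


def stratified_kfold(strata, k, seed=0):
    """Yield (train_indices, test_indices) for k stratified folds.

    Hypothetical helper for illustration only; not part of MLBase
    or scikit-learn. `strata[i]` is the stratum label of observation i.
    """
    rng = random.Random(seed)

    # Group observation indices by their stratum label.
    by_stratum = defaultdict(list)
    for i, label in enumerate(strata):
        by_stratum[label].append(i)

    # Shuffle within each stratum, then deal indices to folds round-robin.
    # The fold pointer carries over between strata so that overall fold
    # sizes also stay within one observation of each other.
    folds = [[] for _ in range(k)]
    next_fold = 0
    for members in by_stratum.values():
        rng.shuffle(members)
        for i in members:
            folds[next_fold].append(i)
            next_fold = (next_fold + 1) % k

    # Each fold serves once as the test set; the rest form the training set.
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


# Usage: six observations in stratum "a", three in stratum "b".
strata = ["a"] * 6 + ["b"] * 3
for train, test in stratified_kfold(strata, 3):
    print(test, [strata[i] for i in test])
```

With this input, every test fold contains two observations from "a" and one from "b", so each fold preserves the 2:1 ratio of the full data set. In general, dealing round-robin keeps the per-stratum count in any fold within one of the ideal share.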

lindahua commented 10 years ago

That would be great to have.

lindahua commented 10 years ago

Implemented in https://github.com/JuliaStats/MLBase.jl/pull/6.

svs14 commented 10 years ago

Thanks! :D