svs14 closed 10 years ago
I have also been thinking about this.
Of the 3 cross-validation schemes currently implemented, stratification would be useful for KFold and RandomSub.
Paralleling scikit-learn's StratifiedKFold and StratifiedShuffleSplit, we could have StratifiedKFold(strata, k) and StratifiedRandomSub(strata, sn, k), where the unique values of strata indicate the stratum of each observation.
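To make the proposed semantics concrete, here is a minimal sketch of the stratified k-fold idea, written in plain Python for illustration only. It is not the MLBase implementation and not scikit-learn's; the function name stratified_kfold and its return shape are assumptions made for this example. It groups observation indices by stratum label and deals them round-robin into k folds, so every fold receives each stratum in roughly its overall proportion. A real implementation would presumably also shuffle within each stratum, which this sketch leaves out to stay deterministic.

```python
from collections import defaultdict


def stratified_kfold(strata, k):
    """Return k lists of test indices with roughly proportional strata.

    Hypothetical helper for illustration; `strata` holds one label per
    observation and `k` is the number of folds.
    """
    if k < 2:
        raise ValueError("k must be at least 2")

    # Group observation indices by their stratum label.
    by_stratum = defaultdict(list)
    for i, label in enumerate(strata):
        by_stratum[label].append(i)

    folds = [[] for _ in range(k)]

    # Deal indices round-robin. The position counter carries over from one
    # stratum to the next, so overall fold sizes differ by at most one.
    pos = 0
    for label in by_stratum:
        for i in by_stratum[label]:
            folds[pos % k].append(i)
            pos += 1

    return [sorted(fold) for fold in folds]


if __name__ == "__main__":
    labels = ["a"] * 6 + ["b"] * 3
    for test_idx in stratified_kfold(labels, 3):
        # The training set for a fold is every index not in its test set.
        train_idx = [i for i in range(len(labels)) if i not in test_idx]
        print(train_idx, test_idx)
```

Each returned list is the test set for one fold, and the complement of that list serves as the training set.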
If no one objects to that API, I'll work on an implementation.
That would be great to have.
Implemented in https://github.com/JuliaStats/MLBase.jl/pull/6.
Thanks! :D
Great work with MLBase; it has proved very helpful!
Are there any thoughts on implementing stratified sampling in the future?