Open ararslan opened 7 years ago
FWIW, I'd rather have lots of small packages (e.g., Classification.jl, CrossValidation.jl, Bootstrap.jl, ModelTuning.jl) that remain outside StatsBase since they are somewhat specific techniques and problem spaces.
I agree these features sound broader than machine learning, but I'm not sure whether they should live in StatsBase or in separate packages. I guess it depends on whether each package offering a new kind of model will have to override some functions (and therefore depend on the package providing them) or not. Ideally a common interface would live in StatsBase and e.g. Bootstrap.jl would only use these functions to automatically support bootstrap for any model.
> Ideally a common interface would live in StatsBase and e.g. Bootstrap.jl would only use these functions to automatically support bootstrap for any model.
Yeah, that's what I was thinking. I figured StatsBase could have a simple Resample interface that could be supported for bootstrapping, cross-validation, jackknifing, etc.
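To make the idea concrete, here is a rough sketch of what such an interface might look like. None of these names (`ResampleScheme`, `resample_indices`, `resample`, or the scheme types) exist in StatsBase; they are hypothetical and only illustrate how one generic driver could serve several resampling techniques, with each technique supplying nothing more than its index sets.

```julia
using Random
using Statistics

# Hypothetical abstract type; NOT an existing StatsBase API.
abstract type ResampleScheme end

# Bootstrap: draw `nreps` samples of size n with replacement.
struct Bootstrap <: ResampleScheme
    nreps::Int
end

# Jackknife: leave one observation out at a time.
struct Jackknife <: ResampleScheme end

# K-fold: hold out every k-th observation in turn (no shuffling in this sketch).
struct KFold <: ResampleScheme
    k::Int
end

# Each scheme only has to say which observation indices form each replicate.
resample_indices(rng::AbstractRNG, s::Bootstrap, n::Integer) =
    [rand(rng, 1:n, n) for _ in 1:s.nreps]

resample_indices(::AbstractRNG, ::Jackknife, n::Integer) =
    [[j for j in 1:n if j != i] for i in 1:n]

# Training indices for each fold: everything not assigned to that fold.
resample_indices(::AbstractRNG, s::KFold, n::Integer) =
    [[i for i in 1:n if mod1(i, s.k) != fold] for fold in 1:s.k]

# Generic driver: apply a statistic (or a model-fitting function) to each replicate.
function resample(f, data::AbstractVector, s::ResampleScheme;
                  rng::AbstractRNG = Random.default_rng())
    return [f(data[idx]) for idx in resample_indices(rng, s, length(data))]
end

# Usage: the same call works for every scheme.
x = [2.0, 4.0, 6.0, 8.0]
jack  = resample(mean, x, Jackknife())
boot  = resample(mean, x, Bootstrap(200); rng = MersenneTwister(1))
folds = resample(mean, x, KFold(2))
```

With something along these lines, a package like Bootstrap.jl would only need to call the generic driver, and a package defining a new model would only need its fitting function to accept a subset of the data.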
Also, might be worth contacting the JuliaML folks, as the features here have some overlap with their packages (e.g., MLDataPattern.jl).
There is indeed overlap with JuliaML/LearnBase.jl in purpose at least, if not in naming. @Evizero
I don't think I have anything insightful or useful to contribute to this conversation. Maybe a good course of action is to give whoever wants to dedicate time and effort to this package some flexibility to do so.
This package on its own is not all that discoverable, plus a lot of the methodology is also relevant to "classical" statistics, not just to machine learning (e.g. cross validation, classification, etc.). Thoughts?
cc @nalimilan