JuliaStats / MLBase.jl

A set of functions to support the development of machine learning algorithms
MIT License
186 stars, 63 forks

Merge with StatsBase? #38

Open: ararslan opened this issue 7 years ago

ararslan commented 7 years ago

This package is not very discoverable on its own, and much of its methodology (e.g., cross-validation and classification) is just as relevant to "classical" statistics as it is to machine learning. Thoughts?

cc @nalimilan

rofinn commented 7 years ago

FWIW, I'd rather have lots of small packages (e.g., Classification.jl, CrossValidation.jl, Bootstrap.jl, ModelTuning.jl) that stay outside StatsBase, since each covers a fairly specific technique and problem space.

nalimilan commented 7 years ago

I agree these features sound broader than machine learning, but I'm not sure whether they should live in StatsBase or in separate packages. I guess it depends on whether each package offering a new kind of model will have to override some functions (and therefore depend on the package providing them) or not. Ideally a common interface would live in StatsBase and e.g. Bootstrap.jl would only use these functions to automatically support bootstrap for any model.
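To make the dependency question concrete, here is a minimal sketch of that arrangement, written in Python purely for illustration: model packages implement a shared interface (the hypothetical `StatisticalModel` with `fit` and `coef`, loosely echoing the generic functions StatsBase already provides), and a bootstrap routine calls only that interface. All names here are assumptions, not an existing API. In Julia the same idea would more naturally be expressed as generic functions extended via multiple dispatch rather than an abstract base class.

```python
import random
import statistics
from abc import ABC, abstractmethod


class StatisticalModel(ABC):
    """Hypothetical shared interface, standing in for generic functions in a base package."""

    @abstractmethod
    def fit(self, data):
        """Estimate parameters from a list of observations and return self."""

    @abstractmethod
    def coef(self):
        """Return the estimated parameters as a list of floats."""


class MeanModel(StatisticalModel):
    """Toy model with a single parameter: the sample mean."""

    def fit(self, data):
        self._coef = [statistics.mean(data)]
        return self

    def coef(self):
        return self._coef


class MedianModel(StatisticalModel):
    """Toy model with a single parameter: the sample median."""

    def fit(self, data):
        self._coef = [statistics.median(data)]
        return self

    def coef(self):
        return self._coef


def bootstrap_se(model_cls, data, n_boot=500, seed=0):
    """Bootstrap standard errors that rely only on fit() and coef().

    Any StatisticalModel subclass works here, so model packages never
    need to depend on the bootstrap code (n_boot must be at least 2).
    """
    rng = random.Random(seed)
    n = len(data)
    draws = []
    for _ in range(n_boot):
        # Resample n observations with replacement and refit the model.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        draws.append(model_cls().fit(sample).coef())
    # Standard deviation of each coefficient across the bootstrap replicates.
    return [statistics.stdev(col) for col in zip(*draws)]
```

Swapping `MeanModel` for `MedianModel` requires no change to `bootstrap_se`; only the package defining the interface has to be a shared dependency.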

rofinn commented 7 years ago

Quoting @nalimilan: "Ideally a common interface would live in StatsBase and e.g. Bootstrap.jl would only use these functions to automatically support bootstrap for any model."

Yeah, that's what I was thinking. I figured StatsBase could have a simple Resample interface that bootstrapping, cross-validation, jackknifing, etc. could all build on.
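A minimal sketch of what such a shared resampling interface could look like, written in Python for illustration only: each scheme (the hypothetical generators `bootstrap`, `kfold` and `jackknife`) yields `(train, test)` index pairs, and a generic consumer (`apply_statistic`) works with any of them. None of these names come from StatsBase or MLBase.jl; they are assumptions.

```python
import random
from typing import Iterator, List, Tuple

# (indices used for fitting, held-out indices)
Split = Tuple[List[int], List[int]]


def bootstrap(n: int, n_boot: int, seed: int = 0) -> Iterator[Split]:
    """Each replicate draws n indices with replacement; out-of-bag indices are held out."""
    rng = random.Random(seed)
    for _ in range(n_boot):
        train = [rng.randrange(n) for _ in range(n)]
        in_bag = set(train)
        yield train, [i for i in range(n) if i not in in_bag]


def kfold(n: int, k: int) -> Iterator[Split]:
    """Partition 0..n-1 into k contiguous folds; each fold is held out exactly once."""
    sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def jackknife(n: int) -> Iterator[Split]:
    """Leave-one-out: each index is held out once (same splits as kfold(n, n))."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]


def apply_statistic(scheme: Iterator[Split], data, statistic):
    """Generic consumer: evaluates a statistic on the fitting indices of any scheme."""
    return [statistic([data[i] for i in train]) for train, _ in scheme]
```

For example, `apply_statistic(jackknife(len(data)), data, statistics.mean)` yields the leave-one-out means that a jackknife variance estimate is built from. Passing indices rather than copies of the data keeps the interface independent of how observations are stored.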

rofinn commented 7 years ago

Also, it might be worth contacting the JuliaML folks, as the features here overlap somewhat with their packages (e.g., MLDataPattern.jl).

oxinabox commented 7 years ago

There is indeed overlap with JuliaML/LearnBase.jl, at least in purpose if not in naming. @Evizero

Evizero commented 7 years ago

I don't think I have anything insightful or useful to contribute to this conversation. Maybe a good course of action is to give whoever wants to dedicate time and effort to this package the flexibility to do so.