JuliaML / LossFunctions.jl

Julia package of loss functions for machine learning.
https://juliaml.github.io/LossFunctions.jl/stable

Delegation to other packages #22

Closed ahwillia closed 8 years ago

ahwillia commented 8 years ago

Just had a nice lunch with @tbreloff @joshday and @jmxpearson where we discussed having a stand-alone package for online optimization (stochastic gradient descent, ADAM, etc.). I hadn't realized so much had been implemented already in LearnBase. I'm opening this mostly to start a discussion where I can get up to speed before I go off and spend time re-implementing the work done here. This issue of course dovetails with previous conversations, but I felt it more appropriate to start a new thread.

First off, great work -- I really like the motivation for this package, and it is beautifully organized. But I'm a bit confused as to what the scope of this project is and how I can contribute and work synergistically. Does this package just specify an interface, or a set of functions, that researchers will import and extend in their own packages? Or will the actual code for specifying and optimizing various models all live here?

Specific comments/questions:

tbreloff commented 8 years ago

@ahwillia I think this is all good food for thought, and I'm excited to push the project forward into a usable state that is good for everyone involved.

@Evizero Is this a good time to revisit moving LearnBase into JuliaML? I know you've said in the past that you intend to move it, but I don't want to pressure you into anything. It's your repo, and your choice.

Evizero commented 8 years ago

I think it is fair to say that the scope evolved over time, just as we did. Initially I set out to create a generic base package for all things loss functions, motivated by my SVM package (because other loss function implementations were not generic or flexible enough). This has since evolved into the goal of scoping out some common ground functionality that could be a basis for ML in Julia. I still hold the belief that Julia is different enough to challenge common design decisions made in other popular machine learning frameworks (e.g. scikit-learn). So LearnBase has been a playground for me to wrestle with ideas and collaborate/brainstorm with others to gravitate towards a common vision.

As it stands now, I think the most fruitful outcome would be if LearnBase slimmed down to a set of common functions (like predict, transform, ...) and loss definitions that would serve as a basis for concrete ML packages. Concrete optimization algorithms should probably live in their own packages, but that depends on user adoption. So I am much in favour of delegating functionality to separate packages. The closest I have come to realizing any part of this vision is with MLDataUtils.
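The slimmed-down core described here might look roughly like the following sketch. Every name in it (Loss, SupervisedLoss, value, deriv, L2DistLoss) is an illustrative assumption, not the actual LearnBase API:

```julia
# Hypothetical sketch of a slimmed-down base package: a few generic
# functions plus abstract loss types that concrete packages extend.
# All names here are illustrative, not the real LearnBase API.

abstract type Loss end
abstract type SupervisedLoss <: Loss end

# Common verbs that downstream packages would overload:
function predict end
function transform end

# A concrete package could then provide, e.g., a squared-distance loss:
struct L2DistLoss <: SupervisedLoss end
value(::L2DistLoss, target, output) = abs2(output - target)
deriv(::L2DistLoss, target, output) = 2 * (output - target)
```

The base package owns only the verbs and the abstract types; everything concrete lives downstream.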

Another big hope I carry is that we don't reinvent the wheel on things over and over. For example I would like to see a way that Optim.jl can be used to optimize some risk functional.
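For illustration, handing a risk functional to Optim.jl could look something like this. The one-parameter linear-model risk is a made-up example, and the snippet assumes the Optim package is installed:

```julia
# Sketch: minimize an empirical risk functional with Optim.jl instead
# of a hand-rolled optimizer. Assumes the Optim package is available.
using Optim

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

# Empirical risk of a one-parameter linear model under squared loss.
risk(w) = sum(abs2(w[1] * x - y) for (x, y) in zip(xs, ys)) / length(xs)

res = Optim.optimize(risk, [0.0], BFGS())
Optim.minimizer(res)  # should be close to [2.0]
```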

@tbreloff Yes, we should move sooner rather than later. I recognize that I am holding this package back by wanting to fix some issues before moving, and yet I don't actually get around to fixing those issues. Let's do the following: give me a buffer of 10 days to make the changes I want to make (remove UnicodePlots, move encoding, ...). If I don't get around to it by then, I shall move the package as is.

joshday commented 8 years ago

+1. I think these simple function names are the most important part of a Base ML package.

> As it stands now, I think the most fruitful outcome would be if LearnBase slimmed down to a set of common functions (like predict, transform, ...) and loss definitions that would serve as a basis for concrete ML packages. Concrete optimization algorithms should probably live in their own packages, but that depends on user adoption.

tbreloff commented 8 years ago

We had a short discussion here, and I think we agree on the following points:

@Evizero wanna chat in a few minutes about these points? skype?

tbreloff commented 8 years ago

https://github.com/JuliaML/StochasticOptimization.jl

joshday commented 8 years ago

(image attachment: img_0651)

Evizero commented 8 years ago

To follow up: we settled on renaming to MLModels, but given that it won't include things like trees, I wonder whether MLGradientModels would be a more appropriate and descriptive name?

tbreloff commented 8 years ago

I think trees can still fit in this abstraction... as a "learnable transformation"? (As in the whole tree is one giant transformation)

Evizero commented 8 years ago

yes, but unlike the package's focus, a tree is not a combination of a ModelLoss, a ParameterLoss, and a Transformation (btw: should we call it Mapping, Predictor, or Activation instead?)

tbreloff commented 8 years ago

MLModels will contain losses and transformations, as well as some methods which dispatch on combinations of these components. A tree is just another transformation, and there's no reason we can't define additional losses and transformations outside of MLModels.
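The "dispatch on combinations" point can be sketched as follows, with a decision stump standing in for a tree. All type and function names here are hypothetical, chosen only for illustration:

```julia
# Sketch: one generic risk method works for any (loss, transformation)
# pair, so a tree defined in another package plugs in as just another
# Transformation. All names here are made up for illustration.

abstract type Transformation end
abstract type Loss end

struct Linear <: Transformation
    w::Float64
end
apply(t::Linear, x) = t.w * x

struct Stump <: Transformation   # a one-split "tree"
    threshold::Float64
    left::Float64
    right::Float64
end
apply(t::Stump, x) = x < t.threshold ? t.left : t.right

struct SquaredLoss <: Loss end
value(::SquaredLoss, y, ŷ) = abs2(ŷ - y)

# One method covers every loss/transformation combination:
risk(l::Loss, t::Transformation, xs, ys) =
    sum(value(l, y, apply(t, x)) for (x, y) in zip(xs, ys)) / length(xs)
```

Defining Stump outside the base package changes nothing: it satisfies the Transformation contract, so the generic risk method applies to it unchanged.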

> (btw: should we call it Mapping or Predictor or Activation instead?)

I like the symmetry of calling transform on a Transformation. map is taken, and predict and activate are too specific. Any other opinions?
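The naming symmetry being proposed could look like this minimal sketch, where the Affine type is a made-up example:

```julia
# Minimal sketch of the naming symmetry: the generic verb `transform`
# mirrors the abstract type `Transformation`. Types are hypothetical.

abstract type Transformation end

struct Affine <: Transformation
    w::Float64
    b::Float64
end

transform(t::Affine, x) = t.w * x + t.b

transform(Affine(2.0, 1.0), 3.0)  # 7.0
```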

Evizero commented 8 years ago

mhm maybe, but somehow it seems to me like trees would belong in their own package (where the learning algorithm is located as well). Intuitively I think it would make the most sense if this package just focused on ModelLosses, ParameterLosses, and connecting those to coefficient-based transformations.

tbreloff commented 8 years ago

> trees would belong in their own package

agreed

> make the most sense if this package just focused on ModelLosses, ParameterLosses, and connecting those to coefficient-based transformations

agreed

I think MLModels is a perfectly fine name... we can always add more packages later: TreeModels, etc.