tbreloff opened 8 years ago
What does a common base class of Penalty and PredictionLoss achieve? They share no functionality. They cannot be used interchangeably in any setting, as their method signatures for value, grad, etc. differ completely.
The only thing I can think of that they do share is signatures for properties like isconvex, but that doesn't really call for a common base class at all.
If you think about the losses in terms of f(w), which would make them interchangeable with penalties, then we are not thinking about the same abstraction layer of losses, which was the point I was trying to make.
Alex when you say "losses and penalties (but no models)"... What exactly do you mean by a "model" here? If you mean "empirical risk model" and other standard objective formulas, then I agree (see below). Regarding common transformations (log transform, activation functions, etc) I think it would make life easier to also include them in MLModels. We can bikeshed that name if it helps.
it makes little sense to me to not include the risk model but include the transformations. That would be very arbitrary. EDIT: Arbitrary because that would mean a user would have access to losses and prediction functions but no functionality to connect them.
start a repo for both losses and penalties (but no models)
losses and penalties have little relation really, but I would be ok with that road given that their implementations would actually have a very similar structure. I can appreciate how grouping them would make sense from a programming point of view in terms of maintenance
The scientific term for this issue is Structural Risk Minimization: controlling at the same time the empirical risk and a penalty term that bounds the VC-dimension of the set of approximation functions.
The first term is a function of both data and parameters, while the second term is a function of the parameters alone. This difference may or may not cause problems when the data arrive in a stream, in which case the empirical risk will be updated but the penalty will not.
The term loss refers to the loss-function. With a little abuse, it can also indicate the loss of an action (e.g., approximation function) on a single datum or a single random variable.
I guess my view is that ObjectiveFunctions.jl is a really good name that will be discoverable by users, while it is not immediately clear what would be in a repo called MLModels.jl. I'd like to see a StochasticOptimization.jl (at least a prototype) that only depends on a repository of loss functions and penalties. But I suppose you're right that ObjectiveFunctions.jl would need to include some models. E.g. my CachedLeastSquares is an affine predictor followed by a least-squares loss...
Can we keep ObjectiveFunctions.jl in MLModels.jl, if only temporarily?
I wonder if ObjectiveFunctions.jl should be the repo which contains both an implementation of ObjectiveFunction and also the ObjectiveComponents (Loss, Penalty, etc). So what is now MLModels would just get renamed to ObjectiveFunctions, and include what would have gone in there.
I think there's good reason to keep transformations in their own repo... MLTransformations.jl is probably a fine name for that. This repo could also collect common transformations like activation functions, affine transformations, rotations, etc. More complex transformations like SVMs, ANNs, etc would live in standalone packages.
StochasticOptimization would presumably depend on both MLTransformations and ObjectiveFunctions, and would define concrete implementations of learn!. Most of the abstract types and method stubs would live in LearnBase (and the whole ecosystem would depend on LearnBase).
my CachedLeastSquares is an affine predictor followed by a least-squares loss...
I'm a little confused where something like this would live. In my mind, this is a "recipe" for a specific combination of "transformation and objective". It's not a core component... it uses core components. So we might want to consider something like MLRecipes.jl (bikeshed please), which can house a bunch of standard approaches, like LASSO, etc, which are just specific combinations of transformations and objective function components.
losses and penalties have little relation really
I disagree. If you think about an objective function as f(t, x, y, w), then you could decompose it into objective components:
f(t, x, y, w) = loss(t(x) - y) + λ * penalty(w)
# or more generally:
f(t, x, y, w) = loss(t, x, y, w) + λ * penalty(t, x, y, w)
The difference is just that a penalty does not depend on t/x/y. We don't know or care about this specific distinction when we're building the "objective function" abstraction... we only want to know that it's something that can depend on zero or more of the values (t, x, y, w). That we can ignore t/x/y is just an implementation detail.
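As a concrete sketch of that decomposition (every name here is illustrative, not a settled API; a squared-error loss, an L1 penalty, and a linear predictor are just example choices):

```julia
# Illustrative only: loss(t(x) - y) + λ * penalty(w) with example components.
sqloss(yhat, y) = abs2(yhat - y)      # depends on prediction and target
l1penalty(w)    = sum(abs, w)         # depends on the parameters w alone
predict_t(x, w) = sum(w .* x)         # the transformation t, here linear

objective(x, y, w, λ) = sqloss(predict_t(x, w), y) + λ * l1penalty(w)
```

The point stands: from the objective-function layer, both components just look like things that map some subset of (t, x, y, w) to a number.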
How about SupervisedLosses.jl for the loss functions that are currently in MLModels and are a function of f(y, yhat)? I feel like they can stand on their own, and I don't want to water them down to include something they don't actually need (like the data, or the parameters). ObjectiveFunctions.jl could then include penalties as well as the things inspired by CachedLeastSquares (i.e. builds on PredictionLosses), so its content is basically describable by f(w). Not sure where to put the linear predictor, but I guess the plan was to have a MLTransformations.jl?
If I am the only one who needs the empirical/structural risk formalisation (which for some reason appears to be the case), I could make a separate, independent package that connects the linear model, loss functions, and penalties in a convenient way for my SVM use cases.
@tbreloff If we do want to describe an objective function in such a way (which I have no opinion on yet), that doesn't mean we need to poison the low-level implementation of the losses. It would be much cleaner to realize this as a separate abstraction layer that builds on a clean implementation of the losses.
For example (a little verbose, but it makes the point):
type LossObjectiveComponent{TLoss <: Loss} <: ObjectiveComponent
loss::TLoss
end
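A wrapper like this could then satisfy the component interface by delegation, e.g. (hypothetical method signature, assuming value is the shared accessor name):

```julia
# The low-level Loss keeps its clean two-argument interface; only the
# wrapper knows about the broader (t, x, y, w) signature.
value(c::LossObjectiveComponent, t, x, y, w) = value(c.loss, y, t(x))
```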
This way we don't discard years of theory for a particular implementation design. I don't see a downside here. The losses can be used independently of the rest of our framework (for example in a course exercise where students need to implement gradient descent and should investigate the influence of different loss functions on the learning behaviour, etc.).
If I am the only one who need the empirical/structural risk
Really the only part of JuliaML that OnlineStats/SparseRegression needs is a StructuralRisk type. I think this definitely needs to live somewhere in JuliaML.
@tbreloff Thinking about this a little more. A different way to meet both our goals would be to revisit the type tree of the Loss that we previously cut as a compromise to your distaste for type trees.
abstract Cost
abstract Loss <: Cost
abstract SupervisedLoss <: Loss
In the theory, Cost would be roughly equivalent to your ObjectiveComponent, so we could call it that:
- Cost is a function f(X, y, c), where c can be anything
- Loss is a function f(X, y, g(X)), so c is some function of the data
- SupervisedLoss doesn't need X, so it's a function f(y, g(X))
- UnsupervisedLoss doesn't need y, so it's a function f(X, g(X))
I would agree to replace either Cost or Loss with ObjectiveComponent, as long as the current losses could share a common base class SupervisedLoss which takes care of omitting X.
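That hierarchy and the argument-omission idea could be sketched as follows (Julia 0.5-era syntax to match the rest of the thread; the value methods are hypothetical):

```julia
abstract Cost                       # f(X, y, c) where c can be anything
abstract Loss <: Cost               # f(X, y, g(X)): c is a function of the data
abstract SupervisedLoss <: Loss     # f(y, g(X)): X only enters through g
abstract UnsupervisedLoss <: Loss   # f(X, g(X)): no targets y

# A SupervisedLoss can serve the general Cost signature by dropping X:
value(l::SupervisedLoss, X, y, yhat) = value(l, y, yhat)
```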
@Evizero I can get behind a separate repo for SupervisedLosses.jl, if you feel strongly. The important part for me is that, when someone wants to use both "generalized objective functions" with a "supervised loss component", that they have the generalized version available. I think the way to make both of us happy is:
# in SupervisedLosses.jl:
import LearnBase: ObjectiveComponent
abstract Loss <: ObjectiveComponent
abstract DistanceLoss <: Loss
abstract MarginLoss <: Loss
value(l::Loss, y, yhat) = ...
deriv(l::Loss, y, yhat) = ...
# in ObjectiveFunctions.jl:
using LearnBase, SupervisedLosses
# NOTE: these could possibly dispatch on a TRAIT of the ObjectiveComponent, not its type
value(l::Loss, t::Transformation, x::InputData, y::TargetData, w::Parameters) = value(l, t(x), y)
deriv(l::Loss, t::Transformation, x::InputData, y::TargetData, w::Parameters) = deriv(l, t(x), y)
... other generic conversions and methods ...
@tbreloff I could live with that if we rename the Loss in your snippet to SupervisedLoss, which would be more appropriate and leave open the future option of implementing unsupervised losses as well.
I do not think the last two functions will work out the way you hope they would. The EmpiricalRisk type exists for a good reason (to be fair, it should be called structural risk). Think about array input: the user may want to preallocate storage, and things get pretty nasty pretty quickly if performance and memory footprint are of importance. That said, I am guessing this snippet is just a quick hack to make a point, to which I would say: I am ok with that general approach.
@Evizero I'm happy with this type tree:
abstract Cost
abstract Loss <: Cost
abstract SupervisedLoss <: Loss
abstract UnsupervisedLoss <: Loss
abstract Penalty <: Cost
abstract DiscountedFutureRewards <: Cost
Does that work for you? I think this should be defined in LearnBase, with concrete implementations elsewhere.
Question: instead of SupervisedLosses.jl, could we call it Losses.jl (or MLLosses), and leave open the potential for including unsupervised losses?
Another issue with that proposed function signature for value is that algorithms often need access to the prediction yhat to compute things like subgradients (example: https://github.com/Evizero/KSVM.jl/blob/master/src/linear/solver/pegasos.jl#L110). My point being that it is not a good idea to hide computation like this behind a high-level layer that cannot be torn down. We need to allow for low-level access to things, which is one reason why I am so painfully persistent about the whole Loss topic.
MLLosses.jl for consistency? I guess the double L is weird. I am ok with Losses.jl as well.
often algorithms need access to the prediction yhat to compute stuff like subgradients example
I think the real implementation would do this in stages:
value(l, yhat, y, w) = ...
value(l, t, x, y, w) = value(l, t(x), y, w)
So that you could plug in to the most specific one you need.
I guess the double L is weird
I wrote that first, and hated the double-L.
Also a real implementation would have corresponding mutating versions:
value!(a, ...) = (a[:] = ...)
but developed with an eye towards whatever performance optimizations someone might want.
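A mutating version might look roughly like this (a sketch, assuming an element-wise scalar value method exists; nothing here is settled):

```julia
# Fill a preallocated buffer with element-wise loss values, avoiding
# allocation in hot loops (the preallocation concern raised above).
function value!(buffer::AbstractVector, l, yhat::AbstractVector, y::AbstractVector)
    length(buffer) == length(yhat) == length(y) || throw(DimensionMismatch())
    for i in eachindex(buffer)
        buffer[i] = value(l, yhat[i], y[i])
    end
    buffer
end
```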
I guess Losses.jl would be self-describing enough to not be confused with other areas?
Just a quick thought... might be cool to have a repo LearnLab.jl which just re-exports a bunch of the JuliaML ecosystem: losses, optimizers, transformations, etc. This is what we'd point beginners/users to, and they could later dig into the individual components.
I'm rebuilding the JuliaML.github.io website, so I was thinking if there was a good repo to have as the "public face" of the JuliaML ecosystem.
With regards to the website, I'm going to do it in a very similar way to Plots, and mainly just have an intro page, and a "design" page, where we can keep the currently agreed structure and abstractions. I don't think we need to announce the site... it'll just be nice to have a well laid out place to review our current progress/goals. Sometimes it can be tricky to piece things together from a discussion like this.
+1 for Losses.jl (it is shorter and will be easier to find for new users)
I also like the type hierarchy. Who wants to pull the trigger on creating the repos?
Great... I'll take care of the repos. Do we want to drop the ML prefix across the board? When we register there might be more discussion about naming, but it would be strange to have the ML prefix for only a subset.
@Evizero Do you still want to move over Evizero/MLModels? Or should I just create new, blank repos and we can copy code to wherever it makes sense?
Here's what I think we need, based on discussions:
At some later date we can get the Bayesians and others to discuss how we can get their models into the ecosystem.
With enough :+1: I'll create these and write up some docs describing it all.
I think we should drop the ML. I think it would likely cause confusion since MLBase exists in JuliaStats.
@tbreloff https://github.com/Evizero/MLModels.jl will turn into Losses.jl, since this was its main focus from the beginning anyway. The bits for ObjectiveFunctions.jl we either move there or reimplement.
@joshday MLBase is just one single repo. I don't think there is a good case to make to drop the ML from MLDataUtils, MLKernels (which is outside our influence anyway), MLMetrics.
I suggest we follow this guideline for naming:
@tbreloff in your list of repositories, where would the linear predictor / linear model / linear transformation live?
I just added Transformations... did we decide on where something like empirical risk would live? Maybe MLRecipes or something similar?
Isn't ObjectiveFunctions.jl intended to be a substitute for it? If not (or regardless?), let's make an MLRisks.jl package which focuses purely on the Empirical and Structural Risk minimization approach.
Isn't ObjectiveFunctions.jl intended to be a substitute for it?
No... I was thinking ObjectiveFunctions and Transformations would be components of a LearningAlgorithm, and that they'd be separate.
let's make a MLRisks.jl package
:+1:
Alright then. This seems like a good first step to act on. I'll take care of Losses.jl now (and MLRisks.jl later); why don't you go ahead and create the others.
Ok I'll wait a little bit to hear out any objections, then I'll create:
Already created:
@Evizero will move/create:
All... please also let me know which packages you want to be involved in, as well as if you want to be a "team lead" (or co-lead) for that package, so we all know who's organizing efforts.
I'll try to focus on StochasticOptimization, but I imagine this will involve me mucking around with everything else since it depends on basically everything else.
@ahwillia understood. I think you and @joshday should be co-leads of that (if you're both up for it), and I hope you'll be active in other stuff when it makes sense.
I am sure I will creep around in all said packages a lot.
In terms of lead: of core interest to me are Losses.jl (which is pretty much fully functional and well tested) and MLRisks.jl (which I have most of the code for; I just need to wait until Penalties and Predictors are provided). Once those are in place I will finally direct my gaze back to KSVM :) Since @joshday mentioned that structural risks are important to him, I would hope to co-author MLRisks.jl with him to suit both our needs perfectly.
Awesome thanks @Evizero. And I hope you have lots of time for us!
As for me, I think I'd like to lead/co-lead ObjectiveFunctions.jl and Transformations.jl, but of course I want to be involved with everything in some capacity.
I'll just pipe in that I'm excited about this work and have been lurking, but actively following, this discussion.
I'm not sure about the LearnLab.jl name, but I like the unifying concept as a convenience for loading the ecosystem of the various isolated topics. I don't have any better suggestions yet, though (I've thought of several worse ones).
An older idea from @tbreloff was MLWorkBench.jl. I'd like to throw a few more into the mix; keep it going:
What about just Learn.jl or Learning.jl. It has nice symmetry with LearnBase.
+1 to MLWorkbench.jl, though I think we should table that discussion until everything becomes more mature.
I want to put prox ops for penalty functions somewhere over the next day or two. Is this going in ObjectiveFunctions.jl and not Losses.jl? I was under the impression that penalties would go in Losses...
+1 to MLWorkbench.jl though I think we should table that discussion until everything becomes more mature.
agreed
I was under the impression that penalties would go in Losses.
It makes more sense to put the penalties in the same place where the CachedLeastSquares substitution also lives.
I think we should table that discussion until everything becomes more mature.
I want to make a placeholder, and literally just add most of JuliaML to the REQUIRE file and re-export the packages. It would be nothing to maintain, but it would be the easy way for new users to get going without looking at each individual package. I think we should do it today, and mostly forget about it after (since we wouldn't need to change much there). I lean towards Learn.jl right now, but if everyone prefers MLWorkBench I can be persuaded.
put the penalties in the same place where the CachedLeastSquares substitution also lives
I'm still confused about this... should CachedLeastSquares live in MLRisk? If so, I don't think penalties should live there... they are too core.
I'm still confused about this... should CachedLeastSquares live in MLRisk?
no, ObjectiveFunctions
I'm ok with Learn.jl
Ok I think I understand now. Yes all penalties would go in ObjectiveFunctions then (which depends on both LearnBase and Losses)
Another reason why I would prefer that penalties not live in Losses.jl is that they are really two different topics that share no code. I lean more towards the Unix philosophy.
There is a stub repo here: https://github.com/Rory-Finnegan/Learn.jl
@Rory-Finnegan... do you want to be involved with JuliaML? Are you ok with us using the Learn.jl name?
@tbreloff I would like to be involved with JuliaML and feel free to take the Learn.jl name. Let me know if you need me to delete my repo.
@Rory-Finnegan: Awesome! If you haven't already, please read through the discussions and let us know how you'd most like to contribute. I assume you haven't published Learn.jl to METADATA, right? If not, then it doesn't matter what you do with your Learn.jl, though it might be good to either put a big notification which links here, or just remove it if you don't care about what's there. It's totally up to you what you want to do.
This has been discussed repeatedly, but it's important to get right if we want widespread adoption. Some references:
https://github.com/Evizero/MLModels.jl/issues/12 https://github.com/Evizero/MLModels.jl/issues/3 https://github.com/JuliaOpt/Optim.jl/pull/87 https://github.com/JuliaStats/Roadmap.jl/issues/15 https://github.com/JuliaStats/Roadmap.jl/issues/4 https://github.com/JuliaStats/Roadmap.jl/issues/20
(there are more linked in those issues, and I'm sure I missed a bunch of good conversations)
I recommend a quick skim over those discussions before commenting, if you can find the time.
What are we supporting?
It's important to remember all the various things we'd like to support with the core abstractions, so we can evaluate when a concept applies and when it doesn't:
And there are some opposing perspectives within these classes:
All verbs need not be implemented by all transformations, but when there's potential for overlap, we should do our best to generalize.
Take in inputs, produce outputs
The generalization here is that the object knows how to produce y in y = f(x). This could be the logit function, or a previously fitted linear regression, or a decision tree. Options:
- predict (taken by StatsBase)
- map (taken by Base)
I continue to be a fan of transform, with the caveat that we may wish to have the shorthand such that anything that can transform can be called as a functor.
Generate/draw from a generative model
I think using Base.rand here is generally going to be fine, so I don't think we need this as one of our core verbs.
Use data to change the parameters of a model
- fit (taken by StatsBase)
I've started leaning towards learn, partially for the symmetry with LearnBase, but also because it is not so actively used in either stats (fit) or ML (train), and so it could be argued it's more general. I think solve/optimize should be reserved for higher-level optimization algorithms, and update could be reserved for lower-level model updating.
Types
I personally feel everything should be a Transformation, though I can see the argument that aggregations, distributions and others don't belong. A mean is a function, but really it's a CenterTransformation that uses a "mean function" to transform data.
Can a transformation take zero inputs? If that's the case, then I could argue a generative model might take zero inputs and generate an output, transforming nothing into something.
If we think of "directed graphs of transformations", then I want to be able to connect a Normal distribution into that graph... we just have the flexibility that the Normal distribution can be a "source" in the same way the input data is a "source".
With this analysis, AbstractTransformation is the core type, and we should make every attempt to avoid new types until we require them to solve a conflict.
Introspection/Traits
There are many things that we could query regarding attributes of our transformations:
I would like to see these things eventually implemented as traits, but in the meantime we'll need methods to ask these questions.
Package Layout
I think we agree that LearnBase will contain the core abstractions... enough that someone can create new models/transformations/solvers without importing lots of concrete implementations of things they don't need.
We need homes for concrete implementations of:
StatsBase and existing abstractions
StatsBase contains a ton of assorted methods, types, and algorithms. StatsBase is too big for it to be a dependency of LearnBase (IMO), and LearnBase is too new to expect that StatsBase would depend on it. So I think we should have a package which depends on both LearnBase and StatsBase, and "links" the abstractions together when it's possible/feasible. In some cases this might be as easy as defining things like:
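For example, such a linking package might contain definitions along these lines (purely hypothetical; neither the verb names nor the signatures are settled):

```julia
# In a hypothetical bridge package depending on both LearnBase and StatsBase:
import LearnBase, StatsBase

# Fall back to StatsBase's fitting machinery for its model types
LearnBase.learn!(m::StatsBase.StatisticalModel, args...) = StatsBase.fit!(m, args...)

# Expose fitted StatsBase models through the transformation verb
LearnBase.transform(m::StatsBase.RegressionModel, X) = StatsBase.predict(m, X)
```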
What are the other packages that we should consider linking with?
cc: @Evizero @ahwillia @joshday @cstjean @andreasnoack @cmcbride @StefanKarpinski @ninjin @simonbyrne @pluskid
(If I forgot to cc someone that you think should be involved, please cc them yourself)