tbreloff opened 8 years ago
What does a common base class of Penalty and PredictionLoss achieve? They share no functionality. They cannot be used interchangeably in any setting, as their method signatures for value, grad, etc. differ completely.
The only thing I can think of that they do share is signatures for properties like isconvex, but that doesn't really call for a common base class at all.
If you think about the losses in terms of f(w), which would make them interchangeable with penalties, then we are not thinking about the same abstraction layer of losses, which was the point I was trying to make.
Alex when you say "losses and penalties (but no models)"... What exactly do you mean by a "model" here? If you mean "empirical risk model" and other standard objective formulas, then I agree (see below). Regarding common transformations (log transform, activation functions, etc) I think it would make life easier to also include them in MLModels. We can bikeshed that name if it helps.
it makes little sense to me to not include the risk model but include the transformations. That would be very arbitrary. EDIT: Arbitrary because that would mean a user would have access to losses and prediction functions but no functionality to connect them.
start a repo for both losses and penalties (but no models)
losses and penalties have little relation really, but I would be ok with that road given that their implementations would actually have a very similar structure. I can appreciate how grouping them would make sense from a programming point of view in terms of maintenance
The scientific term for this issue is Structural Risk Minimization: controlling at the same time the empirical risk and a penalty term that bounds the VC-dimension of the set of approximation functions.
The first term is a function of both data and parameters, while the second term is a function of the parameters alone. This difference may or may not cause problems when the data arrive in a stream, in which case the empirical risk will be updated but the penalty will not.
The term loss refers to the loss-function. With a little abuse, it can also indicate the loss of an action (e.g., approximation function) on a single datum or a single random variable.
I guess my view is that ObjectiveFunctions.jl is a really good name that will be discoverable by users, while it is not immediately clear what would be in a repo called MLModels.jl. I'd like to see a StochasticOptimization.jl (at least a prototype) that only depends on a repository of loss functions and penalties. But I suppose you're right that ObjectiveFunctions.jl would need to include some models. E.g. my CachedLeastSquares is an affine predictor followed by a least-squares loss...
Can we keep ObjectiveFunctions.jl in MLModels.jl, if only temporarily?
I wonder if ObjectiveFunctions.jl should be the repo which contains both an implementation of ObjectiveFunction and also the ObjectiveComponents (Loss, Penalty, etc). So what is now MLModels would just get renamed to ObjectiveFunctions, and include what would have gone in there.
I think there's good reason to keep transformations in their own repo... MLTransformations.jl is probably a fine name for that. This repo could also collect common transformations like activation functions, affine transformations, rotations, etc. More complex transformations like SVMs, ANNs, etc would live in standalone packages.
StochasticOptimization would presumably depend on both MLTransformations and ObjectiveFunctions, and would define concrete implementations of learn!. Most of the abstract types and method stubs would live in LearnBase (and the whole ecosystem would depend on LearnBase).
my CachedLeastSquares is an affine predictor followed by a least-squares loss...
I'm a little confused where something like this would live. In my mind, this is a "recipe" for a specific combination of "transformation and objective". It's not a core component... it uses core components. So we might want to consider something like MLRecipes.jl (bikeshed please), which can house a bunch of standard approaches, like LASSO, etc, which are just specific combinations of transformations and objective function components.
losses and penalties have little relation really
I disagree. If you think about an objective function as f(t, x, y, w), then you could decompose it into objective components:
f(t, x, y, w) = loss(t(x) - y) + λ * penalty(w)
# or more generally:
f(t, x, y, w) = loss(t, x, y, w) + λ * penalty(t, x, y, w)
The difference is just that a penalty does not depend on t/x/y. We don't know or care about this specific distinction when we're building the "objective function" abstraction... we only want to know that it's something that can depend on zero or more of the values (t, x, y, w). That we can ignore t/x/y is just an implementation detail.
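As a concrete sketch of that decomposition (every name here is illustrative, not a settled API; a squared-error loss, an L1 penalty, and a linear predictor are just example choices):

```julia
# Illustrative only: loss(t(x) - y) + λ * penalty(w) with example components.
sqloss(yhat, y) = abs2(yhat - y)      # depends on prediction and target
l1penalty(w)    = sum(abs, w)         # depends on the parameters w alone
predict_t(x, w) = sum(w .* x)         # the transformation t, here linear

objective(x, y, w, λ) = sqloss(predict_t(x, w), y) + λ * l1penalty(w)
```

The point stands: from the objective-function layer, both components just look like things that map some subset of (t, x, y, w) to a number.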
How about SupervisedLosses.jl for the loss functions that are currently in MLModels and are a function of f(y, yhat)? I feel like they can stand on their own, and I don't want to water them down to include something they don't actually need (like the data, or the parameters). ObjectiveFunctions.jl could then include penalties as well as the things inspired by CachedLeastSquares (i.e. builds on PredictionLosses), so its content is basically describable by f(w). Not sure where to put the linear predictor, but I guess the plan was to have a MLTransformations.jl?
If I am the only one who needs the empirical/structural risk formalisation (which for some reason appears to be the case), I could make a separate, independent package that connects the linear model, loss functions, and penalties in a convenient way for my SVM use cases.
@tbreloff If we do want to describe an objective function in such a way (which I have no opinion on yet), that doesn't mean we need to poison the low-level implementation of the losses. It would be much cleaner to realize this as a separate abstraction layer that builds on a clean implementation of the losses.
For example (a little verbose, but it makes the point):
type LossObjectiveComponent{TLoss <: Loss} <: ObjectiveComponent
loss::TLoss
end
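A wrapper like this could then satisfy the component interface by delegation, e.g. (hypothetical method signature, assuming value is the shared accessor name):

```julia
# The low-level Loss keeps its clean two-argument interface; only the
# wrapper knows about the broader (t, x, y, w) signature.
value(c::LossObjectiveComponent, t, x, y, w) = value(c.loss, y, t(x))
```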
This way we don't discard years of theory for a particular implementation design. I don't see a downside here. The losses can be used independently of the rest of our framework (for example in a course exercise where students need to implement gradient descent and should investigate the influence of different loss functions on the learning behaviour, etc.).
If I am the only one who need the empirical/structural risk
Really the only part of JuliaML that OnlineStats/SparseRegression needs is a StructuralRisk type. I think this definitely needs to live somewhere in JuliaML.
@tbreloff Thinking about this a little more. A different way to meet both our goals would be to revisit the type tree of the Loss that we previously cut as a compromise to your distaste for type trees.
abstract Cost
abstract Loss <: Cost
abstract SupervisedLoss <: Loss
In the theory, Cost would be roughly equivalent to your ObjectiveComponent, so we could call it that:
- Cost is a function f(X, y, c), where c can be anything
- Loss is a function f(X, y, g(X)), so c is some function of the data
- SupervisedLoss doesn't need X, so it's a function f(y, g(X))
- UnsupervisedLoss doesn't need y, so it's a function f(X, g(X))
I would agree to replace either Cost or Loss with ObjectiveComponent, as long as the current losses could share a common base class SupervisedLoss which takes care of omitting X.
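That hierarchy and the argument-omission idea could be sketched as follows (Julia 0.5-era syntax to match the rest of the thread; the value methods are hypothetical):

```julia
abstract Cost                       # f(X, y, c) where c can be anything
abstract Loss <: Cost               # f(X, y, g(X)): c is a function of the data
abstract SupervisedLoss <: Loss     # f(y, g(X)): X only enters through g
abstract UnsupervisedLoss <: Loss   # f(X, g(X)): no targets y

# A SupervisedLoss can serve the general Cost signature by dropping X:
value(l::SupervisedLoss, X, y, yhat) = value(l, y, yhat)
```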
@Evizero I can get behind a separate repo for SupervisedLosses.jl, if you feel strongly. The important part for me is that, when someone wants to use both "generalized objective functions" with a "supervised loss component", that they have the generalized version available. I think the way to make both of us happy is:
# in SupervisedLosses.jl:
import LearnBase: ObjectiveComponent
abstract Loss <: ObjectiveComponent
abstract DistanceLoss <: Loss
abstract MarginLoss <: Loss
value(l::Loss, y, yhat) = ...
deriv(l::Loss, y, yhat) = ...
# in ObjectiveFunctions.jl:
using LearnBase, SupervisedLosses
# NOTE: these could possibly dispatch on a TRAIT of the ObjectiveComponent, not its type
value(l::Loss, t::Transformation, x::InputData, y::TargetData, w::Parameters) = value(l, t(x), y)
deriv(l::Loss, t::Transformation, x::InputData, y::TargetData, w::Parameters) = deriv(l, t(x), y)
... other generic conversions and methods ...
@tbreloff I could live with that if we rename the Loss in your snippet to SupervisedLoss, which would be more appropriate and leave open the future option of implementing unsupervised losses as well.
I do not think the last two functions will work out the way you hope they would. The EmpiricalRisk type exists for a good reason (to be fair, it should be called structural risk). Think about array input: the user may want to preallocate storage, and things get pretty nasty pretty quickly if performance and memory footprint are of importance. That said, I am guessing this snippet is just a quick hack to make a point, to which I would say: I am ok with that general approach.
@Evizero I'm happy with this type tree:
abstract Cost
abstract Loss <: Cost
abstract SupervisedLoss <: Loss
abstract UnsupervisedLoss <: Loss
abstract Penalty <: Cost
abstract DiscountedFutureRewards <: Cost
Does that work for you? I think this should be defined in LearnBase, with concrete implementations elsewhere.
Question: instead of SupervisedLosses.jl, could we call it Losses.jl (or MLLosses), and leave open the potential for including unsupervised losses?
Another issue with that proposed function signature for value is that algorithms often need access to the prediction yhat to compute things like subgradients (example: https://github.com/Evizero/KSVM.jl/blob/master/src/linear/solver/pegasos.jl#L110). My point being that it is not a good idea to hide computation like this behind a high-level layer that cannot be torn down. We need to allow for low-level access to things, which is one reason why I am so painfully persistent about the whole Loss topic.
MLLosses.jl for consistency? I guess the double L is weird. I am ok with Losses.jl as well.
often algorithms need access to the prediction yhat to compute stuff like subgradients example
I think the real implementation would do this in stages:
value(l, yhat, y, w) = ...
value(l, t, x, y, w) = value(l, t(x), y, w)
So that you could plug in to the most specific one you need.
I guess the double L is weird
I wrote that first, and hated the double-L.
Also a real implementation would have corresponding mutating versions:
value!(a, ...) = (a[:] = ...)
but developed with an eye towards whatever performance optimizations someone might want.
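A mutating version might look roughly like this (a sketch, assuming an element-wise scalar value method exists; nothing here is settled):

```julia
# Fill a preallocated buffer with element-wise loss values, avoiding
# allocation in hot loops (the preallocation concern raised above).
function value!(buffer::AbstractVector, l, yhat::AbstractVector, y::AbstractVector)
    length(buffer) == length(yhat) == length(y) || throw(DimensionMismatch())
    for i in eachindex(buffer)
        buffer[i] = value(l, yhat[i], y[i])
    end
    buffer
end
```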
I guess Losses.jl would be self-describing enough to not be confused with other areas?
Just a quick thought... might be cool to have a repo LearnLab.jl which just re-exports a bunch of the JuliaML ecosystem: losses, optimizers, transformations, etc. This is what we'd point beginners/users to, and they could later dig into the individual components.
I'm rebuilding the JuliaML.github.io website, so I was thinking if there was a good repo to have as the "public face" of the JuliaML ecosystem.
With regards to the website, I'm going to do it in a very similar way to Plots, and mainly just have an intro page, and a "design" page, where we can keep the currently agreed structure and abstractions. I don't think we need to announce the site... it'll just be nice to have a well laid out place to review our current progress/goals. Sometimes it can be tricky to piece things together from a discussion like this.
+1 for Losses.jl (it is shorter and will be easier to find for new users)
I also like the type hierarchy. Who wants to pull the trigger on creating the repos?
Great... I'll take care of the repos. Do we want to drop the ML prefix across the board? When we register there might be more discussion about naming, but it would be strange to have the ML prefix for only a subset.
@Evizero Do you still want to move over Evizero/MLModels? Or should I just create new, blank repos and we can copy code to wherever it makes sense?
Here's what I think we need, based on discussions:
At some later date we can get the Bayesians and others to discuss how we can get their models into the ecosystem.
With enough :+1: I'll create these and write up some docs describing it all.
I think we should drop the ML. I think it would likely cause confusion since MLBase exists in JuliaStats.
@tbreloff https://github.com/Evizero/MLModels.jl will turn into Losses.jl, since this was its main focus from the beginning anyway. The bits for ObjectiveFunctions.jl we either move there or reimplement.
@joshday MLBase is just one single repo. I don't think there is a good case to make to drop the ML from MLDataUtils, MLKernels (which is outside our influence anyway), MLMetrics.
I suggest we follow this guideline for naming:
@tbreloff in your list of repositories, where would the linear predictor / linear model / linear transformation live?
I just added Transformations... did we decide on where something like empirical risk would live? Maybe MLRecipes or something similar?
Isn't ObjectiveFunctions.jl intended to be a substitute for it? If not (or regardless?), let's make an MLRisks.jl package which focuses purely on the Empirical and Structural Risk minimization approach.
Isn't ObjectiveFunctions.jl intended to be a substitute for it?
No... I was thinking ObjectiveFunctions and Transformations would be components of a LearningAlgorithm, and that they'd be separate.
let's make a MLRisks.jl package
:+1:
Alright then. This seems like a good first step to act on. I'll take care of Losses.jl now (and MLRisks.jl later); why don't you go ahead and create the others.
Ok I'll wait a little bit to hear out any objections, then I'll create:
Already created:
@Evizero will move/create:
All... please also let me know which packages you want to be involved in, as well as if you want to be a "team lead" (or co-lead) for that package, so we all know who's organizing efforts.
I'll try to focus on StochasticOptimization, but I imagine this will involve me mucking around with everything else since it depends on basically everything else.
@ahwillia understood. I think you and @joshday should be co-leads of that (if you're both up for it), and I hope you'll be active in other stuff when it makes sense.
I am sure I will creep around in all said packages a lot.
In terms of lead: of core interest to me are Losses.jl (which is pretty much fully functional and well tested) and MLRisks.jl (which I have most of the code for; I just need to wait until Penalties and Predictors are provided). Once those are in place I will finally direct my gaze back to KSVM :) Since @joshday mentioned that structural risks are important to him, I would hope to co-author MLRisks.jl with him to suit both our needs perfectly.
Awesome thanks @Evizero. And I hope you have lots of time for us!
As for me, I think I'd like to lead/co-lead ObjectiveFunctions.jl and Transformations.jl, but of course I want to be involved with everything in some capacity.
I'll just pipe in that I'm excited about this work and have been lurking, but actively following, this discussion.
I'm not sure about the LearnLab.jl name, but I like the unifying concept as a convenience for loading the ecosystem of the various isolated topics. I don't have any better suggestions yet, though (I've thought of several worse ones).
An older idea from @tbreloff was MLWorkBench.jl. I'd like to throw a few more into the mix; keep it going:
What about just Learn.jl or Learning.jl. It has nice symmetry with LearnBase.
+1 to MLWorkbench.jl, though I think we should table that discussion until everything becomes more mature.
I want to put prox ops for penalty functions somewhere over the next day or two. Is this going in ObjectiveFunctions.jl and not Losses.jl? I was under the impression that penalties would go in Losses...
+1 to MLWorkbench.jl though I think we should table that discussion until everything becomes more mature.
agreed
I was under the impression that penalties would go in Losses.
It makes more sense to put the penalties in the same place where the CachedLeastSquares substitution also lives.
I think we should table that discussion until everything becomes more mature.
I want to make a placeholder, and literally just add most of JuliaML to the REQUIRE file and re-export the packages. It would be nothing to maintain, but it would be the easy way for new users to get going without looking at each individual package. I think we should do it today, and mostly forget about it after (since we wouldn't need to change much there). I lean towards Learn.jl right now, but if everyone prefers MLWorkBench I can be persuaded.
put the penalties in the same place where the CachedLeastSquares substitution also lives
I'm still confused about this... should CachedLeastSquares live in MLRisk? If so, I don't think penalties should live there... they are too core.
I'm still confused about this... should CachedLeastSquares live in MLRisk?
no, ObjectiveFunctions
I'm ok with Learn.jl
Ok I think I understand now. Yes all penalties would go in ObjectiveFunctions then (which depends on both LearnBase and Losses)
Another reason why I would prefer that penalties not live in Losses.jl is that they are really two different topics that share no code. I lean more towards the Unix philosophy.
There is a stub repo here: https://github.com/Rory-Finnegan/Learn.jl
@Rory-Finnegan... do you want to be involved with JuliaML? Are you ok with us using the Learn.jl name?
@tbreloff I would like to be involved with JuliaML and feel free to take the Learn.jl name. Let me know if you need me to delete my repo.
@Rory-Finnegan: Awesome! If you haven't already, please read through the discussions and let us know how you'd most like to contribute. I assume you haven't published Learn.jl to METADATA, right? If not, then it doesn't matter what you do with your Learn.jl, though it might be good to either put a big notification which links here, or just remove it if you don't care about what's there. It's totally up to you what you want to do.
This has been discussed repeatedly, but it's important to get right if we want widespread adoption. Some references:
https://github.com/Evizero/MLModels.jl/issues/12 https://github.com/Evizero/MLModels.jl/issues/3 https://github.com/JuliaOpt/Optim.jl/pull/87 https://github.com/JuliaStats/Roadmap.jl/issues/15 https://github.com/JuliaStats/Roadmap.jl/issues/4 https://github.com/JuliaStats/Roadmap.jl/issues/20
(there are more linked in those issues, and I'm sure I missed a bunch of good conversations)
I recommend a quick skim over those discussions before commenting, if you can find the time.
What are we supporting?
It's important to remember all the various things we'd like to support with the core abstractions, so we can evaluate when a concept applies and when it doesn't:
And there are some opposing perspectives within these classes:
All verbs need not be implemented by all transformations, but when there's potential for overlap, we should do our best to generalize.
Take in inputs, produce outputs
The generalization here is that the object knows how to produce y in y = f(x). This could be the logit function, or a previously fitted linear regression, or a decision tree. Options:
- predict (taken by StatsBase)
- map (taken by Base)
I continue to be a fan of transform, with the caveat that we may wish to have the shorthand such that anything that can transform can be called as a functor.
Generate/draw from a generative model
I think using Base.rand here is generally going to be fine, so I don't think we need this as one of our core verbs.
Use data to change the parameters of a model
- fit (taken by StatsBase)
I've started leaning towards learn, partially for the symmetry with LearnBase, but also because it is not so actively used in either stats (fit) or ML (train), and so it could be argued it's more general. I think solve/optimize should be reserved for higher-level optimization algorithms, and update could be reserved for lower-level model updating.
Types
I personally feel everything should be a Transformation, though I can see the argument that aggregations, distributions and others don't belong. A mean is a function, but really it's a CenterTransformation that uses a "mean function" to transform data.
Can a transformation take zero inputs? If that's the case, then I could argue a generative model might take zero inputs and generate an output, transforming nothing into something.
If we think of "directed graphs of transformations", then I want to be able to connect a Normal distribution into that graph... we just have the flexibility that the Normal distribution can be a "source" in the same way the input data is a "source".
With this analysis, AbstractTransformation is the core type, and we should make every attempt to avoid new types until we require them to solve a conflict.
Introspection/Traits
There are many things that we could query regarding attributes of our transformations:
I would like to see these things eventually implemented as traits, but in the meantime we'll need methods to ask these questions.
Package Layout
I think we agree that LearnBase will contain the core abstractions... enough that someone can create new models/transformations/solvers without importing lots of concrete implementations of things they don't need.
We need homes for concrete implementations of:
StatsBase and existing abstractions
StatsBase contains a ton of assorted methods, types, and algorithms. StatsBase is too big for it to be a dependency of LearnBase (IMO), and LearnBase is too new to expect that StatsBase would depend on it. So I think we should have a package which depends on both LearnBase and StatsBase, and "links" the abstractions together when it's possible/feasible. In some cases this might be as easy as defining things like:
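For example, such a linking package might contain definitions along these lines (purely hypothetical; neither the verb names nor the signatures are settled):

```julia
# In a hypothetical bridge package depending on both LearnBase and StatsBase:
import LearnBase, StatsBase

# Fall back to StatsBase's fitting machinery for its model types
LearnBase.learn!(m::StatsBase.StatisticalModel, args...) = StatsBase.fit!(m, args...)

# Expose fitted StatsBase models through the transformation verb
LearnBase.transform(m::StatsBase.RegressionModel, X) = StatsBase.predict(m, X)
```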
What are the other packages that we should consider linking with?
cc: @Evizero @ahwillia @joshday @cstjean @andreasnoack @cmcbride @StefanKarpinski @ninjin @simonbyrne @pluskid
(If I forgot to cc someone that you think should be involved, please cc them yourself)