JuliaML / LossFunctions.jl

Julia package of loss functions for machine learning.
https://juliaml.github.io/LossFunctions.jl/stable

Wishlist #4

Closed tbreloff closed 8 years ago

tbreloff commented 8 years ago

What functionality should be accessible? How should we access it? What is core? What exists already in a usable state? (see StatsBase.jl, MLBase.jl, MachineLearning.jl, ...) What should be wrapped/linked? What should be left unimplemented, waiting for 3rd party extension?

This is not a complete list... just a placeholder which we should add to.

Concepts:

Models/algorithms:

Evizero commented 8 years ago

For empirical risk stuff (loss functions and penalties) I think we should use EmpiricalRisks.jl, which is optimized and has a great, extensible design. It uses functors to represent the loss functions etc., and it is already quite extensive. For deterministic optimization in that regard, Regression.jl, which builds on it, is also quite nice. SGDOptim.jl, on the other hand, feels very experimental to me; I think your code is probably more fleshed out in that regard. It would be nice, though, if it also used EmpiricalRisks.jl (which I also do for SVMs).
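
To give a flavor of the functor pattern, here is a minimal sketch (the type and function names are illustrative, not the actual EmpiricalRisks.jl source):

```julia
using LinearAlgebra

# Each loss is a singleton type; generic functions dispatch on it
# to compute values and derivatives.
abstract type Loss end

struct SqrLoss   <: Loss end
struct HingeLoss <: Loss end

# value(loss, p, y): loss at prediction p for target y
value(::SqrLoss,   p, y) = 0.5 * (p - y)^2
value(::HingeLoss, p, y) = max(0.0, 1.0 - y * p)

# deriv(loss, p, y): derivative with respect to the prediction p
deriv(::SqrLoss,   p, y) = p - y
deriv(::HingeLoss, p, y) = y * p < 1.0 ? -y : 0.0

# Generic code then works for any loss without branching, e.g. one SGD step:
sgd_step(w, x, y, loss::Loss; lr = 0.1) = w .- lr .* deriv(loss, dot(w, x), y) .* x
```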

tbreloff commented 8 years ago

When designing OnlineStats, we reviewed EmpiricalRisks and Regression and decided they weren't flexible enough for our needs, so rather than slog through the PR process to redesign them, we built our own. You should really review OnlineStats... please let me know if you see any deficiencies in the design, and I think we should use that as our starting point.


Evizero commented 8 years ago

Fair enough! I will review both (OnlineStats and OnlineAI) later today, after I am done with work.

Generally though, I think we should try to incorporate the efforts of others, even if it is more work and involves tedious PRs. For example: I know that the author of EmpiricalRisks.jl is very, very busy, and sometimes PRs I create sit there for a week or two without any comment. I too was thinking of just creating my own approach to avoid this kind of frustration (loss functions aren't that much work to implement, after all), but in the end what we really want to create is a community and encourage separation of concerns. What we don't want is to end up with "this guy's ML framework" vs "these guys' ML framework".

tbreloff commented 8 years ago

Theoretically I agree, but in practice it's nearly impossible to get everyone on board with the same abstract type tree, which is why we should do everything we can to minimize labeling of "nouns". When I have some time I'll try to come up with a proof of concept of some potential designs.


Evizero commented 8 years ago

I am currently going through OnlineStats (and there is a lot that I like about it) with a focus on Loss and Penalties. I have to say, concerning Loss/Penalty etc., I don't see the appeal of that approach (if I understand it correctly) vs EmpiricalRisks. Why would you not want to separate the concerns out? It makes much more sense to me to have nouns for losses and penalties that define their value and gradient, and to be able to combine them however I'd like. Why would you, at the lowest level, bind that to a LogisticRegression rather than having an empirical risk model with a logistic loss, a sigmoid prediction function, and an L2Penalty? This seems like something a high-level layer should do for convenience.
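
To sketch what I mean by combining them (hypothetical names, not any existing API):

```julia
using LinearAlgebra

# Modular view: a risk model is just a combination of independent parts.
struct RiskModel{L,F,P}
    loss::L        # e.g. logistic loss
    predict::F     # e.g. sigmoid link
    penalty::P     # e.g. L2 penalty
end

logistic_loss(p, y) = log1p(exp(-y * p))       # y ∈ {-1, +1}
sigmoid(x) = 1 / (1 + exp(-x))
l2_penalty(λ) = w -> λ / 2 * dot(w, w)

# "Logistic regression" is then just one particular combination:
logreg = RiskModel(logistic_loss, sigmoid, l2_penalty(0.1))

# Swapping in a hinge loss yields an L2-regularized linear SVM instead,
# without touching any of the surrounding machinery:
hinge(p, y) = max(0, 1 - y * p)
svm = RiskModel(hinge, identity, l2_penalty(0.1))
```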

You said that you went over EmpiricalRisks and decided it wasn't flexible enough. Do you remember what the problem was? Was it the prox verb? Maybe understanding that issue would help me understand this approach better, because currently it seems less powerful (and less clean) to me.

tbreloff commented 8 years ago

To be honest, @joshday redesigned parts of it to try and generalize the various solvers... I need to review those changes in a little more detail before I can comment.

Take a look at OnlineAI, as at least I know what's there. Specifically, check out costs.jl, data.jl, gradient.jl, and maybe activations.jl in https://github.com/tbreloff/OnlineAI.jl/tree/master/src/nnet. These might be a little more in line with the framework that I have in my head (but they still need work). Note that these abstractions are able to represent complex deep neural nets layer by layer in pipeline fashion, so if we design something that can replace these abstractions and still work for ksvm, random forests, and others, then we can be fairly confident that the abstractions are effective.
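
Very roughly, the kind of layer-pipeline abstraction I mean looks something like this (a hypothetical sketch, not the actual OnlineAI.jl code):

```julia
# Each layer transforms its input; a network is just an ordered collection.
abstract type Layer end

struct Affine <: Layer
    W::Matrix{Float64}
    b::Vector{Float64}
end
forward(l::Affine, x) = l.W * x .+ l.b

struct Activation{F} <: Layer
    f::F
end
forward(l::Activation, x) = l.f.(x)

# Forward propagation folds the input through the pipeline of layers.
forward(layers::Vector{Layer}, x) = foldl((h, l) -> forward(l, h), layers; init = x)

net = Layer[Affine(randn(4, 3), zeros(4)), Activation(tanh),
            Affine(randn(1, 4), zeros(1))]
y = forward(net, randn(3))
```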

Evizero commented 8 years ago

I am in the process of looking at the files you just mentioned. That code does look very nice. I am (literally) not sure whether a nnet approach can completely substitute for a general empirical risk minimization framework. I mean, sure, we can write SVM layers, but I still think we need a standalone SVM implementation to cover the full scope of what SVMs can do.

Could it be that we have been talking about different things? I can clearly see that EmpiricalRisks.jl is not suitable for what you do in OnlineAI with neural networks, so there is no objection there. I am talking about more classical problems such as K-SVMs, LogisticReg, LinearReg, with or without bias and with some penalty function. These kinds of problems are really nicely addressed with a modular architecture à la EmpiricalRisks.jl.

EDIT: well, the more I look at costs.jl, the more overlap with EmpiricalRisks I spot. But I still get why you didn't use it for your NN code.

tbreloff commented 8 years ago

I agree with you... I was not proposing that we should try to fit SVMs into a neural net abstraction. That's just silly :)

I was saying that there are some areas of heavy overlap between all these methods... some have the same or similar loss models, some the same or similar gradient calculations, solving algorithms, etc. Our goal is to figure out all those things that overlap between these various algorithms and create smart abstractions. Everything else should stay specific to the given class of problem. This means that we may have repeated code in some places, which is OK if the alternative is creating abstractions that don't actually abstract well in all cases.

tldr: I'd rather have no abstractions than bad/incomplete abstractions. This way each problem can have very specific implementations/features, and we only abstract the most obvious things.

Evizero commented 8 years ago

As a really off-topic side note: I think it's cool that you are interested in implementing liquid state machines! Because I studied at the university where Prof. Maass has his institute, I had (and still have) the pleasure of attending his lectures (which are very educational), and I think the whole area of brain-inspired computing is very intriguing.

joshday commented 8 years ago

For SGModel, the only things combined are link/loss, which are defined by the model (i.e. LogisticRegression) to remove the burden from the user. Penalties are separate from the model (and also get reused for SparseReg). Making a more general model type with user-defined link/loss is on my todo list.
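
In sketch form (hypothetical names, not the actual OnlineStats source), the separation looks like this:

```julia
# The model type fixes the link/loss pair so the user never chooses them;
# the penalty is orthogonal, so any (model, penalty) pair composes.
abstract type Model end
abstract type Penalty end

struct LogisticRegression <: Model end   # implies logistic loss + logit link
struct L2Penalty <: Penalty
    λ::Float64
end

# The model determines the loss gradient in the linear predictor p...
lossgrad(::LogisticRegression, p, y) = 1 / (1 + exp(-p)) - y   # y ∈ {0, 1}

# ...while the penalty contributes only its own gradient in the weights.
pengrad(pen::L2Penalty, w) = pen.λ .* w
```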

Evizero commented 8 years ago

Do you think it would make sense to use EmpiricalRisks.jl for the corresponding model formulations? Or is there something about it that just doesn't fit into your design?

EDIT: I am not suggesting that you should refactor your code. I am just trying to understand the different requirements and opinions on things that are related to what I am working on

tbreloff commented 8 years ago

@Evizero Cool! I think LSMs are fascinating, and I spent several months doing intense research on them along with related topics like neuronal dynamics, spiking equations, synaptic plasticity, etc. You should ask Prof. Maass if he has any interest in learning Julia :)

@joshday It is my goal that the framework we design here will fully support the OnlineStats abstractions, so it would be great if we pooled some effort here instead of redesigning just for OnlineStats. I'm obviously closely linked to both, so I'll help keep them as consistent as possible.

joshday commented 8 years ago

I think not using EmpiricalRisks was mostly just not knowing how performant it was.

This is minor, but I do remember this: EmpiricalRisks.grad(loss, u, y), where u is the prediction. In statistics, a residual is y - u, which is typically what shows up in the gradient. The arguments are backwards to me.
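
Concretely, for squared loss (a small worked example of the convention, not code from either package):

```julia
# For squared loss L(u, y) = (u - y)^2 / 2, the gradient in the
# prediction u is the *negative* residual -- hence the surprise when the
# arguments read (loss, u, y) instead of leading with y.
residual(u, y) = y - u            # statistics convention
sqrloss_deriv(u, y) = u - y       # ∂L/∂u = -(y - u)

u, y = 0.8, 1.0
@assert sqrloss_deriv(u, y) == -residual(u, y)
```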

tbreloff commented 8 years ago

I can't remember the specific reasons we didn't like EmpiricalRisks. But I think the decision was something like "there's not much code here, and it's not the way we would do it, so let's just have our own".

I think we should attempt to identify (in a separate issue) what the ideal version of EmpiricalRisks would look like for our purposes and see how closely it fits in its current form. If we can make small adjustments via PRs, then we can try that. Otherwise it's easier to start from scratch. My big concern is that he's just too busy to give our PRs any attention. If EmpiricalRisks was part of an organization, it would be a different story altogether...

Evizero commented 8 years ago

Maybe we could reopen the discussion of EmpiricalRisks.jl's utility for defining the loss functions, penalties, and risk models for our planned ecosystem? I have been working with that package for a while now and I am pretty happy with it. The performance is also great. The only thing that is a bit of a problem with working on EmpiricalRisks.jl is the tight time constraints of its author.

From the ML framework perspective, it would be awesome if we put the stochastic learning of EmpiricalRisks models into your package where it makes sense. Regression.jl deals with full-batch learning of empirical risk models pretty well, but the stochastic side could use some love.

EDIT: as a sidenote it is EmpiricalRisks.deriv(loss, p, y) now

tbreloff commented 8 years ago

Yes, the stochastic side is not ideal. I can believe that it's perfectly fine for certain batch algorithms. However, the "tight time constraints" are something that could be a very important deciding factor for me. I'm willing to give it a chance, but I can't promise I'll like it.

Evizero commented 8 years ago

Maybe to give this a little more context: I use the loss, penalty, etc. functors of EmpiricalRisks.jl to define the SVM's structure in my KSVM.jl package. Using this, it is pretty easy to build SVM specifications in a Lego-like manner. Also, all the solvers of Regression.jl (like BFGS) for solving all kinds of linear SVMs in the primal just work for free. If a package now comes along that implements stochastic gradient descent etc. for these formulations, it would be available for my SVM package as well. It also enables the possibility of specialized algorithms for specific combinations of loss and penalty. For example, I implement a dual coordinate descent algorithm, but that only works for HingeLoss or SqrHingeLoss with L2 penalization.
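
To sketch the dispatch mechanics behind that last point (hypothetical names, not the actual KSVM.jl code):

```julia
# A generic fallback covers any (loss, penalty) combination, while a
# dual coordinate descent method is only defined where it applies.
abstract type Loss end
struct HingeLoss    <: Loss end
struct SqrHingeLoss <: Loss end
struct LogisticLoss <: Loss end

abstract type Penalty end
struct L2Penalty <: Penalty end
struct L1Penalty <: Penalty end

const HingeLike = Union{HingeLoss, SqrHingeLoss}

# Generic fallback: some gradient-based solver that handles everything.
solve(loss::Loss, pen::Penalty, X, y) = "generic gradient-based solver"

# Specialized method, dispatched only for (hinge-type loss, L2 penalty):
solve(loss::HingeLike, pen::L2Penalty, X, y) = "dual coordinate descent"

solve(LogisticLoss(), L2Penalty(), nothing, nothing)  # generic path
solve(HingeLoss(),    L2Penalty(), nothing, nothing)  # specialized path
```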

Evizero commented 8 years ago

> However the "tight time constraints" are something that could be a very important deciding factor for me. I'm willing to give it a chance, but I can't promise I'll like it.

Well, I think the willingness to consider it is enough for now. Looking at your initial list, I think we have quite a few other conceptual things to solve before we should actually reach a final decision on it. So I don't think we should refactor any code right now.

Evizero commented 8 years ago

outsourced