JuliaML / LossFunctions.jl

Julia package of loss functions for machine learning.
https://juliaml.github.io/LossFunctions.jl/stable

LowRankModels duplicate functionality #86

Closed mihirparadkar closed 4 years ago

mihirparadkar commented 7 years ago

LowRankModels implements a Loss type that is eerily similar to this API. It implements value and derivative, as well as M-estimators, for several losses implemented here (L2, L1, Huber, Logistic, Hinge), plus multivariate losses (MultinomialLoss, OvALoss, BvSLoss, OrdinalHingeLoss, etc.)

I don't believe that loss function implementations truly belong in a package for dimensionality reduction, but in a loss functions package. It'd be a good idea to combine the best parts of both into this package.

joshday commented 7 years ago

I like the way you're thinking. It would be great to consolidate efforts into loss functions here. Are you willing to contribute some of the things LossFunctions is missing?

@madeleineudell, are you okay with LossFunctions "stealing" some of your losses? I don't want to start taking things without your permission.

It looks like LowRankModels losses are here: https://github.com/madeleineudell/LowRankModels.jl/blob/master/src/losses.jl

madeleineudell commented 7 years ago

I would be delighted for you to steal my loss functions; I think it makes much more sense for them to live here. Especially if you're able to improve speed while preserving numerical stability.


mihirparadkar commented 7 years ago

Yes, I'll happily port over some LowRankModels loss functions here.

pkofod commented 7 years ago

are you okay with LossFunctions "stealing" some of your losses?

I wish more people would steal my losses...

anyway, +1 from me (merely a JuliaML follower). This seems to be very much in line with the philosophy behind many of the packages in this organization.

Evizero commented 7 years ago

Hi @mihirparadkar, thanks a lot for bringing this to everyone's attention and for starting this conversation. I wasn't at all aware of @madeleineudell's package and am very happy about the opportunity for collaboration and package extension. It is also great to see how everyone here is embracing this idea.

I will look into this and the related issues you have opened, and try to provide feedback on all your points/questions as soon as I find some spare time to devote to this (~ next couple of days).

Evizero commented 7 years ago

LowRankModels looks like a really impressive body of work. I see a lot of overlap in some places and a lot of diverging ideas in others. I admit I had never considered having a domain be a choosable part of a loss, but the way it's implemented makes a lot of sense.

I think in order to make this loss-napping undertaking successful we will have to better understand each others ideas and design decisions in order to gravitate to the "best mix" (if that expression makes sense). I certainly think that we can adjust the design of LossFunctions.jl if there are good reasons to.


Let me start by explaining my personal reasoning behind some parts of this package. Note that I am not the only contributor to this package, so this is just one opinion and doesn't necessarily reflect the thoughts of the other authors.

Concerning domains. As it is now, this package treats loss functions as very "stupid" low-level blocks that don't know or care very much what the concept "prediction" means. All a loss really is, is a function f(yhat, y) that is defined element-wise. This means it has no idea how yhat even came to be. What it does with those numerical values is predetermined by the loss (family). Most losses are members of one of two subfamilies called DistanceLoss (which you call DiffLoss) and MarginLoss (which is similar to, but not completely the same as, what you call ClassificationLoss).
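To make the distinction concrete, here is a minimal self-contained sketch of the two subfamilies, using simplified, hypothetical type and function names rather than the actual LossFunctions.jl API: a DistanceLoss is evaluated on the difference yhat - y, while a MarginLoss is evaluated on the agreement y * yhat with targets in {-1, 1}.

```julia
# Hypothetical, simplified version of the two loss subfamilies.
abstract type Loss end
abstract type DistanceLoss <: Loss end   # evaluated on the difference yhat - y
abstract type MarginLoss   <: Loss end   # evaluated on the agreement y * yhat

struct L2Loss    <: DistanceLoss end
struct HingeLoss <: MarginLoss   end

# Element-wise definitions: the loss never sees how yhat was produced.
value(::L2Loss,    difference) = abs2(difference)
value(::HingeLoss, agreement)  = max(zero(agreement), one(agreement) - agreement)

# The family determines what is done with (yhat, y) before evaluation.
value(loss::DistanceLoss, yhat, y) = value(loss, yhat - y)
value(loss::MarginLoss,   yhat, y) = value(loss, y * yhat)

value(L2Loss(), 0.8, 1.0)   # ≈ 0.04
value(HingeLoss(), 0.8, 1)  # ≈ 0.2
```

Note how neither loss carries any notion of a domain; the "meaning" of yhat and y lives entirely outside the loss.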

The reasoning behind not linking a loss with any kind of "prediction concept" was that I never really saw it as part of the prediction process, but instead just as part of a training procedure. So in my mind it seemed sensible that a pre-trained model in a production environment does not need to know what loss was involved in producing the prediction function, just as it does not need to know which kind of gradient-based learning algorithm produced it. In my simplified view, the learning environment is just a useful tool to avoid having to hand-program a function of if-statements that outputs which bird a picture displays. The product of interest is the prediction function - however it may have been created - and everything else should be a discardable by-product.

So given these two thoughts, it made sense to have the "domain" interpretation live somewhere else. For classification problems I chose to treat the term "class" as an abstract concept that only a human really cares about (e.g. "this observation represents a malignant tumor"), while a "label" is some useful and consistent representation of such a concept (e.g. malignant is represented by "1" and benign by "0"). This is the whole premise behind https://github.com/JuliaML/MLLabelUtils.jl . So for example, if I have a dataset where I know which observations are malignant tumors and which are benign, it's just a matter of representing that information in a way the algorithms are able to make use of it, e.g. {-1,1} for linear support vector machines where a hinge loss or a quadratic hinge loss is involved, or {0,1} for typical textbook logistic regression.
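As a self-contained illustration of that premise (not the actual MLLabelUtils API), the same abstract classes can be re-encoded into whichever label representation a given loss expects:

```julia
# The abstract concept: which observations are malignant, which benign.
classes = [:malignant, :benign, :benign, :malignant]

# A {-1, 1} encoding, as expected by margin-based losses (hinge, etc.):
margin_labels = [c == :malignant ? 1 : -1 for c in classes]   # [1, -1, -1, 1]

# A {0, 1} encoding, as used in textbook logistic regression:
zero_one_labels = [c == :malignant ? 1 : 0 for c in classes]  # [1, 0, 0, 1]
```

The losses themselves stay oblivious to the concept "malignant tumor"; only the encoding step knows about it.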

Seen in a different light, what that all boils down to is that I shifted the responsibility of choosing and producing a sensible encoding for a given loss to the user. So in a way it's a cop-out, but it also makes the whole ordeal really modular. That said, the intended "users" are package developers who are interested in producing high-level interfaces.

Evizero commented 7 years ago

To sum this up briefly:

I am open to incorporating a "Domain" of some sort into the loss if there is a convincing reason to. I don't see such a reason for binary classification or regression, but I have not yet really considered ordinal regression or multiclass or multilabel classification.

Either way, I'd appreciate any links that can help me better understand the importance of having the "domain". I am very open to the idea that maybe I am simply missing some knowledge there.

madeleineudell commented 7 years ago

@Evizero , thanks for explaining the reasoning behind your abstractions. You're right that LowRankModels starts with a slightly different view of the problem.

In particular, we don't want to require that the user has encoded the data in some special way. In our view, the user should be able to input any data frame and have the fit! and impute! methods just work. That is, impute! should automatically predict the right kinds of values, which means they should have the same kinds of values that the corresponding column of the data frame originally had. We don't yet support domains that consist of, say, sets of (a few) strings, but we wish we did.

We view the loss function as mapping between the values in the data frame, and the (real or real-vector) values in model space. We minimize the loss over the values in model space to fit the model; we minimize the loss over the values in data space to find the imputations or predictions. This is a nicely symmetric view of the world, and ensures predictions always match the data that was used to fit the model. This view is explained more thoroughly in Section 5.3, page 41, of the paper on Generalized Low Rank Models.

It's possible that this view forms an abstraction one layer higher than the view in LossFunctions, if you want to assume that the user has performed a reasonable encoding. If so, LowRankModels might want to wrap the loss functions to ensure the predictions are in the right space. On the other hand, if you like this abstraction, there's no reason it couldn't go all the way down.

Evizero commented 7 years ago

In particular, we don't want to require that the user has encoded the data in some special way. In our view, the user should be able to input any data frame, and have fit! and impute! methods just work

I fully agree with you there and share that opinion. The idea so far was that this kind of magic happens at a higher level. I am not opposed to introducing this kind of logic to LossFunctions now that it's stable, but it was very important to me not to start the package out this way.

It's possible that this view forms an abstraction one layer higher than the view in LossFunctions, if you want to assume that the user has performed a reasonable encoding.

Up to now this was basically my intention behind the package's design. A main premise behind having LossFunctions be that "stupid" was that Julia makes it possible to skip low-level-language kernels entirely, which to me suggested the possibility of having multiple abstraction layers instead of monolithic "do everything" packages. I say "up to now" because I have not yet read the document you linked.

This is a nicely symmetric view of the world, and ensures predictions always match the data that was used to fit the model. This view is explained more thoroughly in Section 5.3, page 41, of the paper on Generalized Low Rank Models.

On the other hand, if you like this abstraction, there's no reason it couldn't go all the way down.

Thank you for the link, I will certainly look into that in order to attain a more informed opinion.

@mihirparadkar I am sorry that this is dragging out a little longer than expected. Your starting this conversation is very much appreciated, and I hope your interest in this endeavour will last a little longer.

mihirparadkar commented 7 years ago

@Evizero I didn't expect this to happen overnight, and I'm glad that it's being discussed thoroughly. I firmly believe that good-quality software is a result of collaboration, so I'm very happy that I could bring the LowRankModels and JuliaML teams together.

I do have some thoughts of my own regarding these points.

In particular, we don't want to require that the user has encoded the data in some special way. In our view, the user should be able to input any data frame, and have fit! and impute! methods just work

I fully agree with you there and share that opinion. The idea so far was that this kind of magic happens at a higher level. I am not opposed to introducing this kind of logic to LossFunctions now that it's stable, but it was very important to me not to start the package out this way.

I agree that one goal is to have a user be able to stick in some data frame and have fit! and impute! (or predict) just work. However, I like the idea of encoding user data using a small set of standard encodings (like in MLLabelUtils) before LossFunctions ever sees it.

It's impossible in the general case to tell what domain a user wants just from the data passed in. For example, does (:low, :medium, :high) mean an ordinal encoding or a categorical encoding? There are infinitely many possible domains that encapsulate the same information (e.g. integers from 2-6, integers from 1-5, and ["strongly disagree", "disagree", "neither", "agree", "strongly agree"]). Using a single encoding at the loss function level makes it easier to write performant and maintainable code, since loss functions can expect a consistent input format. I think the mapping between user input and this standard input should be handled in a different package to maintain modularity and simplicity.
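A sketch of this idea, using a hypothetical helper function: the user states the ordering once, and any of the equivalent representations collapses to the same canonical 1:K encoding before a loss function ever sees it.

```julia
# Hypothetical helper: map arbitrary user-facing ordinal labels onto a
# single canonical 1:K integer encoding.
function canonical_ordinal(labels, ordered_levels)
    lookup = Dict(level => k for (k, level) in enumerate(ordered_levels))
    return [lookup[l] for l in labels]
end

canonical_ordinal([:high, :low, :medium], [:low, :medium, :high])  # [3, 1, 2]
canonical_ordinal([6, 2, 4], [2, 4, 6])                            # [3, 1, 2]
```

Both calls produce the same canonical result, which is exactly why the loss functions themselves can stay agnostic about the user's original representation.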

This is a nicely symmetric view of the world, and ensures predictions always match the data that was used to fit the model. This view is explained more thoroughly in Section 5.3, page 41, of the paper on Generalized Low Rank Models.

I think this is the strongest reason why domains are tied to losses, but I can't think of any useful cases where the prediction of a real value differs between loss functions, given the same value and encoding (e.g. all MarginLosses will predict 1 given a positive output and a {-1,1} encoding). The only cases where this isn't true are those where the wrong kind of loss was used for the encoding (e.g. a real-valued encoding with a MarginLoss). I happen to think this is a case of "garbage in, garbage out". Aside from these pathological cases, it doesn't seem like the loss function is needed to predict the class of the output.
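A minimal sketch of the point above: with a {-1,1} encoding, the predicted class depends only on the sign of the raw model output, regardless of which MarginLoss was used during training (predict_class is a hypothetical name, not an existing API).

```julia
# The sign of the raw model output alone determines the predicted label;
# no reference to the training loss is needed.
predict_class(yhat::Real) = yhat >= 0 ? 1 : -1

predict_class(2.3)   # 1
predict_class(-0.7)  # -1
```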

mihirparadkar commented 7 years ago

To summarize:

I'm inclined to agree with @Evizero about maintaining modularity and separations between loss functions and label utils. More specific package code can handle the integration at a higher level.

I like the idea of symmetry between fitting and predicting, but I don't think there are any useful cases where the default intuitive behavior is different from the mathematical definition.

juliohm commented 4 years ago

I am delighted to see this kind of discussion. Really nice thoughts. I am also inclined toward the more self-contained view in which losses are just functions, with no involvement in label encodings or other context-specific uses of losses. I also agree that by separating these into two different layers of abstraction we have more power to specify multiple behaviors, as opposed to relying on a canonical behavior that just works with an input dataframe.

We are aware of these nice packages, and will try to reconcile the views and functionality in some way moving forward. I am closing the issue for now, but feel free to reopen if you feel that something hasn't been discussed in depth yet.