cscherrer / SossMLJ.jl

SossMLJ makes it easy to build MLJ machines from user-defined models written in the Soss probabilistic programming language.
https://cscherrer.github.io/SossMLJ.jl/stable/
MIT License
15 stars 1 forks

Implement a loss function for classification models #117

Open DilumAluthge opened 3 years ago

DilumAluthge commented 3 years ago

We currently have an example of a loss function for regression models. Specifically, we implement the root mean squared error.

However, we don't currently have an example of a loss function for classification models.

We need to:

  1. Decide which loss function we want to implement.
  2. Implement it.
  3. Add it to the multinomial logistic regression example.
DilumAluthge commented 3 years ago

Here is our implementation of the RMSE:

https://github.com/cscherrer/SossMLJ.jl/blob/f0b38b0a4d14810a538cd275d64c5046c1c25ee7/src/loss-functions.jl#L1-L19
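For reference, the shape of such a regression loss can be sketched as follows. This is an illustrative sketch, not the linked code: the function name is made up, and it assumes distributional predictions are first collapsed to point estimates (e.g. by taking each predicted distribution's mean) before comparing against the targets.

```julia
using Statistics: mean

# Hypothetical sketch of root mean squared error: compare point
# predictions against observed targets, square the residuals,
# average, and take the square root.
function rmse_sketch(point_predictions::AbstractVector, y::AbstractVector)
    return sqrt(mean((point_predictions .- y) .^ 2))
end
```
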

DilumAluthge commented 3 years ago

@cscherrer Any thoughts on a good loss function for the multinomial classification problem? Some options include:

  1. Brier score
  2. Cross entropy loss

Any other options?
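For comparison, cross-entropy loss for a single observation is just the negative log of the probability assigned to the true class. This sketch assumes a plain probability vector `p` and the index `k` of the observed class; the names are illustrative, not a proposed API:

```julia
# Cross-entropy (log) loss for one observation: smaller is better,
# with 0 attained only when the true class gets probability 1.
crossentropy(p::AbstractVector, k::Integer) = -log(p[k])

# Averaged over a dataset of probability vectors and true-class indices.
function mean_crossentropy(ps, ks)
    return sum(-log(p[k]) for (p, k) in zip(ps, ks)) / length(ks)
end
```
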

cscherrer commented 3 years ago

Either of those would be good, or an asymmetric loss could be interesting. I'd think this must come up a lot in medical applications, right?

DilumAluthge commented 3 years ago

Yeah, in binary classification problems (e.g. mortality prediction), we often want a loss function that penalizes underprediction more than overprediction.
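A minimal sketch of such an asymmetric loss, assuming a predicted probability `p̂` of the positive class, a 0/1 label `y`, and a made-up weight `w` that scales up errors on true positives (i.e. penalizes underprediction of the positive class). All names here are hypothetical, not anything in SossMLJ:

```julia
# Hypothetical asymmetric squared loss: when the true label is
# positive (y == 1), the squared error is multiplied by w > 1,
# so underpredicting the positive class costs more.
function asymmetric_loss(p̂::Real, y::Real; w::Real = 5.0)
    err2 = (p̂ - y)^2
    return y == 1 ? w * err2 : err2
end
```
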

I think for the multinomial example, we can just use something simple and symmetric. Later we can add a binary classification problem with class imbalance, and at that point think about some kind of asymmetric loss function for it.

DilumAluthge commented 3 years ago

Let's go with the Brier score. For consistency with MLJ, we should implement it the same way they do (https://github.com/alan-turing-institute/MLJBase.jl/blob/5e5d1cda3b555510df1de4b125a5e320c11f6256/src/measures/finite.jl#L103-L131):

""" BrierScore(; distribution=UnivariateFinite)(ŷ, y [, w]) Given an abstract vector of distributions of type distribution, and an abstract vector of true observations y, return the corresponding Brier (aka quadratic) scores. Weight the scores using w if provided. Currently only distribution=UnivariateFinite is supported, which is applicable to superivised models with Finite target scitype. In this case, if p(y) is the predicted probability for a single observation y, and C all possible classes, then the corresponding Brier score for that observation is given by 2p(y) - \\left(\\sum_{η ∈ C} p(η)^2\\right) - 1 Note that BrierScore()=BrierScore{UnivariateFinite} has the alias brier_score. Warning. Here BrierScore is a "score" in the sense that bigger is better (with 0 optimal, and all other values negative). In Brier's original 1950 paper, and many other places, it has the opposite sign, despite the name. Moreover, the present implementation does not treat the binary case as special, so that the score may differ, in that case, by a factor of two from usage elsewhere. For more information, run info(BrierScore). """

DilumAluthge commented 3 years ago

I think this is blocked by https://github.com/cscherrer/SossMLJ.jl/issues/93

Once https://github.com/cscherrer/SossMLJ.jl/issues/93 is solved, I can get the prediction for μ in the form of particles, and then put those particles directly into the formula for the Brier score.

cscherrer commented 3 years ago

Sounds good