List of metrics in R and elsewhere, and which ones are implemented in MLJ

tlienart commented 4 years ago

https://mlr.mlr-org.com/articles/tutorial/measures.html

binary classification

MLR	MLJBase	Comment
acc	`accuracy`	✔️
auc	`auc`	✔️
bac	`bac, bacc, balanced_accuracy`	✔️ ; we use the sklearn definition which is also valid for multiclass
ber	missing
brier	`BrierScore`	maybe worth adding a shortcut?
brier scaled	missing?	maybe worth checking?
f1	`f1, f1score`	✔️
fdr	`fdr, falsediscovery_rate`	✔️
fn	`fn, falsenegative`	✔️
fnr	`fnr, falsenegative_rate, miss_rate`	✔️
fp	`fp, falsepositive`	✔️
fpr	`fpr, falsepositive_rate, fallout`	✔️
gmean	missing
gpr	missing
kappa	missing
logloss	cross entropy ?	check
lsr	missing
mcc	`mcc, matthews_correlation`	✔️
mmce	missing
multiclass au1p	missing	👀
multiclass au1u	missing
multiclass aunp	missing
multiclass aunu	missing
multiclass brier	missing
npv	`npv`	✔️
ppv	`ppv, precision`	✔️
qsr	missing
ssr	missing
tn	`truenegative, tn`	✔️
tnr	`truenegative_rate, tnr, specificity, selectivity`	✔️
tp	`truepositive, tp`	✔️
tpr	`truepositive_rate, tpr, recall, sensitivity, hit_rate`	✔️
wkappa	missing
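
The `bac` comment in the table refers to the scikit-learn definition, which is the unweighted mean of per-class recall and therefore extends to more than two classes. A minimal sketch of that definition follows; `balanced_accuracy_sketch` is a hypothetical name used for illustration and is not the MLJBase implementation.

```julia
using Statistics

# Hypothetical helper (not MLJBase code): balanced accuracy as the
# unweighted mean of per-class recall, following the scikit-learn definition.
function balanced_accuracy_sketch(ŷ, y)
    classes = unique(y)
    # recall for class c: fraction of observations with truth c that were predicted as c
    recalls = [mean(ŷ[y .== c] .== c) for c in classes]
    return mean(recalls)
end

y = ["a", "a", "a", "a", "b", "c"]
ŷ = ["a", "a", "a", "b", "b", "a"]
bac_value = balanced_accuracy_sketch(ŷ, y)   # per-class recalls are 3/4, 1 and 0
```

In the binary case this reduces to (tpr + tnr) / 2.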

multiclass classification

MLR	MLJBase	Comment
acc	accuracy
f1	✔️
hamloss	missing
ppv	✔️
subset01	missing
tpr	✔️

regression

Some of these may be available in LossFunctions (?). I did this one off the top of my head, so it may be worth double-checking.

MLR	MLJBase	Comment
arsq	missing
expvar	missing
kendalltau	missing
mae	mav	check
mape	missing
medae	missing
medse	missing
mse	mse
msle	missing
rae	missing
rmse	rms
rmsle	rmsl
rrse	missing
rsq	missing
sae	missing
spearmanrho	missing
sse	missing
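
For reference, a few of the rows marked missing are one-liners. The sketch below uses hypothetical function names that mirror the MLR column; none of them are MLJBase API, and the argument order `(ŷ, y)` is only an assumption.

```julia
using Statistics

# Hypothetical names mirroring the MLR column; not MLJBase functions.
medae(ŷ, y) = median(abs.(ŷ .- y))                 # median absolute error
medse(ŷ, y) = median(abs2.(ŷ .- y))                # median squared error
sae(ŷ, y)   = sum(abs.(ŷ .- y))                    # sum of absolute errors
rae(ŷ, y)   = sae(ŷ, y) / sum(abs.(y .- mean(y)))  # relative absolute error (vs. predicting the mean)

y = [1.0, 2.0, 3.0, 4.0]
ŷ = [1.0, 3.0, 3.0, 6.0]
(medae(ŷ, y), medse(ŷ, y), sae(ŷ, y), rae(ŷ, y))
```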

survival analysis

we don't have that yet

cluster analysis

we don't have any of those but probably should

Cost-sensitive classification

we don't have that

General performance model

(?)

tlienart commented 4 years ago

Might be good to also add stuff like rsquared, SSE, TSS etc for regression metrics
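
A minimal sketch of how these three relate, assuming the usual definitions (SSE is the residual sum of squares, TSS is the total sum of squares about the mean of `y`). The names `sse`, `tss` and `rsq` are hypothetical here and not existing MLJBase functions.

```julia
using Statistics

# Hypothetical names for illustration; not MLJBase functions.
sse(ŷ, y) = sum(abs2, ŷ .- y)          # residual sum of squares
tss(y)    = sum(abs2, y .- mean(y))    # total sum of squares about the mean
rsq(ŷ, y) = 1 - sse(ŷ, y) / tss(y)     # coefficient of determination

y = [1.0, 2.0, 3.0, 4.0]
ŷ = [1.5, 2.0, 3.0, 3.5]
rsq(ŷ, y)
```

Predicting `mean(y)` everywhere gives `rsq == 0`, and a perfect prediction gives `rsq == 1`.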

azev77 commented 4 years ago

@tlienart I submitted a PR to add MAPE.

Copying #294: Rob's paper: https://robjhyndman.com/papers/mase.pdf (might be a newer version) has a nice review of different measures of forecast accuracy.

Would be nice to incorporate at some point. (They focus on univariate time series, but their results apply more broadly.)

2.1 Scale-dependent measures: e := ŷ - y

[ ] Mean Square Error (MSE)
[x] Root Mean Square Error (RMSE)
[x] Mean Absolute Error (MAE)
[ ] Median Absolute Error (MdAE)

2.2 Measures based on percentage errors: p := 100e/y

[x] Mean Absolute Percentage Error (MAPE)
[ ] Median Absolute Percentage Error (MdAPE)
[x] Root Mean Square Percentage Error (RMSPE)
[ ] Root Median Square Percentage Error (RMdSPE)
[ ] Symmetric Mean Absolute Percentage Error (sMAPE)
[ ] Symmetric Median Absolute Percentage Error (sMdAPE)

2.3 Measures based on relative errors (relative to a benchmark, whose error is e*): r := e/e*

[ ] Mean Relative Absolute Error (MRAE)
[ ] Median Relative Absolute Error (MdRAE)
[ ] Geometric Mean Relative Absolute Error (GMRAE)

2.4 Relative measures. {is Rsquared in this category?}

[ ] RelMAE=MAE/MAEb
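
The measures in 2.3 and 2.4 all need the predictions of a benchmark model in addition to `ŷ` and `y`. A rough sketch, assuming a three-argument form where `ŷb` holds the benchmark predictions (both the signature and the function names are hypothetical):

```julia
using Statistics

# Hypothetical helpers; ŷb holds the benchmark model's predictions (an assumed name).
relerr(ŷ, ŷb, y) = (ŷ .- y) ./ (ŷb .- y)                      # r = e / e*
mrae(ŷ, ŷb, y)   = mean(abs.(relerr(ŷ, ŷb, y)))               # mean relative absolute error
mdrae(ŷ, ŷb, y)  = median(abs.(relerr(ŷ, ŷb, y)))             # median relative absolute error
gmrae(ŷ, ŷb, y)  = exp(mean(log.(abs.(relerr(ŷ, ŷb, y)))))    # geometric mean of |r|
relmae(ŷ, ŷb, y) = mean(abs.(ŷ .- y)) / mean(abs.(ŷb .- y))   # MAE / MAE of the benchmark

y  = [2.0, 4.0, 6.0, 8.0]
ŷb = fill(mean(y), length(y))   # benchmark: always predict the mean
ŷ  = [3.0, 4.5, 5.0, 9.5]
(mrae(ŷ, ŷb, y), mdrae(ŷ, ŷb, y), gmrae(ŷ, ŷb, y), relmae(ŷ, ŷb, y))
```

Note that the measures in 2.3 are undefined or infinite whenever the benchmark error is exactly zero for some observation, whereas RelMAE only needs the benchmark MAE to be nonzero.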

3 Scaled errors

[ ] Mean Absolute Scaled Error (MASE)

Check out his TS book too: https://otexts.com/fpp2/accuracy.html
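
For MASE, the paper scales the errors by the in-sample MAE of the one-step naive forecast. A sketch under that definition, assuming a hypothetical three-argument signature in which `ytrain` is the training series used only to compute the scale:

```julia
using Statistics

# Hypothetical signature: ytrain is the in-sample series, used only for the scale.
function mase(ŷ, y, ytrain)
    scale = mean(abs.(diff(ytrain)))   # in-sample MAE of the one-step naive forecast
    return mean(abs.(ŷ .- y)) / scale
end

ytrain = [1.0, 3.0, 2.0, 5.0]
y      = [6.0, 4.0]
ŷ      = [5.0, 5.0]
mase(ŷ, y, ytrain)
```

The extra argument is the awkward part for a measure API that only expects `(ŷ, y)`: the scale has to come from data the measure would not normally see.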

azev77 commented 4 years ago

Also, this unmaintained repo has plenty of measures coded in Julia: https://github.com/JuliaML/MLMetrics.jl, as does Flux: https://github.com/FluxML/Flux.jl/blob/0287abbf663b1ae276360ea4a396adad1d2d1df7/src/layers/stateless.jl

ablaom commented 4 years ago

I've looked at the flux measures and I think their API is a little different because they want the measures to work on batches. And they only have a few. And I've had problems getting them to work as expected.

azev77 commented 4 years ago

Copying this here before I forget. It could also make a nice tutorial, because people in industry (in my field at least) often need custom scores.

@tlienart rsquared is an example of a relative measure (relative to the benchmark model = mean).

function err(ŷ, y)
    return ŷ - y
end

function perr(ŷ, y; tol = eps())
    e = err(ŷ, y)
    keep = abs.(y) .> tol   # drop observations where y ≈ 0 to avoid dividing by zero
    return 100 .* e[keep] ./ y[keep]
end

using Statistics
mse(ŷ, y)  = err(ŷ, y) |> (x)->x.^2 |> mean
rmse(ŷ, y) = err(ŷ, y) |> x -> x.^2 |> mean |> sqrt
mae(ŷ, y)  = err(ŷ, y) |> x -> abs.(x) |> mean
mdae(ŷ, y) = err(ŷ, y) |> x -> abs.(x) |> median
rmdse(ŷ, y) = err(ŷ, y) |> x -> x.^2 |> median |> sqrt
maxae(ŷ, y) = err(ŷ, y) |> x -> abs.(x) |> maximum

# tutorial     Root   mean    square     error
rmse2(ŷ, y) = (sqrt ∘ mean ∘ (x->x.^2) ∘ err)(ŷ, y)
# tutorial     mean    absolute      error
mae2(ŷ, y)  = (mean ∘ (x->abs.(x)) ∘ err)(ŷ, y)

rmsl(ŷ, y) = err(log.(ŷ), log.(y)) |> x->x.^2 |> mean |> sqrt
rmslp1(ŷ, y) = err(log.(ŷ.+1), log.(y.+1)) |> x->x.^2 |> mean |> sqrt

rss(ŷ, y)  = err(ŷ, y) |> (x)->x.^2 |> sum  # AKA SSE
tss(ŷ, y)  = err(zero(y) .+ mean(y), y) |> (x)->x.^2 |> sum 
ess(ŷ, y)  = err(ŷ, zero(y) .+ mean(y)) |> (x)->x.^2 |> sum 

tss2(ŷ, y)  = mean(y) .- y |> (x)->x.^2 |> sum 
ess2(ŷ, y)  = mean(y) .- ŷ |> (x)->x.^2 |> sum 
rss2(ŷ, y)  = y .- ŷ       |> (x)->x.^2 |> sum 
rsq2(ŷ, y)  = ess2(ŷ, y) / tss2(ŷ, y)

# TODO: verify against a linear model (ESS/TSS equals 1 - RSS/TSS only for least squares with an intercept)

# Mean Percentage Error (MPE) is a measure of bias
mpe(ŷ, y)    = perr(ŷ, y) |> mean

mape(ŷ, y)    = perr(ŷ, y) |> x -> abs.(x) |> mean
mdape(ŷ, y)   = perr(ŷ, y) |> x -> abs.(x) |> median
rmspe(ŷ, y)   = perr(ŷ, y) |> x -> x.^2 |> mean |> sqrt
rmdspe(ŷ, y)  = perr(ŷ, y) |> x -> x.^2 |> median |> sqrt

"https://en.wikipedia.org/wiki/Symmetric_mean_absolute_percentage_error"
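# the denominator uses absolute values, as in the definition linked above
smape(ŷ, y)   = 200 .* abs.(err(ŷ, y)) ./ (abs.(ŷ) .+ abs.(y)) |> mean
smdape(ŷ, y)  = 200 .* abs.(err(ŷ, y)) ./ (abs.(ŷ) .+ abs.(y)) |> median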

# TEST
y    = [0, 1, 2, 3, 4]
ŷ    = [1, 2, 2, 4, 5]

e = err(ŷ, y)
p = perr(ŷ, y)

mse(ŷ, y)
rmse(ŷ, y)
mae(ŷ, y) 
mdae(ŷ, y) 
rmdse(ŷ, y)
maxae(ŷ, y)

mape(ŷ, y)
mdape(ŷ, y)
rmspe(ŷ, y)
rmdspe(ŷ, y)
smape(ŷ, y)
smdape(ŷ, y) 

# mean arctangent absolute percentage error (MAAPE)
# atan is applied to the ratio |e/y|, so the factor 100 in perr is divided out first
maape(ŷ, y)    = perr(ŷ, y) |> x -> atan.(abs.(x) ./ 100) |> mean

MAAPE is from this 2016 paper. Great score for intermittent sales data...
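
Part of the appeal for intermittent data is that the arctangent is bounded, so observations with `y == 0` do not have to be dropped: the ratio becomes `Inf` and `atan(Inf)` is π/2. A sketch that keeps every observation follows; `aape` and `maape_all` are hypothetical names, and treating an exact prediction of a zero (the 0/0 case) as zero error is an assumption of this sketch.

```julia
using Statistics

# Hypothetical helpers for illustration.
# y == 0 with ŷ != 0 gives a ratio of Inf, and atan(Inf) == π/2.
# The 0/0 case (exact prediction of a zero) is treated as zero error: an assumption.
aape(ŷ, y) = ŷ == y ? 0.0 : atan(abs((ŷ - y) / y))
maape_all(ŷ, y) = mean(aape.(ŷ, y))

y = [0, 1, 2, 3, 4]
ŷ = [1, 2, 2, 4, 5]
maape_all(ŷ, y)   # always lies in [0, π/2]
```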

ablaom commented 4 years ago

Another to add to the list (looking ahead to image classification): Tversky Loss https://github.com/alan-turing-institute/MLJ.jl/issues/554

azev77 commented 4 years ago

Many are also here: https://github.com/LAMPSPUC/ForecastAccuracy.jl

azev77 commented 3 years ago

Hey, did anything ever come of this? Was this implemented in a different repo?

ablaom commented 3 years ago

No, I think the most recent additions were the multiclass versions of FScore and cousins. You can call `measures()` to list what is available.

azev77 commented 3 years ago

Came across these as well: https://github.com/beacon-biosignals/Lighthouse.jl

ablaom commented 3 years ago

Yep. We got this somewhere.

@azev77 I'm currently performing a JuliaAI-wide issue review and collecting all issues in a GH project. All measure-related issues are being given a "measure" label (as this one has), so one can see them all at a glance. I'll let you know when it's done. Most of these are marked "straightforward" but "low priority", but there are a lot of them. I'm trying to see if we can get a student to help out; some of the work is not that hard, but it's tedious and not very popular. If you know of someone who we could trust with this work and have a way to pay them...

JuliaAI / MLJBase.jl