JuliaAI / MLJBase.jl

Core functionality for the MLJ machine learning framework

List of metrics in R and elsewhere, and which ones are implemented in MLJ #95

Open tlienart opened 4 years ago

tlienart commented 4 years ago

https://mlr.mlr-org.com/articles/tutorial/measures.html

binary classification

| MLR | MLJBase | Comment |
| --- | --- | --- |
| acc | accuracy | ✔️ |
| auc | auc | ✔️ |
| bac | bac, bacc, balanced_accuracy | ✔️ ; we use the sklearn definition, which is also valid for multiclass |
| ber | missing | |
| brier | BrierScore | maybe worth adding a shortcut? |
| brier scaled | missing? | maybe worth checking? |
| f1 | f1, f1score | ✔️ |
| fdr | fdr, falsediscovery_rate | ✔️ |
| fn | fn, falsenegative | ✔️ |
| fnr | fnr, falsenegative_rate, miss_rate | ✔️ |
| fp | fp, falsepositive | ✔️ |
| fpr | fpr, falsepositive_rate, fallout | ✔️ |
| gmean | missing | |
| gpr | missing | |
| kappa | missing | |
| logloss | cross entropy ? | check |
| lsr | missing | |
| mcc | mcc, matthews_correlation | ✔️ |
| mmce | missing | |
| multiclass au1p | missing | 👀 |
| multiclass au1u | missing | |
| multiclass aunp | missing | |
| multiclass aunu | missing | |
| multiclass brier | missing | |
| npv | npv | ✔️ |
| ppv | ppv, precision | ✔️ |
| qsr | missing | |
| ssr | missing | |
| tn | truenegative, tn | ✔️ |
| tnr | truenegative_rate, tnr, specificity, selectivity | ✔️ |
| tp | truepositive, tp | ✔️ |
| tpr | truepositive_rate, tpr, recall, sensitivity, hit_rate | ✔️ |
| wkappa | missing | |
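
For reference, the sklearn definition of balanced accuracy mentioned in the bac row is the unweighted mean of per-class recall, which is why it carries over to multiclass. A minimal Julia sketch, assuming plain vectors of labels; `balanced_accuracy_sketch` is a hypothetical name used for illustration and is not the MLJBase implementation:

```julia
using Statistics

# Hypothetical helper: unweighted mean of per-class recall.
# Classes are taken from the ground truth `y`, so every class has at least one sample.
function balanced_accuracy_sketch(ŷ, y)
    classes = unique(y)
    # recall for class c = fraction of samples with true label c that were predicted as c
    recalls = [mean(ŷ[y .== c] .== c) for c in classes]
    return mean(recalls)
end

y = [:a, :a, :a, :b]
ŷ = [:a, :a, :b, :b]
balanced_accuracy_sketch(ŷ, y)   # recall(:a) = 2/3, recall(:b) = 1
```

Because each class contributes equally regardless of its frequency, the score differs from plain accuracy whenever the classes are imbalanced.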

multiclass classification

| MLR | MLJBase | Comment |
| --- | --- | --- |
| acc | accuracy | |
| f1 | | ✔️ |
| hamloss | missing | |
| ppv | | ✔️ |
| subset01 | missing | |
| tpr | | ✔️ |

regression

Some of these may be available in LossFunctions (?). I did this one off the top of my head, so it may be worth double-checking.

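| MLR | MLJBase | Comment |
| --- | --- | --- |
| arsq | missing | |
| expvar | missing | |
| kendalltau | missing | |
| mae | mav | check |
| mape | missing | |
| medae | missing | |
| medse | missing | |
| mse | mse | |
| msle | missing | |
| rae | missing | |
| rmse | rms | |
| rmsle | rmsl | |
| rrse | missing | |
| rsq | missing | |
| sae | missing | |
| spearmanrho | missing | |
| sse | missing | |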

survival analysis

we don't have that yet

cluster analysis

we don't have any of those but probably should

Cost-sensitive classification

we don't have that

General performance model

(?)

tlienart commented 4 years ago

Might be good to also add stuff like rsquared, SSE, TSS etc for regression metrics

azev77 commented 4 years ago

@tlienart I submitted a PR to add MAPE.

Copying #294: Rob's paper: https://robjhyndman.com/papers/mase.pdf (might be a newer version) has a nice review of different measures of forecast accuracy.

Would be nice to incorporate at some point. (They focus on univariate time series, but their results apply more broadly.)

2.1 Scale-dependent measures: e := ŷ - y

2.2 Measures based on percentage errors: p := 100e/y

2.3 Measures based on relative errors (relative to a benchmark): r := e/e*, where e* is the error of the benchmark method

2.4 Relative measures. {is Rsquared in this category?}

3 Scaled errors

Check out his TS book too: https://otexts.com/fpp2/accuracy.html
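
As an illustration of the scaled errors in item 3, here is a minimal sketch of the mean absolute scaled error (MASE) from the paper linked above. It divides the forecast errors by the in-sample mean absolute error of the naive one-step forecast (non-seasonal case). The names `naive_scale` and `mase_sketch` and the `y_train` argument are assumptions made for this example; they are not part of MLJBase:

```julia
using Statistics

# In-sample MAE of the naive forecast, where y_train[t-1] is used to predict y_train[t].
naive_scale(y_train) = mean(abs.(diff(y_train)))

# Mean absolute scaled error. The sign convention e := ŷ - y follows the list above;
# the sign does not matter here because only absolute errors are used.
mase_sketch(ŷ, y, y_train) = mean(abs.(ŷ .- y)) / naive_scale(y_train)

y_train = [1, 2, 4, 7]    # history used only to compute the scale
y       = [7, 11]         # held-out targets
ŷ       = [8, 9]          # forecasts
mase_sketch(ŷ, y, y_train)   # → 0.75
```

A value below 1 means the forecast errors are, on average, smaller than the in-sample errors of the naive forecast. Unlike the percentage errors in 2.2, the scale stays finite when individual targets are zero, as long as the training series is not constant.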

azev77 commented 4 years ago

& this unmaintained repo has plenty of Julia coded measures: https://github.com/JuliaML/MLMetrics.jl Along w/ Flux: https://github.com/FluxML/Flux.jl/blob/0287abbf663b1ae276360ea4a396adad1d2d1df7/src/layers/stateless.jl

ablaom commented 4 years ago

I've looked at the flux measures and I think their API is a little different because they want the measures to work on batches. And they only have a few. And I've had problems getting them to work as expected.

azev77 commented 4 years ago

Copying this here before I forget. This could also make a nice tutorial, because people in industry (in my field at least) often need custom scores.

@tlienart rsquared is an example of a relative measure (relative to the benchmark model = mean).

function err(ŷ, y)
    return ŷ - y
end

function perr(ŷ, y; tol = eps())
    e = err(ŷ, y)
    # skip targets that are (numerically) zero to avoid dividing by zero
    keep = abs.(y) .> tol
    return 100 .* e[keep] ./ y[keep]
end

using Statistics
mse(ŷ, y)  = err(ŷ, y) |> (x)->x.^2 |> mean
rmse(ŷ, y) = err(ŷ, y) |> x -> x.^2 |> mean |> sqrt
mae(ŷ, y)  = err(ŷ, y) |> x -> abs.(x) |> mean
mdae(ŷ, y) = err(ŷ, y) |> x -> abs.(x) |> median
rmdse(ŷ, y) = err(ŷ, y) |> x -> x.^2 |> median |> sqrt
maxae(ŷ, y) = err(ŷ, y) |> x -> abs.(x) |> maximum

# tutorial     Root   mean    square     error
rmse2(ŷ, y) = (sqrt ∘ mean ∘ (x->x.^2) ∘ err)(ŷ, y)
# tutorial     mean    absolute      error
mae2(ŷ, y)  = (mean ∘ (x->abs.(x)) ∘ err)(ŷ, y)

rmsl(ŷ, y) = err(log.(ŷ), log.(y)) |> x->x.^2 |> mean |> sqrt
rmslp1(ŷ, y) = err(log.(ŷ.+1), log.(y.+1)) |> x->x.^2 |> mean |> sqrt

rss(ŷ, y)  = err(ŷ, y) |> (x)->x.^2 |> sum  # AKA SSE
tss(ŷ, y)  = err(zero(y) .+ mean(y), y) |> (x)->x.^2 |> sum 
ess(ŷ, y)  = err(ŷ, zero(y) .+ mean(y)) |> (x)->x.^2 |> sum 

tss2(ŷ, y)  = mean(y) .- y |> (x)->x.^2 |> sum 
ess2(ŷ, y)  = mean(y) .- ŷ |> (x)->x.^2 |> sum 
rss2(ŷ, y)  = y .- ŷ       |> (x)->x.^2 |> sum 
rsq2(ŷ, y)  = ess2(ŷ, y) / tss2(ŷ, y)

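# TODO: verify on a linear model. ESS/TSS equals R² only for a least-squares
# fit with an intercept; in general use 1 - RSS/TSS.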

# Mean Percentage Error (MPE) is a measure of bias
mpe(ŷ, y)    = perr(ŷ, y) |> mean

mape(ŷ, y)    = perr(ŷ, y) |> x -> abs.(x) |> mean
mdape(ŷ, y)   = perr(ŷ, y) |> x -> abs.(x) |> median
rmspe(ŷ, y)   = perr(ŷ, y) |> x -> x.^2 |> mean |> sqrt
rmdspe(ŷ, y)  = perr(ŷ, y) |> x -> x.^2 |> median |> sqrt

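"https://en.wikipedia.org/wiki/Symmetric_mean_absolute_percentage_error"
smape(ŷ, y)   = 200 .* abs.(err(ŷ, y)) ./ (abs.(ŷ) .+ abs.(y)) |> mean
# median variant; absolute values in the denominator keep it non-negative for negative inputs
smdape(ŷ, y)  = 200 .* abs.(err(ŷ, y)) ./ (abs.(ŷ) .+ abs.(y)) |> median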

# TEST
y    = [0, 1, 2, 3, 4]
ŷ    = [1, 2, 2, 4, 5]

e = err(ŷ, y)
p = perr(ŷ, y)

mse(ŷ, y)
rmse(ŷ, y)
mae(ŷ, y) 
mdae(ŷ, y) 
rmdse(ŷ, y)
maxae(ŷ, y)

mape(ŷ, y)
mdape(ŷ, y)
rmspe(ŷ, y)
rmdspe(ŷ, y)
smape(ŷ, y)
smdape(ŷ, y) 

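# mean arctangent absolute percentage error (MAAPE)
# the arctangent is applied to the absolute error ratio |e/y|, so undo the factor of 100 from perr
maape(ŷ, y)    = perr(ŷ, y) ./ 100 |> x -> atan.(abs.(x)) |> mean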

MAAPE is from this 2016 paper. Great score for intermittent sales data...
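
To make the "relative to the benchmark model = mean" remark concrete, the sketch below writes R² as 1 - RSS/TSS, where TSS is the residual sum of squares of the mean-only benchmark, and checks numerically that this agrees with ESS/TSS for a least-squares line with an intercept. The helper names (`resid_ss`, `total_ss`, `explained_ss`, `rsq`) and the toy data are assumptions made for this example:

```julia
using Statistics

resid_ss(ŷ, y)     = sum(abs2, y .- ŷ)          # RSS, also known as SSE
total_ss(y)        = sum(abs2, y .- mean(y))    # TSS: RSS of the mean-only benchmark
explained_ss(ŷ, y) = sum(abs2, ŷ .- mean(y))    # ESS

# R² as a relative measure: 1 minus (model error / benchmark error)
rsq(ŷ, y) = 1 - resid_ss(ŷ, y) / total_ss(y)

# Toy data and an ordinary least-squares fit with an intercept
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.8, 5.3]
X = [ones(length(x)) x]     # design matrix: intercept column plus x
β = X \ y                   # least-squares coefficients
ŷ = X * β

# For this fit the two expressions agree; for arbitrary predictions they need not.
rsq(ŷ, y) ≈ explained_ss(ŷ, y) / total_ss(y)
```

Predicting the mean everywhere gives `rsq` equal to zero, and predictions worse than the mean give a negative value, which the ESS/TSS form cannot express.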

ablaom commented 4 years ago

Another to add to the list (looking ahead to image classification): Tversky Loss https://github.com/alan-turing-institute/MLJ.jl/issues/554

azev77 commented 4 years ago

Many are also here: https://github.com/LAMPSPUC/ForecastAccuracy.jl

azev77 commented 3 years ago

Hey, did anything ever come of this? Was this implemented in a different repo?

ablaom commented 3 years ago

No, I think the most recent additions were the multiclass versions of FScore and its cousins. You can call measures() to list what is available.

azev77 commented 3 years ago

Came across these as well: https://github.com/beacon-biosignals/Lighthouse.jl

ablaom commented 3 years ago

Yep. We got this somewhere.

@azev77 I'm currently performing a JuliaAI-wide issue review and collecting all issues in a GH project. All measure-related issues are being given a "measure" label (as this one has) and one can see all these at a glance. I'll let you know when it's done. Most of these are marked "straightforward" but "low priority" - but there are a lot of them. I'm trying to see if we can get a student to help out - some of the work is not that hard but it's tedious and not very popular. If you know of someone who we could trust with this work and have a way to pay them...