tlienart opened 4 years ago
Might be good to also add things like rsquared, SSE, TSS, etc. for regression metrics.
@tlienart I submitted a PR to add MAPE.
Copying #294: Rob's paper, https://robjhyndman.com/papers/mase.pdf (there might be a newer version), has a nice review of different measures of forecast accuracy.
Would be nice to incorporate at some point. (They focus on univariate time series, but their results apply more broadly.)
2.1 Scale-dependent measures: e := ŷ - y
2.2 Measures based on percentage errors: p := 100e/y
2.3 Measures based on relative errors: r := e/e*, where e* is the error of a benchmark method
2.4 Relative measures (is rsquared in this category?)
3 Scaled errors
(all four families are sketched in Julia just below)
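A minimal Julia sketch of the four families, for concreteness (the function names and the naive-forecast scaling are my choices, not notation from the paper):

using Statistics
# 2.1 scale-dependent errors: e := ŷ - y
scale_err(ŷ, y) = ŷ .- y
# 2.2 percentage errors: p := 100e/y
pct_err(ŷ, y) = 100 .* scale_err(ŷ, y) ./ y
# 2.3 relative errors: r := e/e*, with e* the error of a benchmark method (e.g. a naive forecast)
rel_err(ŷ, ŷ_bench, y) = scale_err(ŷ, y) ./ scale_err(ŷ_bench, y)
# 3 scaled errors: scale e by the in-sample MAE of the naive forecast; mean(abs.(q)) is MASE
scaled_err(ŷ, y, y_train) = scale_err(ŷ, y) ./ mean(abs.(diff(y_train)))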
Check out his TS book too: https://otexts.com/fpp2/accuracy.html
Also, this unmaintained repo has plenty of measures coded in Julia: https://github.com/JuliaML/MLMetrics.jl, along with Flux: https://github.com/FluxML/Flux.jl/blob/0287abbf663b1ae276360ea4a396adad1d2d1df7/src/layers/stateless.jl
I've looked at the Flux measures, and I think their API is a little different because they want the measures to work on batches. They only have a few, and I've had problems getting them to work as expected.
Copying this here before I forget. Also: this could make a nice tutorial, because people in industry (in my field at least) often need custom scores.
@tlienart rsquared is an example of a relative measure (relative to the benchmark model = mean).
function err(ŷ, y)
    return ŷ - y
end
function perr(ŷ, y; tol = eps())
    # percentage errors; observations with |y| ≤ tol are skipped to avoid division by (near-)zero
    e = err(ŷ, y)
    keep = abs.(y) .> tol
    return 100 .* e[keep] ./ y[keep]
end
using Statistics
mse(ŷ, y) = err(ŷ, y) |> x -> x.^2 |> mean  # mean squared error
rmse(ŷ, y) = err(ŷ, y) |> x -> x.^2 |> mean |> sqrt  # root mean squared error
mae(ŷ, y) = err(ŷ, y) |> x -> abs.(x) |> mean  # mean absolute error
mdae(ŷ, y) = err(ŷ, y) |> x -> abs.(x) |> median  # median absolute error
rmdse(ŷ, y) = err(ŷ, y) |> x -> x.^2 |> median |> sqrt  # root median squared error
maxae(ŷ, y) = err(ŷ, y) |> x -> abs.(x) |> maximum  # maximum absolute error
# composition style (for the tutorial): root mean squared error
rmse2(ŷ, y) = (sqrt ∘ mean ∘ (x -> x.^2) ∘ err)(ŷ, y)
# composition style: mean absolute error
mae2(ŷ, y) = (mean ∘ (x -> abs.(x)) ∘ err)(ŷ, y)
rmsl(ŷ, y) = err(log.(ŷ), log.(y)) |> x -> x.^2 |> mean |> sqrt  # root mean squared log error; requires ŷ, y .> 0
rmslp1(ŷ, y) = err(log.(ŷ .+ 1), log.(y .+ 1)) |> x -> x.^2 |> mean |> sqrt  # log(1 + x) variant; tolerates zeros
rss(ŷ, y) = err(ŷ, y) |> x -> x.^2 |> sum  # residual sum of squares, AKA SSE
tss(ŷ, y) = err(zero(y) .+ mean(y), y) |> x -> x.^2 |> sum  # total sum of squares
ess(ŷ, y) = err(ŷ, zero(y) .+ mean(y)) |> x -> x.^2 |> sum  # explained sum of squares
# equivalent, more direct versions
tss2(ŷ, y) = mean(y) .- y |> x -> x.^2 |> sum
ess2(ŷ, y) = mean(y) .- ŷ |> x -> x.^2 |> sum
rss2(ŷ, y) = y .- ŷ |> x -> x.^2 |> sum
rsq2(ŷ, y) = ess2(ŷ, y) / tss2(ŷ, y)  # R² as ESS/TSS
# note: ESS/TSS equals 1 - RSS/TSS only for linear models with an intercept; verify on a linear model
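# the more common benchmark-relative form of R² (my addition, a sketch using the helpers
# above); it agrees with rsq2 exactly in the linear-model-with-intercept case noted above
rsq(ŷ, y) = 1 - rss(ŷ, y) / tss(ŷ, y)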
# Mean Percentage Error (MPE) is a measure of bias
mpe(ŷ, y) = perr(ŷ, y) |> mean  # mean percentage error
mape(ŷ, y) = perr(ŷ, y) |> x -> abs.(x) |> mean  # mean absolute percentage error
mdape(ŷ, y) = perr(ŷ, y) |> x -> abs.(x) |> median  # median absolute percentage error
rmspe(ŷ, y) = perr(ŷ, y) |> x -> x.^2 |> mean |> sqrt  # root mean squared percentage error
rmdspe(ŷ, y) = perr(ŷ, y) |> x -> x.^2 |> median |> sqrt  # root median squared percentage error
"https://en.wikipedia.org/wiki/Symmetric_mean_absolute_percentage_error"
smape(ŷ, y) = 200. * abs.(err(ŷ, y)) ./ (ŷ + y) |> mean
smdape(ŷ, y) = 200. * abs.(err(ŷ, y)) ./ (ŷ + y) |> median
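# MASE, from the Hyndman & Koehler paper linked above: MAE scaled by the in-sample MAE
# of the one-step naive forecast. A sketch, assuming y is an equally spaced time series
# and (for simplicity) reusing it as the in-sample scaling data:
mase(ŷ, y) = mean(abs.(err(ŷ, y))) / mean(abs.(diff(y)))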
# TEST
y = [0, 1, 2, 3, 4]
ŷ = [1, 2, 2, 4, 5]
e = err(ŷ, y)
p = perr(ŷ, y)
mse(ŷ, y)
rmse(ŷ, y)
mae(ŷ, y)
mdae(ŷ, y)
rmdse(ŷ, y)
maxae(ŷ, y)
mape(ŷ, y)
mdape(ŷ, y)
rmspe(ŷ, y)
rmdspe(ŷ, y)
smape(ŷ, y)
smdape(ŷ, y)
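# quick consistency checks (my addition): the pipeline and composition versions, and the
# two sum-of-squares versions, should agree
@assert rmse(ŷ, y) ≈ rmse2(ŷ, y)
@assert mae(ŷ, y) ≈ mae2(ŷ, y)
@assert rss(ŷ, y) ≈ rss2(ŷ, y) && tss(ŷ, y) ≈ tss2(ŷ, y)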
# mean arctangent absolute percentage error (MAAPE): mean(atan(|e/y|));
# note atan is applied to the raw ratio e/y, not to the percentage 100e/y (atan is
# nonlinear, so that would change the value), and atan(Inf) = π/2 keeps the measure
# defined when y contains zeros
maape(ŷ, y) = mean(atan.(abs.(err(ŷ, y) ./ y)))
MAAPE is from a 2016 paper (Kim & Kim, "A new metric of absolute percentage error for intermittent demand forecasts"). Great score for intermittent sales data...
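For example, MAAPE stays finite on zero-demand observations, where MAPE is undefined:

maape([1.0, 2.0, 3.0], [0.0, 2.0, 3.0])  # = (atan(Inf) + 0 + 0)/3 = π/6 ≈ 0.5236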
Another to add to the list (looking ahead to image classification): Tversky Loss https://github.com/alan-turing-institute/MLJ.jl/issues/554
Many are also here: https://github.com/LAMPSPUC/ForecastAccuracy.jl
Hey, did anything ever come of this? Was this implemented in a different repo?
No, I think the most recent additions were the multiclass versions of FScore and its cousins. You can run measures() to list what is available.
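For instance (assuming a recent MLJ, where measures() returns a vector of named tuples of measure metadata, including a name field):

using MLJ
ms = measures()       # metadata for every registered measure
[m.name for m in ms]  # just the names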
Came across these as well: https://github.com/beacon-biosignals/Lighthouse.jl
Yep. We got this somewhere.
@azev77 I'm currently performing a JuliaAI-wide issue review and collecting all issues in a GitHub project. All measure-related issues are being given a "measure" label (as this one has), so they can all be seen at a glance. I'll let you know when it's done. Most of these are marked "straightforward" but "low priority", and there are a lot of them. I'm trying to see if we can get a student to help out; some of the work is not that hard, but it's tedious and not very popular. If you know of someone we could trust with this work, and have a way to pay them...
https://mlr.mlr-org.com/articles/tutorial/measures.html
binary classification (a from-scratch sketch of several of these follows the list):
- accuracy
- auc
- bac, bacc, balanced_accuracy
- BrierScore
- f1, f1score
- fdr, falsediscovery_rate
- fn, falsenegative
- fnr, falsenegative_rate, miss_rate
- fp, falsepositive
- fpr, falsepositive_rate, fallout
- mcc, matthews_correlation
- npv
- ppv, precision
- truenegative, tn
- truenegative_rate, tnr, specificity, selectivity
- truepositive, tp
- truepositive_rate, tpr, recall, sensitivity, hit_rate
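To make the binary ones concrete, a minimal from-scratch sketch for boolean prediction/target vectors (the helper names are mine, not MLJ's API):

using Statistics
tp(ŷ, y) = sum(ŷ .& y)      # true positives
fp(ŷ, y) = sum(ŷ .& .!y)    # false positives
tn(ŷ, y) = sum(.!ŷ .& .!y)  # true negatives
fn(ŷ, y) = sum(.!ŷ .& y)    # false negatives
tpr(ŷ, y) = tp(ŷ, y) / (tp(ŷ, y) + fn(ŷ, y))  # recall / sensitivity / hit_rate
ppv(ŷ, y) = tp(ŷ, y) / (tp(ŷ, y) + fp(ŷ, y))  # precision
fpr(ŷ, y) = fp(ŷ, y) / (fp(ŷ, y) + tn(ŷ, y))  # fallout
f1(ŷ, y) = 2 * tp(ŷ, y) / (2 * tp(ŷ, y) + fp(ŷ, y) + fn(ŷ, y))
accuracy(ŷ, y) = mean(ŷ .== y)
# e.g. f1(Bool[1, 1, 0, 0], Bool[1, 0, 1, 0]) == 0.5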
multiclass classification
regression
- some of these may be available in LossFunctions.jl (?); I did this list off the top of my head, so it may be worth double-checking
survival analysis
- we don't have that yet
cluster analysis
- we don't have any of those, but probably should
cost-sensitive classification
- we don't have that
general performance model
- (?)