easystats / performance

Models' quality and performance metrics (R2, ICC, LOO, AIC, BF, ...)
https://easystats.github.io/performance/
GNU General Public License v3.0

Add a function to compute coefficient of variation #433

Open IndrajeetPatil opened 2 years ago

IndrajeetPatil commented 2 years ago

With this as a starting point: https://github.com/strengejacke/sjstats/blob/master/R/cv.R

IndrajeetPatil commented 2 years ago

@rempsyc Would you be interested in working on this?

rempsyc commented 2 years ago

No.

bwiernik commented 2 years ago

I'll do this one this weekend

bwiernik commented 2 years ago

In the context of a fitted model, how is the CV typically used? I see the linked function computes the observed mean of the response and then divides the model's RMSE/sigma by that mean. How is that information used/what is it interpreted to convey? @strengejacke @IndrajeetPatil @DominiqueMakowski @mattansb

(Currently, I'm putting a generic and numeric method into {datawizard} and will add a method for models here.)

bwiernik commented 2 years ago

Also, we already have performance_cv() referring to estimating cross-validation performance, so which of these should we do:

  1. Just make CV available as a metric in model_performance() and not a separate function (similar to sigma)
  2. Make the CV function performance_coefvar() or similar.
  3. Rename existing performance_cv() to performance_crossvalidate() or similar.

My preference is strongly not (3).

DominiqueMakowski commented 2 years ago

> How is that information used/what is it interpreted to convey?

Never used that, but from here:

> The coefficient of variation (CV) is a statistical measure of the relative dispersion of data points in a data series around the mean. In finance, the coefficient of variation allows investors to determine how much volatility, or risk, is assumed in comparison to the amount of return expected from investments. The lower the ratio of the standard deviation to mean return, the better risk-return trade-off.

Not sure if it is used much in other fields?

> we already have performance_cv()

That's a pickle. I think I lean towards (3), renaming the existing performance_cv() to performance_crossvalidate(), and then perhaps adding a function like performance_dispersion() or similar that would encompass sigma and CV?

DominiqueMakowski commented 2 years ago

> My preference is strongly not (3).

Why? Having an explicit alias for performance_cv() doesn't sound like a bad idea.

DominiqueMakowski commented 2 years ago

Though it's true that we shouldn't add too many aliases either: it gets confusing and it bloats the namespace.

strengejacke commented 2 years ago

> In the context of a fitted model, how is the CV typically used? I see the linked function computes the observed mean of the response and then divides the model's RMSE/sigma by that mean. How is that information used/what is it interpreted to convey? @strengejacke @IndrajeetPatil @DominiqueMakowski @mattansb
>
> (Currently, I'm putting a generic and numeric method into {datawizard} and will add a method for models here.)

See https://stats.oarc.ucla.edu/other/mult-pkg/faq/general/faq-what-is-the-coefficient-of-variation, section "in the model setting".
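As a rough illustration of the model-setting definition discussed here (RMSE divided by the mean of the response), the calculation could be sketched in base R as below. The helper name `model_coefvar()` is hypothetical and is not an existing function in {performance}, {sjstats}, or {datawizard}; RMSE is taken here as `sqrt(mean(residuals^2))`, which is an assumption about the intended definition.

```r
# Hypothetical helper (not part of any easystats package):
# model-based coefficient of variation = RMSE / mean(observed response)
model_coefvar <- function(model) {
  # observed response values used to fit the model
  y <- stats::model.response(stats::model.frame(model))
  # root mean squared error of the residuals (assumed definition)
  rmse <- sqrt(mean(stats::residuals(model)^2))
  rmse / mean(y)
}

# Example with a built-in dataset
fit <- lm(mpg ~ wt, data = mtcars)
cv_fit <- model_coefvar(fit)
cv_fit
```

The result is unitless, so it reads as "typical prediction error as a fraction of the average response", which only makes sense when the response has a meaningful zero.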

mattansb commented 2 years ago

This seems very odd. AFAIK, CV is only interpretable when the variable is on a ratio scale, which residuals (from which RMSE is computed) are not.

In the context of regression, I've only seen CV used to test the assumption of dispersion in Poisson regression (where CV should be constant across predicted values).

mattansb commented 2 years ago

I mean, y ~ x and I(y - 30) ~ x will have very different CVs.

I guess that in finance, where money is on a ratio scale, that would make sense? (But even then, if you want a representation of relative prediction error, perhaps sqrt(mean(((y - pred) / pred)^2)) would be more suitable?)
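To make the point about shifting the response concrete, here is a small simulated sketch. The data, the seed, and the helper `rmse_of()` are all made up for illustration; the CV is computed as RMSE / mean(response), as in the linked sjstats function.

```r
set.seed(1)
x <- runif(100, 1, 10)
y <- 50 + 2 * x + rnorm(100)  # simulated response, far from zero

fit_raw   <- lm(y ~ x)
fit_shift <- lm(I(y - 30) ~ x)  # same model, response shifted by a constant

# Illustrative helper: RMSE from model residuals
rmse_of <- function(model) sqrt(mean(residuals(model)^2))

# The residuals (and so the RMSE) are unaffected by the shift ...
rmse_raw   <- rmse_of(fit_raw)
rmse_shift <- rmse_of(fit_shift)

# ... but the mean of the response is not, so the CV changes
cv_raw   <- rmse_raw / mean(y)
cv_shift <- rmse_shift / mean(y - 30)

# Relative prediction error, as suggested in the comment above
pred <- fitted(fit_raw)
rel_rmse <- sqrt(mean(((y - pred) / pred)^2))

c(cv_raw = cv_raw, cv_shift = cv_shift, rel_rmse = rel_rmse)
```

Both fits describe the same relationship with the same residual spread, yet their CVs differ purely because of where zero sits on the response scale.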

bwiernik commented 2 years ago

> why?

Changing the function name (not adding an alias, but removing an existing name and giving it to a different function) is a breaking change, and I would much rather avoid those when possible.