arviz-devs / arviz

Exploratory analysis of Bayesian models with Python
https://python.arviz.org
Apache License 2.0

bug in r2_score implementation #1662

Open nrakocz opened 3 years ago

nrakocz commented 3 years ago

Describe the bug
There is a bug in the implementation of the Bayesian R-squared as described by Aki Vehtari.

In his description, one R-squared value is computed for each posterior sample. Here's a link: https://avehtari.github.io/ROS-Examples/Rsquared/rsquared.html

The ArviZ implementation:

var_y_est = _numba_var(svar, np.var, y_pred.mean(0))
var_e = _numba_var(svar, np.var, (y_true - y_pred), axis=0)
r_squared = var_y_est / (var_y_est + var_e)

Dimensions, using the example from the source code:

data = az.load_arviz_data('regression1d')
y_true = data.observed_data["y"].values
y_pred = data.posterior_predictive.stack(sample=("chain", "draw"))["y"].values.T
az.r2_score(y_true, y_pred)

dimensions:
y_true.shape -> (100,)
y_pred.shape -> (2000, 100)

The problem
Notice that the original description generates one R-squared sample per posterior draw (2000 values in this example), while the ArviZ implementation generates one value per data point (100 values).

A correct implementation would be:

var_y_est = _numba_var(svar, np.var, y_pred, axis=1)
var_e = _numba_var(svar, np.var, (y_true - y_pred), axis=1)
r_squared = var_y_est / (var_y_est + var_e)
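For reference, here is a minimal runnable sketch of this per-draw computation using plain NumPy (np.var stands in for ArviZ's internal _numba_var helper; the data is synthetic, chosen only to reproduce the shapes from the example above):

```python
import numpy as np

def bayes_r2_draws(y_true, y_pred):
    """Per-draw Bayesian R-squared.

    y_true : (n_obs,) observed values
    y_pred : (n_draws, n_obs) posterior-predictive draws
    Returns an array of n_draws R-squared values.
    """
    var_y_est = np.var(y_pred, axis=1)       # variance of the predictions, per draw
    var_e = np.var(y_true - y_pred, axis=1)  # residual variance, per draw
    return var_y_est / (var_y_est + var_e)

# Shapes mirror the regression1d example: 100 observations, 2000 draws.
rng = np.random.default_rng(0)
y_true = rng.normal(0, 1, 100)
y_pred = y_true + rng.normal(0, 1, size=(2000, 100))

r2 = bayes_r2_draws(y_true, y_pred)
print(r2.shape)  # (2000,) -- one R-squared per posterior draw
```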

I hope this helps and that I didn't make any mistakes.

aloctavodia commented 3 years ago

Hi @nrakocz thanks for taking the time to report this.

It seems to me that the current implementation is correct. var_y_est is the variance of the expected value of y: you first compute the mean for each observation and then the variance over the N observations.

Also, for an example like:

y_true = np.random.normal(0, 1, 100)
y_pred = np.random.normal(y_true, 100, size=(500, 100))

the proposed implementation will give an r2 value around 0.5, while the current implementation gives a value closer to zero, as expected.

nrakocz commented 3 years ago

Hi @aloctavodia! Thank you for replying so quickly!

I understand where the confusion comes from. The notation Var_mu from the link is a little confusing.

The book, however, has it written more clearly:

[image: Bayesian R-squared formula from the book]

Notice that there is an R2 score for each posterior simulated sample s.

Also, your example violates the assumption that y_pred_s = X\beta_s (the predicted y's for draw s lie on a shared hyperplane determined by the sampled parameters \beta_s), which might be why the results you showed don't make much sense.

Intuitively, R2 measures the variance explained. For each posterior sample s, we would like to measure how much of the variance is explained by the model: the variance of the predicted values (X\beta_s) relative to that same variance plus the variance of the residuals (given parameters \beta_s). That way, we get a posterior distribution of the R2 score.

Here is also a link to the paper, which has a good (though not as good as the book's) description: https://www.tandfonline.com/doi/full/10.1080/00031305.2018.1549100

Kindly let me know if we still disagree :)

nrakocz commented 3 years ago

Thank you, @ahartikainen, for your comment. That supports my suggestion.

I would also suggest that az.r2_score return all the R2 samples instead of just the mean and std (as in the current implementation), to provide the entire distribution and let the user decide what to do with it. But that might be another issue.

I don't mind submitting pull requests if it helps.

ahartikainen commented 3 years ago

Sorry, I deleted my comment

There was a mention that y (100,) + yhat (2000,100) -> r2 (2000,)

I would also like to mention that our help docs are not really helpful:

    """R² for Bayesian regression models. Only valid for linear models.
    Parameters
    ----------
    y_true: array-like of shape = (n_samples) or (n_samples, n_outputs)
        Ground truth (correct) target values.
    y_pred: array-like of shape = (n_samples) or (n_samples, n_outputs)
        Estimated target values.
    Returns
    -------
    Pandas Series with the following indices:
    r2: Bayesian R²
    r2_std: standard deviation of the Bayesian R².
    Examples
    --------
    Calculate R² for Bayesian regression models :
    .. ipython::
        In [1]: import arviz as az
           ...: data = az.load_arviz_data('regression1d')
           ...: y_true = data.observed_data["y"].values
           ...: y_pred = data.posterior_predictive.stack(sample=("chain", "draw"))["y"].values.T
           ...: az.r2_score(y_true, y_pred)
    """

We probably want one function that returns the array of r2 values; r2_score would then call it and return the mean and std. Right now we are unable to draw a histogram of the r2 values.
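One possible shape for that refactor, as a sketch only (r2_samples is a hypothetical name, not existing ArviZ API, and plain np.var stands in for _numba_var):

```python
import numpy as np
import pandas as pd

def r2_samples(y_true, y_pred):
    """Full posterior distribution of Bayesian R-squared.

    y_true : (n_obs,) observed values
    y_pred : (n_draws, n_obs) posterior-predictive draws
    Returns an array with one R-squared value per posterior draw.
    """
    var_y_est = np.var(y_pred, axis=1)
    var_e = np.var(np.asarray(y_true) - y_pred, axis=1)
    return var_y_est / (var_y_est + var_e)

def r2_score(y_true, y_pred):
    """Summary wrapper: mean and std of the per-draw R-squared values,
    keeping the current return type (a pandas Series)."""
    r2 = r2_samples(y_true, y_pred)
    return pd.Series([np.mean(r2), np.std(r2)], index=["r2", "r2_std"])
```

With the helper exposed, a histogram of the full distribution is one call away, e.g. plt.hist(r2_samples(y_true, y_pred)).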

ahartikainen commented 3 years ago

https://github.com/jgabry/bayes_R2/blob/master/bayes_R2.R

bayes_R2 <- function(fit) {
  y <- rstanarm::get_y(fit)
  ypred <- rstanarm::posterior_linpred(fit, transform = TRUE)
  if (family(fit)$family == "binomial" && NCOL(y) == 2) {
    trials <- rowSums(y)
    y <- y[, 1]
    ypred <- ypred %*% diag(trials)
  }
  e <- -1 * sweep(ypred, 2, y)       # residuals, one row per posterior draw
  var_ypred <- apply(ypred, 1, var)  # prediction variance, per draw
  var_e <- apply(e, 1, var)          # residual variance, per draw
  var_ypred / (var_ypred + var_e)    # one R-squared per posterior draw
}

aloctavodia commented 3 years ago

@nrakocz You are right, sorry for bringing noise to the discussion. Thanks for noticing this and for offering to do a PR. I agree it would be nice to have an option to return all the values. I think it can be the same function, with an argument to return either the entire posterior distribution or a summary.

rlouf commented 3 years ago

I was about to open an issue to suggest the exact same thing as @nrakocz: r2_score should provide the entire distribution of R2 values.