AJResearchGroup / nsphs_ml_qt

R package for nsphs_ml_qt
GNU General Public License v3.0
0 stars 1 forks source link

Plot in R2 #53

Closed richelbilderbeek closed 2 years ago

richelbilderbeek commented 2 years ago

From Mathias:

For quantitative predictions you can use delta R-squared This is also known as the squared semi-partial correlation coefficient.

richelbilderbeek commented 2 years ago

I quote from https://en.wikipedia.org/wiki/Coefficient_of_determination:

When evaluating the goodness-of-fit of simulated (Ypred) vs. measured (Yobs) values, it is not appropriate to base this on the R2 of the linear regression (i.e., Yobs= m·Ypred + b)

richelbilderbeek commented 2 years ago

See Piñeiro, Gervasio, et al. "How to evaluate models: observed vs. predicted or predicted vs. observed?." Ecological modelling 216.3-4 (2008): 316-322. https://www.sciencedirect.com/science/article/abs/pii/S0304380008002305

Use predicted at the x-axis, like this:

po

The r2 will be the same, but -what is new- the linear fit will be correct.

richelbilderbeek commented 2 years ago

Notes at https://notesarc.com/how-to-evaluate-models-observed-vs-predicted-or-predicted-vs-observed/

richelbilderbeek commented 2 years ago

Data from 'Estimates of New Zealand forest and scrub biomass from the 3-PG model', https://www.sciencedirect.com/science/article/abs/pii/S0304380000002519 where website for it was at http://www.ffp.csiro.au/nfm/mdp/

richelbilderbeek commented 2 years ago

Ignore Pineiro, I will use true on the X axis and predicted on the y axis.

Below a simulation to prove it: the intercept should be zero. For true_pred this is the case:

Screenshot from 2022-06-13 13-02-13


n <- 1000

intercepts_wide <- tibble::tibble(
  est_true = rep(NA, n),
  true_est = rep(NA, n)
)

for (i in seq_len(n)) {
  set.seed(i)
  t <- tibble::tibble(
    true = seq(1, 100),
    estimated = NA
  )
  t$estimated <- t$true + rnorm(n = 100, sd = 1)

  # Way 'est_true': estimated on x, true on y
  r_1 <- lm(formula = true ~ estimated,  t)
  r_1
  intercept_1 <- r_1$coefficients[1]
  intercepts_wide$est_true[i] <- as.numeric(intercept_1)
  ggplot2::ggplot(
    t, ggplot2::aes(x = estimated, y = true)
  ) + ggplot2::geom_point() + ggplot2::geom_smooth(method = "lm") +
    ggpubr::stat_regline_equation(label.y = 400, ggplot2::aes(label = ..eq.label..)) +
    ggpubr::stat_regline_equation(label.y = 350, ggplot2::aes(label = ..rr.label..))

  # Way 'true_est': true on x, estimated on y,
  r_2 <- lm(formula =  estimated ~ true,  t)
  r_2
  intercept_2 <- r_2$coefficients[1]
  intercepts_wide$true_est[i] <- as.numeric(intercept_2)
  ggplot2::ggplot(
    t, ggplot2::aes(x = true, y = estimated)
  ) + ggplot2::geom_point() + ggplot2::geom_smooth(method = "lm") +
    ggpubr::stat_regline_equation(label.y = 400, ggplot2::aes(label = ..eq.label..)) +
    ggpubr::stat_regline_equation(label.y = 350, ggplot2::aes(label = ..rr.label..))

}

intercepts <- tidyr::pivot_longer(data = intercepts_wide, cols = tidyr::everything())

ggplot2::ggplot(intercepts, ggplot2::aes(x = name, y = value)) +
  ggplot2::geom_boxplot()
richelbilderbeek commented 2 years ago

Done!

Screenshot from 2022-06-13 14-49-21