adriancorrendo / soiltestcorr

Soil Test Correlation & Calibration in R
https://adriancorrendo.github.io/soiltestcorr/
Other
8 stars 2 forks source link

Slight differences when comparing two Cate-Nelson function and their outputs #4

Closed austinwpearce closed 1 year ago

austinwpearce commented 1 year ago

The cate_nelson_1965() and cateNelsonFixedY() functions from the soiltestcorr and rcompanion packages, respectively, should ideally compute the same critical X value. However, in the example I included, they don't.

Can we explore why this is?

R Code included in folder cate_nelson_problem.zip

R code pasted directly

library(tidyverse)
library(soiltestcorr)
library(rcompanion)

# dataset
corr <- tibble(
  stv = c(16,17,22,26,27,28,28,30,35,38,40,43,43,50,50,
          51,51,52,54,56,60,65,66,71,74,76,76,80,81,83,
          83,90,92,94,98,108,146,174,177,179,196), 
  ry = c(31.7,21.5,42.4,60.8,56.4,58.4,66.1,74.3,80.5,69.8,82.7,
         98.8,61.6,92.4,96.8,91.5,99.8,97.3,98.0,100.0,97.4,85.6,
         88.3,99.4,98.7,96.5,87.6,97.2,99.7,96.1,95.5,92.4,84.5,
         85.8,95.9,100.0,96.5,100.0,99.0,92.4,97.7))

theme_set(theme_minimal())

baseplot <-
  ggplot(data = corr,
         aes(stv, ry)) + 
  geom_point() +
  coord_equal()

baseplot

# Cate-Nelson using soiltestcorr
# Critical X is 50.5 ppm
cstv_soiltestcorr <- 
  soiltestcorr::cate_nelson_1965(corr, stv, ry, target = 95)$CSTV %>% 
  round()

# Cate-Nelson using rcompanion
# Critical X is 43 ppm 
cstv_rcompanion <- 
  rcompanion::cateNelsonFixedY(corr$stv, corr$ry, cly = 95, outlength = 1)$Critx

# R^2 value when CSTV = 50.5 using soiltestcorr
# 0.502
soiltestcorr::cate_nelson_1965(corr, stv, ry, target = 95)$R2

# R^2 value using following method

x <- corr$stv # data already sorted by STV
y <- corr$ry

# Create a data.frame to store the results

out <- tibble(
  x = NA,
  mean1 = NA,
  css1 = NA,
  mean2 = NA,
  css2 = NA,
  r2 = NA
)

css <- function(x) {
  var(x) * (length(x) - 1)
}
tcss <- css(y) # Total corrected sum of squares

for (i in 2:(length(y) - 2)) {
  y1 <- y[1:i]
  y2 <- y[-(1:i)]
  out[i, 'x'] <- x[i]
  out[i, 'mean1'] <- mean(y1)
  out[i, 'mean2'] <- mean(y2)
  out[i, 'css1'] <- css1 <- css(y1)
  out[i, 'css2'] <- css2 <- css(y2)
  out[i, 'r2'] <- (tcss - (css1 + css2)) / tcss

}

# How is CSTV determined? R2? Lowest sum of squares?
# CSTV = 43 gives R2 of 0.55 and 0.62
# CSTV = 50 (50.5) gives R2 of 0.56 and 0.50
out %>% 
  filter(x == cstv_soiltestcorr | x == cstv_rcompanion)

# Highest R2 associated with CSTV = 28
out %>% slice_max(r2)
# A tibble: 4 × 6
      x mean1  css1 mean2  css2    r2
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1    43  62.0 5342.  93.9 1708. 0.552
2    43  61.9 5342.  95.1  627. 0.620
3    50  64.1 6205.  95.2  619. 0.566
4    50  66.3 7203.  95.1  616. 0.503
austinwpearce commented 1 year ago

After discussion about Cate Nelson functions, it seems the rcompanion package uses a sort of hybrid approach that blends the graphical and statistical approach. Differences with soiltestcorr will likely be minor in many cases, and users should just be aware.

adriancorrendo commented 1 year ago

Austin,

The differences between soiltestcorr::cate_nelson_1965() and the rcompanion::cateNelsonFixedY() are:

Hope this explanation helps to explain the difference,

Best,

ADRIAN