alexkowa / EnvStats

EnvStats — Package for Environmental Statistics, Including US EPA Guidance. Homepage: http://www.probstatinfo.com
https://alexkowa.github.io/EnvStats/
26 stars 7 forks source link

enparCensored SE for left-censored data seems incorrect #34

Open cfholbert opened 3 months ago

cfholbert commented 3 months ago

The standard error (SE) of the mean clculated using the enparCensored() function seems to be giving low values when the smallest left-censored value is less than or equal to the smallest uncensored value. I am using correct.se = TRUE for the bias correction. The Details state that the corection is n/(n - 1) but inspection of the code suggests that it is d/(d - 1), where d are the number of detections. Also, I compared the SE to that obtained from "Improved Methods for Calculating Concentrations Used in Exposure Assessments" (BJC/OR-416) using the example data they are significantly different. The SE from "Improved Methods for Calculating Concentrations Used in Exposure Assessments" (BJC/OR-416) seems to mtcah that from EPA's ProUCL software. The code I am using is shown below.

Calculate PLE of mean and variance from a censored dataset.

See 'Improved Methods for Calculating Concentrations Used in Exposure

Assessments' (BJC/OR-416) for more details.

ple <- function(x, d, ...) {

Get list of unique detections plus lowest reading regardless

d <- as.logical(d) xp <- unique(c(min(x), sort(x[d])))

Create matrix to store intermediate steps of PLE calculation

t <- matrix(1.0, nrow = length(xp), ncol = 7)

Fill first row with detected values

t[, 1] <- xp

First row is calculated slightly differently

t1 <- xp[1] t2 <- sum(x <= t1) t3 <- sum(d[x == t1]) t4 <- 0 t5 <- t1 t6 <- t5 * t4 t7 <- t5 - t6 t[1, ] <- c(t1, t2, t3, t4, t5, t6, t7)

Last row is also calculated slightly differently

i <- length(xp) t1 <- xp[i] t2 <- sum(x <= t1) t3 <- sum(d[x == t1]) t4 <- ((t2 - t3) / t2) t5 <- t1 - t[i - 1, 1] t6 <- t5 * t4 t7 <- t5 - t6 t[i, ] <- c(t1, t2, t3, t4, t5, t6, t7)

Loop through rows 2 through n-1 backwards

for(i in (length(xp) - 1):2) {

t1 <- xp[i]
t2 <- sum(x <= t1)
t3 <- sum(d[x == t1])
t4 <- t[i + 1, 4] * ((t2 - t3) / t2)   # (B-C)/B * previous ple
t5 <- t1 - t[i - 1, 1]
t6 <- t5 * t4
t7 <- t5 - t6

t[i, ] <- c(t1, t2, t3, t4, t5, t6, t7)

}

vd <- t[(3 - as.logical(sum(d[x==min(xp)]))):nrow(t), ]

vd <- t[2:nrow(t), ]

Get mean and variance

mu.hat <- sum(t[, 7]) se.hat <- sqrt( sum( sapply( 1:nrow(vd), function(i, vd) sum(vd[1:i, 6])^2 / (vd[i, 2] (vd[i, 2] - vd[i, 3])) vd[i, 3], vd=vd)) * (nrow(t) / (nrow(t) - 1 ) ) )

Return a list of mean and variance

list(mean = mu.hat, se = se.hat)

}

SteveMillard commented 2 months ago

@cfholbert thanks for pointing this out! I can work on a fix for this in a couple of weeks.

cfholbert commented 2 months ago

@SteveMillard I compared the SE from EnvStats, the ple code from BJC/OR-416, and the kmms code from the STAND package. The SE from kmms mtaches that from ProUCL and is slight ly different than that from BJC/OR-416 when the lowest value is censored. EnvStats provides the smallest SE of the three codes. I'm not sure why the ple code and the kmms code does not give the smae SE. I've attached all of the files that include the BJC/OR-416 example data, the various functions, and the small code snippet I used to compare the three.

code.zip