bnaras / bcaboot

BUG: Apparently, bcajack loses the final element of a vector when that final element has been included in a subset. #7

Open R180 opened 2 years ago

R180 commented 2 years ago

@bnaras

# BUG: Apparently, bcajack loses the final element of a vector when that final element has been included in a subset.
#
# ----------------------------------------------------------------------------
# This does NOT work (produces NAs for the BCA confidence interval limits).
# ----------------------------------------------------------------------------
library(bcaboot)
myData <- c(4,8,2,5,4,6,7,7,1,8,2,2,5,3,6,7,3,6,2,2,1,9,3,2,335)
myFunction <- function(k) {
    myStat <- mean(k[1:25]) # This causes bcajack to produce NAs for the BCA confidence limits
                            # because the subset includes the final value in the input-vector.
    return(myStat)
}
set.seed(5)
myBcaResult <- bcajack(x = myData, B = 5000, func = myFunction,
                       verbose = FALSE, alpha = c(0.001,0.01,0.05))
myBcaResult
#
# ----------------------------------------------------------------------------
# This DOES work since the function doesn't use the final value of the input vector.
# ----------------------------------------------------------------------------
myData <- c(4,8,2,5,4,6,7,7,1,8,2,2,5,3,6,7,3,6,2,2,1,9,3,2,335)
myFunction <- function(k) {
    myStat <- mean(k[1:24]) # This produces proper output
                            # since the subsetting excludes the input-vector's final value.
    return(myStat)
}
set.seed(5)
myBcaResult <- bcajack(x = myData, B = 5000, func = myFunction,
                       verbose = FALSE, alpha = c(0.001,0.01,0.05))
myBcaResult

# ----------------------------------------------------------------------------
# This DOES work, since there's no subsetting.
# ----------------------------------------------------------------------------
myData <- c(4,8,2,5,4,6,7,7,1,8,2,2,5,3,6,7,3,6,2,2,1,9,3,2,335)
myFunction <- function(k) {
    myStat <- mean(k) # This produces proper output
                      # since there's no subsetting.
    return(myStat)
}
set.seed(5)
myBcaResult <- bcajack(x = myData, B = 5000, func = myFunction,
                       verbose = FALSE, alpha = c(0.001,0.01,0.05))
myBcaResult

bnaras commented 2 years ago

I don't see how this is a bug. myFunction is under your control and you've hardwired a constant into the index vector (k[1:25]). Nowhere is a guarantee made about length(x); indeed, for the jack-knife calculations, the length is n-1, but best not to assume that either. Also, the documentation for bcajack states that func(x) should return a real value.
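
The point about `length(x)` can be checked without calling bcajack at all. The following is a minimal base-R sketch; it assumes only what the comment above states, namely that the jackknife step passes the statistic a vector with one observation removed. The names `hardwired` and `jackSample` are made up for illustration.

```r
# Same data as in the report above.
myData <- c(4,8,2,5,4,6,7,7,1,8,2,2,5,3,6,7,3,6,2,2,1,9,3,2,335)

# Statistic with the hard-wired index from the report.
hardwired <- function(k) mean(k[1:25])

# A leave-one-out sample (assumption: the jackknife step calls the
# statistic on the data with one observation removed).
jackSample <- myData[-1]

length(jackSample)      # 24: one element shorter than myData
jackSample[25]          # NA: position 25 does not exist in a 24-element vector
hardwired(jackSample)   # NA: mean() propagates the NA
hardwired(myData)       # finite: the index fits the full-length vector
mean(jackSample)        # finite: no length is assumed
```

So `k[1:25]` does not drop a value from the data; it asks for a position that the shorter vector does not have, and R fills that position with `NA`.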

R180 commented 2 years ago

Thanks. I had not realized that the element numbers (e.g., row numbers) corresponded to the element numbers in the original data set rather than the bootstrapped (or jackknifed) data set. So I have included the following note to myself in the following R code (which does indeed work):

myData <- cbind(c(4,8,2,5,4,6,7,7,1,8,5,2,3,5,8,2),c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2))
myFunction <- function(k) {

  # Note. The function should not refer to element numbers (row numbers, column numbers, 
  # or vector element numbers) within the function's input data, since bcajack controls
  # which element numbers are utilized at any given point in time. 

  myStat <- mean(k[k[,2] == 1, 1]) - mean(k[k[,2] == 2, 1])
  return(myStat)
}
set.seed(5)
myBcaResult <- bcajack(x = myData, B = 5000, func = myFunction,
                       verbose = FALSE, alpha = c(0.001,0.01,0.05))
myBcaResult
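
One way to catch this kind of problem before running bcajack is to try the statistic on every leave-one-out version of the data first. The sketch below uses base R only; `checkStatistic` is a hypothetical helper written for this thread, not a function from bcaboot, and `groupData` and `groupDiff` are illustrative names for the data and statistic above.

```r
# Hypothetical helper (not part of bcaboot): evaluate `func` on every
# leave-one-out version of `x` and report whether each result is a
# single finite number.
checkStatistic <- function(x, func) {
  isMat <- is.matrix(x)
  n <- if (isMat) nrow(x) else length(x)
  ok <- vapply(seq_len(n), function(i) {
    # Drop row i of a matrix, or element i of a vector.
    xi <- if (isMat) x[-i, , drop = FALSE] else x[-i]
    value <- func(xi)
    is.numeric(value) && length(value) == 1L && is.finite(value)
  }, logical(1))
  all(ok)
}

# The two-group data and statistic from the comment above.
groupData <- cbind(c(4,8,2,5,4,6,7,7,1,8,5,2,3,5,8,2),
                   c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2))
groupDiff <- function(k) mean(k[k[, 2] == 1, 1]) - mean(k[k[, 2] == 2, 1])

checkStatistic(groupData, groupDiff)                    # TRUE
checkStatistic(1:25 + 0, function(k) mean(k[1:25]))     # FALSE
```

This only exercises leave-one-out subsets. Resamples drawn with replacement can differ in other ways (for example, one group could end up with no rows), so a passing check is a sanity test rather than a guarantee.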