`fitMcmcNull()` does not handle vectors of `NA` or `NULL`

mvankessel-EMC commented 1 month ago

While running an SCCS analysis I run into the following error during the calibration of the estimates:

#> Error in optim(c(0, 100), loglikelihoodNullMcmc, logRr = logRr, seLogRr = seLogRr):
#>  function cannot be evaluated at initial parameters

What seems to be the case is that the supplied logRr and SeLogRr are both vectors containing NA's. So they get all get removed here. Which results in a vector of numeric(0) (length = 0) for both logRr and SeLogRr.

If after removal should there be a check that asserts the length of both and if they are of length 0, set the value to 0, or a vector of the original length of 0's? I'm not sure what 'standard' value you'd use in this case.

I.e.:

if (any(is.na(seLogRr))) {
  warning("Estimate(s) with NA standard error detected. Removing before fitting null distribution")
  logRr <- logRr[!is.na(seLogRr)]
  seLogRr <- seLogRr[!is.na(seLogRr)]
}
if (any(is.na(logRr))) {
  warning("Estimate(s) with NA logRr detected. Removing before fitting null distribution")
  seLogRr <- seLogRr[!is.na(logRr)]
  logRr <- logRr[!is.na(logRr)]
}
if (length(seLogRr) == 0) {
  # Or should this be a vector of 0's of the length of the original seLogRr vector?
  seLogRr <- 0
}
if (length(logRr) == 0) {
  logRr <- 0
}

mvankessel-EMC commented 1 month ago

Here are some toy examples with various logRr and seLogRr inputs:

logRr <- c(1, 2, 3)
seLogRr <- c(4, 5, 6)
EmpiricalCalibration::fitMcmcNull(logRr, seLogRr)
#> Estimated null distribution (using MCMC)
#> 
#>           Estimate lower .95 upper .95
#> Mean      1.491929  1.491929    1.4919
#> Precision 1.475598  0.017284   24.2599
#> 
#> Acceptance rate: 0.859931400685993

logRr <- c(1, 2, NULL)
seLogRr <- c(4, 5, NULL)
EmpiricalCalibration::fitMcmcNull(logRr, seLogRr)
#> Estimated null distribution (using MCMC)
#> 
#>           Estimate lower .95 upper .95
#> Mean      0.942659  0.942659    0.9427
#> Precision 2.661310  0.011013   35.4243
#> 
#> Acceptance rate: 0.91490085099149

logRr <- c(1, 2, NA)
seLogRr <- c(4, 5, NA)
EmpiricalCalibration::fitMcmcNull(logRr, seLogRr)
#> Warning in EmpiricalCalibration::fitMcmcNull(logRr, seLogRr): Estimate(s) with
#> NA standard error detected. Removing before fitting null distribution
#> Estimated null distribution (using MCMC)
#> 
#>            Estimate lower .95 upper .95
#> Mean      0.9426589 0.9426589    0.9427
#> Precision 0.6259526 0.0067681   11.5137
#> 
#> Acceptance rate: 0.871571284287157

logRr <- c(NULL, NULL, NULL)
seLogRr <- c(NULL, NULL, NULL)
EmpiricalCalibration::fitMcmcNull(logRr, seLogRr)
#> Error in eval(expr, envir, enclos): Not compatible with requested type: [type=NULL; target=double].

logRr <- c(NA, NA, NA)
seLogRr <- c(NA, NA, NA)
EmpiricalCalibration::fitMcmcNull(logRr, seLogRr)
#> Warning in EmpiricalCalibration::fitMcmcNull(logRr, seLogRr): Estimate(s) with
#> NA standard error detected. Removing before fitting null distribution
#> Error in optim(c(0, 100), logLikelihoodNullMcmc, logRr = logRr, seLogRr = seLogRr): function cannot be evaluated at initial parameters

^{Created on 2024-08-21 with reprex v2.1.1}

Session info

``` r sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.4.1 (2024-06-14 ucrt) #> os Windows 11 x64 (build 22631) #> system x86_64, mingw32 #> ui RTerm #> language (EN) #> collate Dutch_Netherlands.utf8 #> ctype Dutch_Netherlands.utf8 #> tz Europe/Amsterdam #> date 2024-08-21 #> pandoc 3.1.11 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown) #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date (UTC) lib source #> cli 3.6.3 2024-06-21 [1] CRAN (R 4.4.1) #> digest 0.6.37 2024-08-19 [1] CRAN (R 4.4.1) #> EmpiricalCalibration 3.1.2 2024-08-21 [1] Github (OHDSI/EmpiricalCalibration@b3ade3b) #> evaluate 0.24.0 2024-06-10 [1] CRAN (R 4.4.1) #> fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.4.1) #> fs 1.6.4 2024-04-25 [1] CRAN (R 4.4.1) #> glue 1.7.0 2024-01-09 [1] CRAN (R 4.4.1) #> htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.4.1) #> knitr 1.48 2024-07-07 [1] CRAN (R 4.4.1) #> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.4.1) #> Rcpp 1.0.13 2024-07-17 [1] CRAN (R 4.4.1) #> reprex 2.1.1 2024-07-06 [1] CRAN (R 4.4.1) #> rlang 1.1.4 2024-06-04 [1] CRAN (R 4.4.1) #> rmarkdown 2.28 2024-08-17 [1] CRAN (R 4.4.1) #> rstudioapi 0.16.0 2024-03-24 [1] CRAN (R 4.4.1) #> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.4.1) #> withr 3.0.1 2024-07-31 [1] CRAN (R 4.4.1) #> xfun 0.47 2024-08-17 [1] CRAN (R 4.4.1) #> yaml 2.3.10 2024-07-26 [1] CRAN (R 4.4.1) #> #> [1] C:/R/R-4.4.1/library #> #> ────────────────────────────────────────────────────────────────────────────── ```

schuemie commented 1 month ago

Hmmm, I guess if there are no valid estimates at all, the fitted distribution should have NA parameters, and calibration should return NA. I'll add this as a special case and make sure it gets handled correctly.

Where do you see NULL estimates?

schuemie commented 1 month ago

This should now be handled in this commit: https://github.com/OHDSI/EmpiricalCalibration/commit/37e237cd1385321647ba1e5e588e9f6c581b4235

The real question is: why did you have NA estimates for all your negative controls?

mvankessel-EMC commented 1 month ago

So sometimes the logRr and seLogRr would be vectors of numeric values, but sometimes would be vectors of NA. Could this be due to no records existing for all the negative controls? Would you typically handle this by explicitly removing negative controls that have 0 counts?

mvankessel-EMC commented 1 month ago

So I tried the above. There are values in both logRr and seLogRr now. It crashes when EmpiricalCalibration:::logLikelihoodNullMcmc() returns Inf. See the repex:

logRr <- c(
  -0.040514268, 0.481692514, 0.036482466, -0.095145701, 0.314433784,
  -1.497334478, 0.183195428, 0.289136733, 0.402694353, 0.367573632,
  0.463256874, -0.054422481, -0.060860638, 0.013167817, -18.000212194,
  1.052980223, -0.008283057, -0.031046398, -1.321463935, 0.559708684,
  0.768082068, 0.116301485, 0.164942591, 0.746996019
)

seLogRr <- c(
  0.1483240473, 0.1497285553, 0.1020657210, 0.1609180653, 0.0937424754,
  0.7320934116, 0.1442173789, 0.0959738294, 0.1462481972, 0.3990881643,
  0.3010253775, 0.4164489931, 0.2158912359, 0.0003913046, 0.0000000000,
  0.6348588999, 0.5327961873, 0.3360717777, 1.1168928052, 0.2971940110,
  0.7763132355, 0.6266069969, 0.2506407727, 1.3517969687
)

optim(c(0, 100), EmpiricalCalibration:::logLikelihoodNullMcmc, logRr = logRr, seLogRr = seLogRr)
#> Error in optim(c(0, 100), EmpiricalCalibration:::logLikelihoodNullMcmc, : function cannot be evaluated at initial parameters
EmpiricalCalibration:::logLikelihoodNullMcmc(theta = c(0, 100), logRr = logRr, seLogRr = seLogRr)
#> [1] Inf

If I remove the value 0 from the seLogRr and the corresponding value from logRr it seems to not return Inf.

# Removing logRr 0 (index 15)
logRrNoZero <- logRr[c(1:14, 16:length(logRr))]
seLogRrNoZero <- seLogRr[c(1:14, 16:length(seLogRr))]

EmpiricalCalibration:::logLikelihoodNullMcmc(theta = c(0, 100), logRr = logRrNoZero, seLogRr = seLogRrNoZero)
#> [1] 28.44091
optim(c(0, 100), EmpiricalCalibration:::logLikelihoodNullMcmc, logRr = logRrNoZero, seLogRr = seLogRrNoZero)
#> $par
#> [1]  0.1692758 34.8485282
#> 
#> $value
#> [1] 21.13314
#> 
#> $counts
#> function gradient 
#>       79       NA 
#> 
#> $convergence
#> [1] 0
#> 
#> $message
#> NULL

Finally I checked the delta between each logRr and seLogRr value and the 15th value has a delta of ~18.

abs(logRr - seLogRr)
#> [1]  0.188838315  0.331963959  0.065583255  0.256063766  0.220691309  2.229427890  0.038978049
#> [8]  0.193162904  0.256446156  0.031514532  0.162231496  0.470871474  0.276751874  0.012776512
#> [15] 18.000212194  0.418121323  0.541079244  0.367118176  2.438356740  0.262514673  0.008231167
#> [22]  0.510305512  0.085698182  0.604800950

^{Created on 2024-08-26 with reprex v2.1.1}

I can reproduce it in a toy example aswel:

logRr <- c(rnorm(9), -19)
seLogRr <- c(rnorm(9), 0)
theta <- c(0, 100)

optim(c(0, 100), EmpiricalCalibration:::logLikelihoodNullMcmc, logRr = logRr, seLogRr = seLogRr)
#> Error in optim(c(0, 100), EmpiricalCalibration:::logLikelihoodNullMcmc, : function cannot be evaluated at initial parameters
EmpiricalCalibration:::logLikelihoodNullMcmc(theta = c(0, 100), logRr = logRr, seLogRr = seLogRr)
#> [1] Inf

^{Created on 2024-08-26 with reprex v2.1.1}

I'm out of my depth of what the implications are or how to solve it. But hopefully this is helpful.

mvankessel-EMC commented 1 month ago

This should now be handled in this commit: 37e237c

The real question is: why did you have NA estimates for all your negative controls?

Regarding this build, I run into a different error:

#> Error in quantile.default(dist, c(0.5, alpha/2, 1 - (alpha/2))) : 
#>   missing values and NaN's not allowed if 'na.rm' is FALSE
#> In addition: Warning messages:
#> 1: In EmpiricalCalibration::fitMcmcNull(logRr = ncs$logRr, seLogRr = ncs$seLogRr) :
#>   Estimate(s) with NA standard error detected. Removing before fitting null distribution
#> 2: In EmpiricalCalibration::fitMcmcNull(logRr = ncs$logRr, seLogRr = ncs$seLogRr) :
#>   No valid estimates left. Returning undefined null distribution

I'm assuming it is this quantile() call, that also should have na.rm = TRUE.

OHDSI / EmpiricalCalibration

`fitMcmcNull()` does not handle vectors of `NA` or `NULL` #9