david-cortes / approxcdf

(Python, R, C) Fast approximations for the CDF of multivariate normal distributions
http://approxcdf.readthedocs.io
BSD 3-Clause "New" or "Revised" License
18 stars 5 forks source link

Inaccurate approximation and some ideas #4

Closed jobstdavid closed 2 months ago

jobstdavid commented 2 months ago

Hey,

I tested your R-package "approxcdf" and I like it a lot, especially due to its speed! That's great! Unfortunately, I detected a completely inaccurate approximation for the trivariate normal distribution:

library(approxcdf)
library(mvtnorm)
set.seed(435)
Cov <- randcorr::randcorr(3)
upper <- c(0, 0, Inf)
# result 0
approxcdf::pmvn(upper, Cov = Cov, mean = rep(0, 3))
# result 0.1551707
as.numeric(mvtnorm::pmvnorm(upper = upper, corr = Cov, algorithm = "TVPACK"))
as.numeric(mvtnorm::pmvnorm(upper = upper, corr = Cov, algorithm = "GenzBretz"))

# hacky solution (result 0.1551707)
approxcdf::pmvn(c(0, 0, 10000), Cov = Cov, mean = rep(0, 3))

This issue also happens for

upper <- c(0, Inf, Inf)

i.e. settings where more than one upper bound is infinity are affected, as well. Do you think, you can fix it?

Some other ideas came to my mind, which could improve the ease of use and you could think about it:

david-cortes commented 2 months ago

Thanks for pointing this out.

david-cortes commented 2 months ago

Also, if you're looking for a fast bivariate/trivariate CDF calculator to use in some package, you can shave off most of the running time latency from mvtnorm by calling the TVPACK Fortran functions directly without the extra checks that mvtnorm does: https://www.math.wsu.edu/faculty/genz/software/software.html

Timings from it are not too much worse than from the implementations in this package, but the calculations are more accurate.

jobstdavid commented 2 months ago

@david-cortes Thank you for your helpful hints!