PrestoGP is an R package for scalable penalized regression on spatio-temporal outcomes using Gaussian processes. The package includes the methods described in the paper "Scalable penalized spatiotemporal land-use regression for ground-level nitrogen dioxide" by Messier and Katzfuss (2020). The package is designed to handle big data and is useful for large-scale geospatial exposure assessment and geophysical modeling. PrestoGP expands the methods in Messier and Katzfuss (2020) to include the following important feature:
Multivariate spatiotemporal outcomes using multivariate Matern Gaussian process. The method is described in the paper "A valid Matérn class of cross-covariance functions for multivariate random fields with any number of components" by Apanasovich, Genton, and Sun (2012).
Simultaneous variable selection and estimation of the fixed effects (i.e. land-use regression variables) and the spatiotemporal random effects. The method is described in the paper "Scalable penalized spatiotemporal land-use regression for ground-level nitrogen dioxide" by Messier and Katzfuss (2020).
Imputation of missing outcome data, including outcome data that is missing due to limit of detection effects.
And as always, scalability of the Gaussian Process via the General Vecchia approximation as described in the paper "A general framework for Vecchia approximations of Gaussian processes" by Katzfuss and Guinness (2021).
You can install the development version of PrestoGP from GitHub with:
# install.packages("devtools")
devtools::install_github("NIEHS/PrestoGP")
data(soil)
soil <- soil[!is.na(soil[,5]),] # remove rows with NA's
y <- soil[,4] # predict moisture content
X <- as.matrix(soil[,5:9])
locs <- as.matrix(soil[,1:2])
soil.vm <- new("VecchiaModel", n_neighbors = 10)
soil.vm <- prestogp_fit(soil.vm, y, X, locs)
miss <- sample(1:nrow(soil), 20)
y.miss <- y
y.miss[miss] <- NA
soil.vm2 <- new("VecchiaModel", n_neighbors = 10)
soil.vm2 <- prestogp_fit(soil.vm, y, X, locs, impute.y = TRUE)
soil.lod <- quantile(y, 0.1)
y.lod <- y
y.lod[y.lod <= soil.lod] <- NA
soil.vm3 <- new("VecchiaModel", n_neighbors = 10)
soil.vm3 <- prestogp_fit(soil.vm, y, X, locs, impute.y = TRUE, lod = soil.lod)
soil.fm <- new("FullModel")
soil.fm <- prestogp_fit(soil.fm, y, X, locs)
ym <- list()
ym[[1]] <- soil[,5] # predict two nitrogen concentration levels
ym[[2]] <- soil[,7]
Xm <- list()
Xm[[1]] <- Xm[[2]] <- as.matrix(soil[,c(4,6,8,9)])
locsm <- list()
locsm[[1]] <- locsm[[2]] <- locs
soil.mvm <- new("MultivariateVecchiaModel", n_neighbors = 10)
soil.mvm <- prestogp_fit(soil.mvm, ym, Xm, locsm)
data(soil250, package="geoR")
y2 <- soil250[,7] # predict pH level
X2 <- as.matrix(soil250[,c(4:6,8:22)])
# columns 1+2 are location coordinates; column 3 is elevation
locs2 <- as.matrix(soil250[,1:3])
soil.vm2 <- new("VecchiaModel", n_neighbors = 10)
# fit separate scale parameters for location and elevation
soil.vm2 <- prestogp_fit(soil.vm2, y2, X2, locs2, scaling = c(1, 1, 2))
# Create training and test sets
n <- length(y)
otr <- rep(FALSE, n)
otr[sample(1:n, size=floor(n/2))] <- TRUE
otst <- !otr
# Fit the model on the training set
soil.vmp <- new("VecchiaModel", n_neighbors = 10)
soil.vmp <- prestogp_fit(soil.vm, y[otr], X[otr,], locs[otr,])
# Perform predictions on the test set
soil.yhat <- prestogp_predict(soil.vmp, X[otst,], locs[otst,])
This package is currently under development. Please report any issues on the Issues page
The manuscript is in progress and expected to be submitted soon. Please check back for updates or contact Kyle Messier for more information.