Effects plots for occurrence covariates

DEHewitt commented 9 months ago

Hi @doserjef,

I wasn't sure if you'd prefer this here or via email, as it isn't an issue with the package rather just some clarification. I opted for here in case it may be useful to others and it is easier to format code here than in an email.

I am trying to develop some code to predict latent occurrence probabilities across the range of a covariate. My model is an spPGOcc fit with the following call:

spPGOcc(occ.formula = ~scale(orientation), det.formula = ~scale(day), data = data, inits = inits, priors = priors, tuning = tuning, cov.model = cov.model, 
    NNGP = TRUE, n.neighbors = 5, n.batch = n.batch, batch.length = batch.length, n.omp.threads = n.omp.threads, verbose = verbose, n.report = n.report, 
    n.burn = n.burn, n.thin = n.thin, n.chains = n.chains)

I am interested in the effects of orientation on occurrence. I think you'll recognize the code I've adapted from your Introduction vignette, where you show how to predict these values at new locations and produce some occurrence maps. For the purposes of my research we're more interested in visualizing the covariate effect and I've attempted to do so by:

# 1. Housekeeping -----------------------------------------------------------------------
## Load the required libraries
library(spOccupancy)       # for fitting spatial occupancy models

## Load the data (as packaged for fitting in {spOccupancy})
data <- readRDS(file = "output/beachwatch-spOccupancy-formatted-data.rds")

## Load the fitted model
m <- readRDS(file = "output/beachwatch-spPGOcc.rds")

# 2. Predict from the model--------------------------------------------------------------
## Get the orientation values
orientation <- data$occ.covs$orientation

## Scale the orientation data
scaled_orientation <- (orientation - mean(data$occ.covs[, 1])) / sd(data$occ.covs[, 1])

## Create a design matrix of the covariate at the prediction locations
X.0 <- cbind(1, scaled_orientation)

## Create some dummy coords.0 so predict() works
coords.0 <- data$coords + 0.01

## Perform the prediction
prediction <- predict(object = m, X.0 = X.0, coords.0 = coords.0)

# Bundle into a dataframe for easy ggplotting
plot_df <- data.frame(x = coords.0[, 1],
                      y = coords.0[, 2],
                      orientation = orientation,
                      mean_psi = apply(prediction$psi.0.samples, 2, mean),
                      sd_psi = apply(prediction$psi.0.samples, 2, sd),
                      stringsAsFactors = FALSE)

# Plot the effect
ggplot(data = plot_df, aes(x = scaled_orientation)) + 
  geom_ribbon(aes(ymin = mean_psi - sd_psi, ymax = mean_psi + sd_psi), fill = "light grey") +
  geom_line(aes(y = mean_psi)) +
  coord_cartesian(ylim = c(0, 1))

This code works, in that it produces a plot (below), but I am wondering if this is the correct way to produce a plot like this? I am a little doubtful since the SD exceeds 1...

test-effects-plot

Thanks!

doserjef commented 8 months ago

Hi @DEHewitt,

Thanks for the note. That is not quite the correct way to generate a marginal effects plot. Instead of adding the SD to the values, you should instead calculate the 95% CI and use that. Here is an example of how to do that, which you should be able to adapt to your data set. Also note that instead of supplying dummy coordinates, it's better to convert the class to PGOcc, which will eliminate the need for supplying the coordinates matrix (and just set the spatial random effects to 0). I'm hoping to add in a function at some point that makes it less clunky to do that. Let me know if you have any other questions

rm(list = ls())
library(spOccupancy)
library(ggplot2)

# 1. Data prep ------------------------------------------------------------
# Read in the data source (reads in an object called data.goldfinch)
load(url("https://github.com/doserjef/spOccupancy_examples/raw/main/data/europeanGoldfinchSwiss.rda"))
str(data.goldfinch)

# 2. Model fitting --------------------------------------------------------
# Fit a spatial, single-species occupancy model using an NNGP
out.sp <- spPGOcc(occ.formula = ~ scale(elevation) + I(scale(elevation)^2) + scale(forest), 
                  det.formula = ~ scale(date) + I(scale(date^2)) + scale(dur), 
                  data = data.goldfinch, 
                  n.batch = 400, 
                  batch.length = 25,
                  NNGP = TRUE, 
                  n.neighbors = 5, 
                  n.thin = 10, 
                  n.burn = 5000, 
                  n.chains = 3,
                  n.report = 100)
summary(out.sp)

# 3. Prediction -----------------------------------------------------------
# Predict relationship with forest cover
# Create a set of values across the range of observed forest values
forest.pred.vals <- seq(min(data.goldfinch$occ.covs$forest),
                              max(data.goldfinch$occ.covs$forest),
                              length.out = 100)

# Standardize forest prediction values by those used to fit the model
forest.0 <- (forest.pred.vals - mean(data.goldfinch$occ.covs$forest)) / 
                sd(data.goldfinch$occ.covs$forest)
# Create prediction design matrix
# Notice I set the other covariates to 0 (which is their average value since I standardized them)
X.0 <- as.matrix(data.frame(intercept = 1, elevation = 0, 
                elevation.2 = 0, forest = forest.0))
# Convert out.sp object to have a class of PGOcc. This allows you to do prediction
# with a spatial occupancy model while setting the spatial random effect values to 0.
class(out.sp)
class(out.sp) <- "PGOcc"
# Predict at new locations
out.pred <- predict(out.sp, X.0)
# Convert the class of out.sp back to spPGOcc to avoid any future problems
class(out.sp) <- 'spPGOcc'
str(out.pred)
psi.0.quants <- apply(out.pred$psi.0.samples, 2, quantile,
                          prob = c(0.025, 0.5, 0.975))
psi.plot.dat <- data.frame(psi.med = psi.0.quants[2, ],
                                 psi.low = psi.0.quants[1, ],
                                 psi.high = psi.0.quants[3, ],
                           forest = forest.pred.vals)
ggplot(psi.plot.dat, aes(x = forest, y = psi.med)) +
  geom_ribbon(aes(ymin = psi.low, ymax = psi.high), fill = 'grey70') +
  geom_line() +
  theme_bw() +
  scale_y_continuous(limits = c(0, 1)) +
  labs(x = 'Forest (% cover)', y = 'Occupancy Probability')

DEHewitt commented 8 months ago

Thanks @doserjef. A function provided by the package would be great! I expect that some relationships we will want to fit may nonlinear (so will likely use {splines}) and there may be interactions... Is it naive to hope that this doesn't affect the example above too much? For the interaction terms, instead of setting the other variable to its mean would I just generate all unique combinations of the two (e.g., via expand.grid(x1, x2)?

biodiverse / spOccupancy

Effects plots for occurrence covariates #36