TheoreticalEcology / s-jSDM

Scalable joint species distribution modeling
https://cran.r-project.org/web/packages/sjSDM/index.html
GNU General Public License v3.0
64 stars 14 forks source link

Question about bioticStruct() and generating Single Species SDMs from sjSDM #127

Open dansmi-hub opened 1 year ago

dansmi-hub commented 1 year ago

Hi,

I'm a little stumped here and wondered if anyone could be of help.

I have generated several JSDM using sjSDM for my dataset where species cooccurrences and all that jazz help inform predictions of other species, but would like to compare that to models where predictions for each species receive no extra information and in effect turn into Stacked SDMs. My thought is this comparison will tell me a lot about whether JSDM methods are useful for my target species/data vs traditional SDM methods.

My initial thought is to tune an sjSDM for model on each individual species and then fit as below and combine the predictions together post-hoc to compare against an sjSDM model with all species included.

  model_single_sdm = list()
  for (i in 1:ncol(data$Y)) {
    model_sdm[[i]] <- sjSDM(
      Y = as.matrix(data$Y[, i]),
      env = linear(scale(data$X),
                   formula = as.formula(paste(
                     '~', paste(colnames(data[[1]]), collapse = "+")
                   ))),
      spatial = linear(SPV, ~ 0 + .),
      biotic = bioticStruct(df = dim(data$Y)[2]),
      family = binomial("logit"),
      device = "gpu",
      iter = 100,
      learning_rate = 0.005
    )
    names(model_sdm[i]) = colnames(data$Y)[1]
  }
model_sdm_preds = lapply(predict, model_sdm, newdata = newdata)

However I noticed in your new (perhaps under development) vignette regarding MSDMs and DNNs that you can turn the Joint component part of the model off in the biotic structure as so;

bioticStruct(diag = TRUE) # For MSDM
bioticStruct(diag = FALSE) # For JSDM

I was wondering whether this setting is enough to remove the shared component completely for an sjSDM model and would be suitable for my workflow and would be analogous to a single SDM for each species. Or if sjSDM models have something more going on underneath the hood and require more modification than this to generate stacked models, and I would need to use my approach above to truly isolate each species impact on the distributions of others?

Any info would be great!

Dan

MaximilianPi commented 1 year ago

Hi @dansmi-hub,

  1. bioticStruct(diag = TRUE) Yes, it completely turns off the JSDM characteristics. And as long as you don't use a DNN instead of a linear model it should be equal to a stacked SDM based on individual SDMs.

But I can you give already an answer to your question. The JSDM component, the variance-covariance matrix doesn't affect the coefficients of your species, the linear predictor is not affected by it, so JSDMs cannot improve/affect the predictions (see 10.1016/j.tree.2021.01.002). JSDM can only improve predictions if you have focal species for which you can condition your predictions on.

dansmi-hub commented 1 year ago

Thanks, very informative as always.

It sounds like JSDMs still do what I'm looking for. I'm interested in how including native mosquito species changes predictions for invasive mosquito species in my dataset.

My intuition from reading the Poggiato 2021 and Wilkinson 2020 is that this can still be done in the JSDM framework by holding out the native or invasive species portion of the model during the prediction stage and comparing these against each other and a full model?

It appears you can do this in HMSC [52] with some inbuilt code in the predict function.

Is something similar currently possible in sjSDM?

Apologies If I haven't been completely clear, I'm masquerading as a statistician here.

MaximilianPi commented 1 year ago

Yes, you want to condition your predictions on the native or on the invasive species, so to make only predictions for one species while the other one is used by the model.

Yes, Hmsc supports conditional predictions but sjSDM not yet.

MaximilianPi commented 1 year ago

...as a workaround, you could use one of the two species as a predictor/feature in your model.