biodiverse / spOccupancy

Single-species, Multi-species, and Integrated Spatial Occupancy Models
https://www.jeffdoser.com/files/spoccupancy-web/
GNU General Public License v3.0

stPGOcc: c++ Error #12

Closed abfleishman closed 2 years ago

abfleishman commented 2 years ago

I am attempting to run a Multi-season Single-Species Spatial model. I was able to run through the entire tutorial without issues. When I try with my own data, I immediately get an error (see below). I only have 3 years but I was able to run the non-spatial model without issues with my data.

I am using data from BirdNET (an automated CNN-based species detection model) run on acoustic data. Thus I have 46 sites, 3 years, and ~92 replicates each year (one for every day between March 1 and May 31) although I rarely have data from every day within a single year (because of battery issues).

Any help you can provide troubleshooting would be appreciated. I am so excited to be able to run these types of models.

I am using the attached data: spT.zip

library(spOccupancy)

# saveRDS(spT,"spT.rds")
spT<-readRDS("spT.rds")
occ.formula <- ~ year # coded numeric 1=2019, 2=2020, 3=2021
det.formula <- ~ scale(doy)  # day of year doy=yday(date)

z.inits <- apply(spT$y, c(1, 2), function(a) as.numeric(sum(a, na.rm = TRUE) > 0))
# Pair-wise distance between all sites
dist.spT <- dist(spT$coords) # utm anonymized
spT.sp.inits <- list(beta = 0, alpha = 0, z = z.inits,
                      sigma.sq = 1, phi = 3 / mean(dist.spT), 
                      sigma.sq.t = 1.5, rho = 0.2)

spT.sp.priors <- list(beta.normal = list(mean = 0, var = 2.72), 
                       alpha.normal = list(mean = 0, var = 2.72), 
                       sigma.sq.t.ig = c(2, 0.5), 
                       rho.unif = c(-1, 1),
                       sigma.sq.ig = c(2, 1), 
                       phi.unif = c(3 / max(dist.spT), 3 / quantile(dist.spT,probs = 0.1)))

cov.model <- 'exponential'
n.neighbors <- 5
ar1 <- TRUE

n.batch <- 600
batch.length <- 250
# Total number of samples
n.batch * batch.length

n.burn <- 10000
n.thin <- 20 

out.sp <- stPGOcc(occ.formula = occ.formula, 
                  det.formula = det.formula, 
                  data = spT, 
                  inits = spT.sp.inits, 
                  priors = spT.sp.priors, 
                  cov.model = cov.model, 
                  n.neighbors = n.neighbors,
                  n.batch = n.batch, 
                  batch.length = batch.length, 
                  verbose = TRUE, 
                  ar1 = ar1,
                  n.report = 200,
                  n.burn = n.burn, 
                  n.thin = n.thin, 
                  n.chains = 3) 
> out.sp <- stPGOcc(occ.formula = occ.formula, 
+                   det.formula = det.formula, 
+                   data = spT, 
+                   inits = spT.sp.inits, 
+                   priors = spT.sp.priors, 
+                   cov.model = cov.model, 
+                   n.neighbors = n.neighbors,
+                   n.batch = n.batch, 
+                   batch.length = batch.length, 
+                   verbose = TRUE, 
+                   ar1 = F,
+                   n.report = 200,
+                   n.burn = n.burn, 
+                   n.thin = n.thin, 
+                   n.chains = 3) 
----------------------------------------
    Preparing the data
----------------------------------------
----------------------------------------
    Building the neighbor list
----------------------------------------
----------------------------------------
Building the neighbors of neighbors list
----------------------------------------
----------------------------------------
    Model description
----------------------------------------
Spatial NNGP Multi-season Occupancy Model with Polya-Gamma latent
variable fit with 46 sites and 3 primary time periods.

Samples per chain: 150000 (600 batches of length 250)
Burn-in: 10000 
Thinning Rate: 20 
Number of Chains: 3 
Total Posterior Samples: 21000 

Using the exponential spatial correlation model.

Using 5 nearest neighbors.

Source compiled with OpenMP support and model fit using 1 thread(s).

Adaptive Metropolis with target acceptance rate: 43.0
----------------------------------------
    Chain 1
----------------------------------------
Sampling ... 

Error in stPGOcc(occ.formula = occ.formula, det.formula = det.formula, : c++ error: dpotrf failed

abfleishman commented 2 years ago

I just saw the closed issue #9, which looks similar. That issue references large unscaled covariates. I do not have any of those as far as I can tell. I have tried recoding the year as 1, 2, 3 instead of 2019, 2020, 2021, and I have tried shifting the coordinates (x = x - min(x) and y = y - min(y)). The day of year is already scaled in the formula.

abfleishman commented 2 years ago

> sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] spOccupancy_0.4.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9        RANN_2.6.1        codetools_0.2-18  lattice_0.20-45   foreach_1.5.2    
 [6] MASS_7.3-57       grid_4.2.1        nlme_3.1-157      coda_0.19-4       minqa_1.2.4      
[11] doParallel_1.0.17 nloptr_2.0.3      Matrix_1.4-1      boot_1.3-28       splines_4.2.1    
[16] lme4_1.1-30       iterators_1.0.14  tools_4.2.1       parallel_4.2.1    abind_1.4-5      
[21] compiler_4.2.1 

doserjef commented 2 years ago

Hi Abram,

Thanks for posting this and including all the very useful information/data! That error message comes from a problem when calculating the spatial covariance matrix in the underlying MCMC sampler (the message is not very informative, and I've added it to my to-do list to make the cause more obvious). The problem arises because two of your sites have the same spatial coordinates. Specifically, in your spT$coords matrix of site coordinates, sites 12 and 13 have identical coordinate values. Having two distinct sites with the same spatial coordinates is not allowed in spOccupancy. This also explains why you were able to get tPGOcc() to work, as that function doesn't require the spatial coordinates. Assuming that sites 12 and 13 do have the same coordinates and that this is not a data entry/wrangling error, there are two potential solutions:

  1. Drop one of the sites (site 12 or 13) from all objects in spT (the data list) and fit the model.
  2. Change the spatial coordinates of either site 12 or 13 by a very small amount to yield distinct, but very close, spatial coordinates.

I tried both solutions with your data and the model runs without a problem for both of them. Let me know if you still run into any issues.
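For readers hitting the same `dpotrf failed` message, here is a minimal sketch of the check and the two fixes described above. It uses a hypothetical toy `coords` matrix rather than the attached `spT` data, and the covariance parameters (sigma.sq = 1, phi = 0.5) are illustrative assumptions. It also only mimics the idea with base R's `chol()`; it is not the sampler's actual code.

```r
# Toy coordinates (hypothetical); rows 1 and 2 are identical on purpose.
coords <- cbind(x = c(0, 0, 5, 9), y = c(0, 0, 2, 7))

# Exponential covariance with sigma.sq = 1 and phi = 0.5 (illustrative values).
Sigma <- exp(-0.5 * as.matrix(dist(coords)))

# Identical coordinates give identical rows in Sigma, so the matrix is
# singular and a Cholesky factorization of it fails.
chol.fails <- inherits(tryCatch(chol(Sigma), error = function(e) e), "error")

# Flag every row that shares its coordinates with another row.
dup <- duplicated(coords) | duplicated(coords, fromLast = TRUE)
which(dup)

# Fix 1: keep only the first site at each location. The same index must be
# applied to the site dimension of y and to every site-level covariate.
keep <- !duplicated(coords)
coords.drop <- coords[keep, , drop = FALSE]

# Fix 2: nudge the repeated rows by a small random amount (here up to 0.01
# coordinate units) so that all locations become distinct.
set.seed(1)
is.dup <- duplicated(coords)
coords.jit <- coords
coords.jit[is.dup, ] <- coords[is.dup, ] + runif(2 * sum(is.dup), -0.01, 0.01)

# With distinct locations the factorization goes through.
Sigma.jit <- exp(-0.5 * as.matrix(dist(coords.jit)))
chol.ok <- is.matrix(tryCatch(chol(Sigma.jit), error = function(e) NULL))
```

Running `which(dup)` on the real coordinate matrix before fitting is a cheap way to spot this problem. If the flagged rows turn out to be the same physical site, merging their records is probably the better fix than either workaround.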

Kind regards,

Jeff

abfleishman commented 2 years ago

Wow! @doserjef it is amazing how many times I have looked at the sensor deployment info and overlooked that data entry mistake! "Sites" 12 and 13 are the same site, but had different sensors deployed (one sensor failed and was replaced), and the site identifier was incorrectly entered as a new site. I fixed that issue, reran the data prep code, and the model ran smoothly (and quickly!)! I am super excited to compare the results from spOccupancy to unmarked!

Thank you for your help spotting my data error!

doserjef commented 2 years ago

Awesome! Glad it's working for you now. Feel free to reach out again if you run into any issues.

JASzyma commented 8 months ago

Hello Jeff, I too am getting the same error, but with tPGOcc(). I get the error at times but not always, despite no changes to the code. Any thoughts on what could be going on?

doserjef commented 8 months ago

Hi @JASzyma,

That error can arise for a variety of reasons, with the most likely being something related to how the data are formatted. If you email me your code and data (doserjef@msu.edu) then I can take a look to try and see what's causing it.

Jeff

JASzyma commented 8 months ago

Hello Jeff,

Thank you for your prompt response. I think you are correct...the root of my error is related to data formatting.

FYI, I am attempting to model a single species (a bumble bee) across multiple seasons. My sites are 10 km x 10 km grid cells, primary time periods are decades, and replicates are years. I found your vignettes for a multi-season model, but the data there are already formatted, so I am relying on your single-season data formatting vignette to format my data. I had to do some trial and error to adapt it to my different (unique) structure (decades and years). Now that I know it is likely a data formatting issue, I am going to revisit (again!) your descriptions on formatting data.

I ran into issues when I started writing the stars code, so I expect the error is in how my spatial data are formatted. Once I struggle a bit more, I will reach out if I can't figure it out.
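A minimal sketch of the multi-season layout may help here. It assumes the detection data `y` is a three-dimensional array ordered sites x primary time periods x replicates, which is what the `apply(spT$y, c(1, 2), ...)` line earlier in this thread implies. The `obs` data frame and its column names are hypothetical, as is the choice of ten yearly replicates per decade.

```r
# Hypothetical long-format records: one row per site x decade x year survey.
obs <- data.frame(
  site     = c("A", "A", "A", "B", "B", "A", "B"),
  decade   = c(1990, 1990, 2000, 1990, 2000, 2000, 2000),
  year     = c(1991, 1995, 2003, 1991, 2003, 2008, 2008),
  detected = c(1, 0, 1, 0, 0, 1, 1)
)

sites   <- sort(unique(obs$site))
decades <- sort(unique(obs$decade))
n.rep   <- 10  # years within a decade act as replicates

# Start from an all-NA array so unsurveyed years stay missing.
y <- array(NA_real_,
           dim = c(length(sites), length(decades), n.rep),
           dimnames = list(sites, as.character(decades), NULL))

# One (site, decade, replicate) index triple per record.
idx <- cbind(match(obs$site, sites),
             match(obs$decade, decades),
             obs$year - obs$decade + 1)
y[idx] <- obs$detected

# Initial occupancy states: 1 if the species was ever detected in that
# site and primary period, as in the z.inits line earlier in the thread.
z.inits <- apply(y, c(1, 2), function(a) as.numeric(sum(a, na.rm = TRUE) > 0))
```

Checking `dim(y)` against the number of sites, primary periods, and replicates, and against the number of rows in the coordinate matrix, is a quick sanity test before fitting.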

Thanks much!!

Jennifer
