biodiverse / ubms

Fit models to data from unmarked animals using Stan. Uses a similar interface to the R package 'unmarked', while providing the advantages of Bayesian inference and allowing estimation of random effects.
https://hmecology.github.io/ubms/
GNU General Public License v3.0

ubms estimates do not compare to unmarked estimates #37

Closed ManuelSpinola closed 3 years ago

ManuelSpinola commented 3 years ago

Hi Ken,

I am trying to fit a ubms model to my data, but because I did not obtain estimates similar to those from unmarked, I tested both packages with example data from unmarked.

The ubms results sometimes fit poorly, and sometimes the R-hat values are not acceptable.

library(unmarked)
library(ubms)

data(crossbill)
site_covs <- crossbill[, c("id", "ele", "forest")]
y <- crossbill[, c("det991", "det992", "det993")]
date <- crossbill[, c("date991", "date992", "date993")]
umf <- unmarkedFrameOccu(y = y, siteCovs = site_covs, obsCovs = list(date = date))

# Fit the same occupancy model with ubms (Stan) and with unmarked (maximum likelihood)
stan_global <- stan_occu(~scale(date) ~ scale(forest) + scale(ele), data = umf, chains = 4)
um_global <- occu(~scale(date) ~ scale(forest) + scale(ele), data = umf)

# Compare point estimates side by side
cbind(unmarked = coef(um_global), stan = coef(stan_global))

                     unmarked       stan
psi(Int)           -0.7434025  1.4468469
psi(scale(forest))  0.9782374  2.4063905
psi(scale(ele))     0.5898850  1.4289875
p(Int)             -0.6738667 -0.9996905
p(scale(date))      0.5505878  0.5685348

Any suggestion on how to reach similar results?

kenkellner commented 3 years ago

Hi Manuel,

You're right that ubms returns poor results with this dataset. This particular slice of crossbill has very few detections and, more importantly, very few sites with multiple detections. The traceplots are a mess, with the chains constantly jumping to relatively large values or one chain getting stuck. I ran this analysis independently using Stan and JAGS and got basically the same poor results. It seems like MCMC/Bayesian approaches in general just have trouble with this dataset. Note that if you use other years from crossbill the results are fine.

A general solution is to specify narrower priors. This keeps MCMC from getting stuck at unreasonably high values. When I did this I was able to get reasonable results in Stan and JAGS that were similar to unmarked. However, setting custom priors is not currently possible in ubms; this is a priority for me in the future.

If you are consistently seeing similar problems with your own dataset, I would try running the chains for many more iterations than the default, perhaps starting with 10,000 per chain (particularly if Rhats are poor). If that doesn't help, it might be that ubms is not a good choice for your dataset, at least until it is possible to adjust the priors.

Ken
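As a rough sketch of the longer run suggested above: this assumes `stan_occu` forwards extra arguments such as `iter` and `seed` to rstan's sampler, in the same way the original call passes `chains`. The object name `fit_long` and the seed value are just placeholders.

```r
library(unmarked)
library(ubms)

# Rebuild the unmarkedFrame from the first post
data(crossbill)
umf <- unmarkedFrameOccu(
  y = crossbill[, c("det991", "det992", "det993")],
  siteCovs = crossbill[, c("id", "ele", "forest")],
  obsCovs = list(date = crossbill[, c("date991", "date992", "date993")])
)

# iter is the total number of iterations per chain, warmup included;
# the rstan default is 2000
fit_long <- stan_occu(~scale(date) ~ scale(forest) + scale(ele),
                      data = umf, chains = 4, iter = 10000, seed = 123)

fit_long             # printed summary reports n_eff and Rhat per parameter
traceplot(fit_long)  # look for chains that jump to large values or get stuck
```

If Rhat stays well above 1 even after a run like this, more iterations alone are unlikely to rescue the fit.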

ManuelSpinola commented 3 years ago

Thank you very much Ken.

I will try that.

Is there any rule of thumb for the number of detections and NAs needed to run occupancy models?

Manuel


kenkellner commented 3 years ago

I don't know of a rule of thumb, but I find that if the vast majority of sites have either 0 detections or 1 detection, and only a handful of sites have >1 detection, model results are likely to be poor (especially when you have many covariates). There's just no way to get a good estimate of p with so little info. This outcome is of course more likely when you only have 2-3 surveys at each site (as with crossbill).

There are no issues with NAs specifically, except when there are so many NAs that you run into the situation above, where there are few detections.
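One quick way to check for the situation described above is to tabulate detections per site before fitting anything. The sketch below uses a small made-up detection matrix (`y_toy` is hypothetical, not crossbill data); with real data you would pass your own site-by-survey matrix of 0/1/NA values instead.

```r
# Hypothetical detection history: 6 sites x 3 surveys, NA = survey not done
y_toy <- matrix(c(0, 0, 0,
                  0, 1, 0,
                  1, 1, NA,
                  0, 0, NA,
                  0, 0, 1,
                  0, 0, 0),
                ncol = 3, byrow = TRUE)

# Detections per site, ignoring missing surveys
det_per_site <- rowSums(y_toy, na.rm = TRUE)

# How many sites have 0, 1, 2, ... detections
table(det_per_site)

# Sites with more than one detection carry most of the information about p
n_multi <- sum(det_per_site > 1)
n_multi
```

Here only one of the six sites has more than one detection, which is the kind of pattern that tends to go with unstable estimates of p.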

ManuelSpinola commented 3 years ago

Thank you very much Ken.

Manuel


kenkellner commented 3 years ago

I've implemented custom priors, and the new default priors for stan_occu result in more comparable estimates:

                     unmarked       stan
psi(Int)           -0.7434025 -0.6164383
psi(scale(forest))  0.9782374  1.0761340
psi(scale(ele))     0.5898850  0.5671107
p(Int)             -0.6738667 -0.7450999
p(scale(date))      0.5505878  0.5607567

Not in the CRAN version yet, but will be relatively soon.
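For anyone who wants to set priors explicitly rather than rely on the new defaults, here is a minimal sketch. It assumes a ubms version that includes prior support, and the argument names (`prior_intercept_state`, `prior_coef_state`, `prior_intercept_det`, `prior_coef_det`) and the `normal()` helper are given as I understand them from the ubms documentation; check `?stan_occu` in your installed version. The `normal(0, 2.5)` choice is only an illustration, not a recommendation from this thread.

```r
library(unmarked)
library(ubms)

# Rebuild the unmarkedFrame from the first post
data(crossbill)
umf <- unmarkedFrameOccu(
  y = crossbill[, c("det991", "det992", "det993")],
  siteCovs = crossbill[, c("id", "ele", "forest")],
  obsCovs = list(date = crossbill[, c("date991", "date992", "date993")])
)

# Explicit priors on the logit-scale intercepts and slopes (assumed argument names)
fit_prior <- stan_occu(~scale(date) ~ scale(forest) + scale(ele), data = umf,
                       prior_intercept_state = normal(0, 2.5),
                       prior_coef_state = normal(0, 2.5),
                       prior_intercept_det = normal(0, 2.5),
                       prior_coef_det = normal(0, 2.5),
                       chains = 4, seed = 123)

# Compare against the maximum likelihood fit from unmarked
um_global <- occu(~scale(date) ~ scale(forest) + scale(ele), data = umf)
cbind(unmarked = coef(um_global), stan = coef(fit_prior))
```

Narrower priors pull the sampler away from extreme logit-scale values, so it is worth refitting with a somewhat wider prior as well to see how much the estimates depend on that choice.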

ManuelSpinola commented 2 years ago

Thank you very much Ken.

Manuel
