Closed kdlafferty closed 2 years ago
Thanks for posting. This is tricky, I've tried on a few machines and I'm not able to reproduce the error that you've shown here. After you get the error, could you share your session info by running the code sessionInfo() in R and then pasting the output here? That should help diagnose the problem a bit further.
Sorry about the trouble with this! We'll get to the bottom of it.
Hi Jeff, Thanks for getting back to me and for your willingness to help an old man out. Since you could not recreate the problem (just like my mechanic can never seem to recreate my car troubles) I thought I would restart R and reload the files. Problem gone.
Perhaps I had other defined functions and variables that were somehow messing with yours. In any case, if it happens again, I will run sessionInfo() to see if I can get clues as to what the interference was. And so I think you can close the comment thread on GitHub by noting the problem went away after a restart.
I want to reiterate how excited I am about this particular package. Your paper listing the shortcomings of existing packages really rang true for me. And 1 year of trying to solve those limitations using RStan nearly turned me into a shriveled carcass. And although I could create MSOCCs in RStan, I would never have come close to the elegant solutions that your package has (NNGP!!). I had hoped that someone was working on something that was as powerful and easy to use as this. Let’s just say I have high expectations!
If you are interested, I am a senior scientist with the USGS. I hope your package will help me with a number of things, some of which might be a bit non-standard. So I thought I would share what I am trying to apply it to.
1) Estimate missing species from camera trap data. I think this will be a pretty straightforward application.
2) Estimate likely but unobserved marine species distributions from GBIF_OBIS data. Here, I am trying to bend non-survey data into survey data (by defining a “survey” as date-location-records that include > 1 species). Akin to using full checklists in eBird... Eventually, I will want to use multiple data sets when you develop the package further (keep us posted). Your warbler maps are an inspiration.
3) Environmental DNA “occupancy” for metabarcoding. Here, a site is a location in time and space and a replicate is a water sample. For a given site and given set of replicates what taxa are likely false negatives? I’m pretty sure the multi-taxon approach and the spatial coordinates will come in handy. The challenge here is the extent that we can add a lower hierarchy. My colleague Ritchie Erickson has done some coding for the single species non-spatial case. I turned him on to your package.
4) Estimate host-parasite networks from sampling data. Here, hosts are the sites and parasites are the species. A multi-species approach is key to success, as is being able to have random effects at the host (site) level, like taxon. E.g., for known bat-viral networks, which links have we likely missed?
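For the false-negative question in items 3 and 4, the basic single-species occupancy model gives a closed form for the probability that a site is occupied even though the taxon was never detected there. The following is a minimal sketch in R, not anything taken from spOccupancy: the function name is hypothetical, and `psi` (occupancy probability), `p` (per-replicate detection probability), and `K` (number of replicates) are made-up inputs that would in practice be posterior draws from a fitted model.

```r
# Hypothetical helper: P(occupied | no detections in K replicates), by Bayes'
# rule, assuming a constant detection probability p across replicates.
p_false_negative <- function(psi, p, K) {
  miss <- psi * (1 - p)^K        # occupied, but missed in every replicate
  miss / (miss + (1 - psi))      # divide by P(no detections at all)
}

# Made-up values: 50% occupancy, 50% detection, 2 water samples
p_false_negative(psi = 0.5, p = 0.5, K = 2)
```

Because the function is vectorised, passing vectors of posterior draws for `psi` and `p` returns one probability per draw, which can then be summarised per taxon.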
I’ll likely be curious about how to include: 1) time as season effects as autocorrelated occupancy covariates. 2) false-negative output in the predict function (if it is not already there). 3) lower nested hierarchy for subsamples. E.g., for eDNA we often have PCR replicates for each sample replicate. 4) Considering whether a point lies within a convex hull of occupancies, which seems to be as good a spatial predictor as proximity. 5) The ability to estimate unrecorded species (and do a better job than Dorazio did). I have some preliminary approaches that seem to work OK.
Of course, you could save me the trouble by doing all this as a postdoc in sunny Southern California. Just need to get some funds together.
And, I may impose on your generosity again as I start to get into things a bit and reach new barriers. I expect the best way to improve a package is through the suffering of your users.
Kevin
Hi Kevin,
Great, glad to hear the error went away for you! I always love it when the classic restart and reload solves a problem :) I agree with you that it may have been caused by some conflicting functions or variables that were defined, perhaps from other packages that were loaded. I'll go ahead and resolve the GitHub issue, but please do let me know if you encounter it again.
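If the error does come back, one rough way to look for this kind of interference before restarting is to list the workspace objects whose names also belong to an attached package. This is only a sketch in base R; the helper name `find_masking` is made up for illustration and is not part of spOccupancy.

```r
# Hypothetical helper: names of objects in `env` that share a name with an
# object found in any attached package on the search path.
find_masking <- function(env = globalenv()) {
  user_objs <- ls(envir = env)
  pkgs <- grep("^package:", search(), value = TRUE)
  pkg_objs <- unique(unlist(lapply(pkgs, function(p) ls(p))))
  intersect(user_objs, pkg_objs)
}

find_masking()   # workspace names that collide with package names
sessionInfo()    # R version and attached packages, for pasting into the issue
```

An empty result from `find_masking()` does not rule out other causes, such as changed session options, so the `sessionInfo()` output is still worth sharing.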
Thanks very much for the kind words about the package! That's very cool to hear about the many interesting applications you're thinking about applying it to. Here are a few things I'll mention about each of them:
1) The MSOMs in the package aren't currently implemented with the Dorazio and Royle data augmentation approach, although that is something I'm thinking about putting in for future implementations (it's a bit tricky as the additional data augmentation parameter messes with some of the computational algorithms we use). There are perhaps some simpler ways to do this if the number of missing species is known (and I'm curious to hear about the preliminary approaches you've mentioned).
2) Sounds very cool. spOccupancy can currently integrate multiple data sources for single species models, and I should have it implemented for using multiple data sources for multispecies models within the next few months or so. Any updates will be posted on the package website.
3) Unfortunately, spOccupancy currently doesn't have functionality to fit the multiscale models with the additional hierarchy in eDNA studies. However, this is of course another area for future development, as there are certainly many cool applications of the basic multiscale occupancy model. It would certainly be interesting to combine the single species approach used by your colleague with the spatial approaches implemented in spOccupancy.
4) This sounds really fascinating!
I've got some postdoc plans set up for the next couple of years, but of course don't hesitate to reach out as you dig deeper into these studies and encounter problems. I'm always happy to help out. As you mentioned, it is really helpful for me to see where users encounter problems or want new functionality, so feel free to post any code problems you encounter on GitHub, or you can also email me directly (doserjef@msu.edu) with more specific problems as you start digging deeper into these projects.
Kindest regards,
Jeff
Thanks for the feedback. I can’t seem to get the Dorazio and Royle data augmentation approach to work well with simulated data. I think this is because the species that are undetected tend to be low-prevalence species at poorly sampled sites. One alternative is to apply a Chao estimator approach to the posteriors, but this is not particularly good at matching simulated data. Without providing sufficient explanation, I have had some good success with a geometric distribution approach applied to the posteriors. From the posteriors, you can estimate the probability that a species would have been entirely missed (e.g., across all sites) in the sampling effort. If species A was detected, but there was an a priori expected 50% chance that it would not have been detected, then you might assume that there is a sister species B that is similar to A but was not detected. Doing this might allow you to estimate the total number of missing species AND perhaps some of their attributes. It might also be possible to apply similar logic to the data augmentation approach if you add a shape parameter about the distribution of p. In any case, if I can get your package to work with my data, I will return to this problem and let you know how it goes.
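One possible reading of "the probability that a species would have been entirely missed" can be sketched in R as follows. This is not the geometric approach itself, which is not spelled out above. It assumes the basic occupancy model with sites independent of one another, the function name is hypothetical, and the simulated Beta draws only stand in for real posterior samples.

```r
# Hypothetical helper: for each posterior draw, the probability that a species
# is detected at none of the J sites.
# psi, p: (draws x sites) matrices of occupancy and detection probabilities.
# K: number of visits per site (length J).
p_missed_everywhere <- function(psi, p, K) {
  K_mat <- matrix(K, nrow = nrow(psi), ncol = ncol(psi), byrow = TRUE)
  site_miss <- 1 - psi * (1 - (1 - p)^K_mat)   # P(no detection at one site)
  apply(site_miss, 1, prod)                    # product over sites, per draw
}

set.seed(1)
n_draws <- 1000
J <- 20
psi <- matrix(rbeta(n_draws * J, 1, 9), n_draws, J)  # stand-in draws, rare species
p   <- matrix(rbeta(n_draws * J, 2, 8), n_draws, J)  # stand-in draws, low detection
K   <- sample(2:4, J, replace = TRUE)                # visits per site

miss <- p_missed_everywhere(psi, p, K)
mean(miss)   # average probability, across draws, of missing the species everywhere
```

Under this reading, a detected species with a high value here is the kind of "species A" described above, one that could easily have gone unrecorded.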
Been waiting for a package like this. Thanks. Hoping it will be a replacement for my Rstan Code. But I hit walls when running your examples, which is not a great sign.
In the guide, pg 65, example for spPGOcc returns: Error in na.fail.default(list(det.cov.1 = c(-1.3694286738296, 0.723596325316646, : missing values in object
Running the btbwHBEF vignette has a similar error: Error in na.fail.default(list(`scale(day)` = c(-0.922875817115221, -0.922875817115221, : missing values in object
ppcOcc example also.
These are copy-pasted from the documentation, and I believe all libraries and packages are loaded. I did not see any reference to this error in the supporting documentation.
I presume the errors refer to NAs in the detection covariate list. Getting rid of the NAs solves the problem: in the ppcOcc example, changing "n.rep <- sample(2:4, J, replace = TRUE)" to "n.rep <- rep(2, J)" makes the model run fine. But being able to have NAs is your way of allowing variation in visit number, which is key for my needs, so an na.exclude won't do. I assume there is something simple I am missing, but others might have similar results.
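To illustrate where this error text comes from, here is a base R sketch with toy data and no spOccupancy calls. It pads a site-by-visit covariate matrix with NA for unvisited replicates, then shows that `model.frame()` stops under `na.fail` but keeps every row under `na.pass`. Whether the package builds its design matrices this way, and whether the session's `na.action` option played any role here, are assumptions rather than a diagnosis.

```r
# Toy detection covariate: J sites, a varying number of visits per site
# (fixed here; the ppcOcc example draws it with sample(2:4, J, replace = TRUE)).
J <- 5
n.rep <- c(2, 4, 3, 2, 4)
set.seed(42)
day <- matrix(NA_real_, nrow = J, ncol = max(n.rep))
for (j in seq_len(J)) day[j, seq_len(n.rep[j])] <- rnorm(n.rep[j])

long <- data.frame(day = as.vector(day))   # one row per site-visit slot, NAs kept

# na.fail refuses any NA and raises an error; capture its message
msg <- tryCatch(
  model.frame(~ scale(day), data = long, na.action = na.fail),
  error = function(e) conditionMessage(e)
)
msg

# na.pass keeps every row, so unvisited slots stay in the frame as NA
kept <- model.frame(~ scale(day), data = long, na.action = na.pass)
nrow(kept) == nrow(long)

getOption("na.action")   # session-wide setting used when none is passed
```

If `getOption("na.action")` reports something other than the usual default after the error appears, that would be worth including alongside the `sessionInfo()` output.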