Open sofbol94 opened 3 years ago
You can't use different versions of the EBD and sampling event data. You have a Mar-2020 EBD and an Aug-2020 sampling event data. I understand that since you have sensitive data you probably can't get an Aug-2020 version. It is possible to combine these, but you'll have to do it manually. I'd start by using auk to subset the sampling event data:
library(auk)
library(tidyverse)
f_sed <- "~/data/sed_GGM.txt"
sed_filter <- auk_sampling("ebd_sampling_relAug-2020.txt") %>%
  auk_country("Costa Rica") %>%
  auk_date(c("2019-01-01", "2019-03-31")) %>%
  auk_complete() %>%
  auk_filter(f_sed, overwrite = TRUE)
Then read in the EBD directly, no need to subset it first since it's a small file, and subset both the EBD and SED to have the same set of checklists.
sed <- read_sampling(f_sed, unique = FALSE)
ebd <- read_ebd("ebd_sensitive_relMar-2020.txt", unique = FALSE)
ids <- intersect(sed$checklist_id, ebd$checklist_id)
sed <- filter(sed, checklist_id %in% ids)
ebd <- filter(ebd, checklist_id %in% ids)
zf <- auk_zerofill(ebd, sed, collapse = TRUE)
I don't have time to actually test any of this, so you may need to try it out and adjust the code, but this should get you started.
Thanks, that was helpful. I'm having some issues with the second part, though.
sed <- read_sampling(f_sed)
ebd <- read_ebd("ebd_sensitive_relMar-2020.txt")
ids <- intersect(sed$checklist_id, ebd$checklist_id)
sed <- filter(sed, checklist_id %in% ids)
ebd <- filter(ebd, checklist_id %in% ids)
zf <- auk_zerofill(ebd, sed, collapse = TRUE)
I removed unique = FALSE because otherwise I had no column called checklist_id. But after intersecting the checklist IDs I have no absences: zf contains only checklists where the species was recorded. Any suggestions?
thanks again, sofia
Hmmm, as I think about this more, I don't think you can correctly zero fill the data without the matching sampling event data. I think you'll need to request the most recent version of the Great Green Macaw data so it will match the sampling event data.
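A toy illustration of why the intersection approach leaves no absences, using base R only. The checklist IDs and counts below are made up for the example and are not taken from any eBird file. A single-species EBD extract only lists checklists where the species was detected, so intersecting on checklist_id discards every checklist that could have become a zero.

```r
# Toy sampling event data: five complete checklists (hypothetical IDs).
sed <- data.frame(checklist_id = c("S1", "S2", "S3", "S4", "S5"),
                  stringsAsFactors = FALSE)

# Toy single-species EBD: only the checklists where the species was recorded.
ebd <- data.frame(checklist_id = c("S2", "S4"),
                  observation_count = c("3", "1"),
                  stringsAsFactors = FALSE)

# Restrict the sampling data to checklists that also appear in the EBD.
ids <- intersect(sed$checklist_id, ebd$checklist_id)
sed_sub <- sed[sed$checklist_id %in% ids, , drop = FALSE]

# Count checklists without a detection, i.e. potential absences.
n_absences_after_intersect <- sum(!sed_sub$checklist_id %in% ebd$checklist_id)
n_absences_full_sed <- sum(!sed$checklist_id %in% ebd$checklist_id)

print(c(after_intersect = n_absences_after_intersect,
        full_sed = n_absences_full_sed))
```

In this sketch the full sampling data holds three non-detection checklists, and none of them survive the intersection.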
I wanted to follow up on this issue as I'm having a similar problem with auk_zerofill giving the error: "Some checklists in EBD are missing from sampling event data."
In my case I have ensured that the versions of the EBD and sampling event data match (both are Jan-2021). However, I am using a custom downloaded EBD dataset (all observations in Canada) and the full sampling event data. Based on a previous issue (now closed -- see here) I'm wondering if a mismatch between the custom dataset and the full sampling event data is the underlying issue? Unfortunately it seems the only way to check this would be to download the complete EBD, and at 90GB I'll admit to being a bit reticent.
I read in both of the successfully filtered EBD and sampling event files (via read_ebd and read_sampling, respectively) and they definitely contain a different number of records (2864 vs. 2052 for my particular filters -- a bounding box in Alberta). So that is probably the issue. But when I try the suggestion from @mstrimas to manually subset, I end up with only 454 checklist_id values in common.
This is my first project looking at the eBird data, so maybe I'm missing something here, but it seems there is something strange and maybe zero-filled data REQUIRES the full datasets?
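One way to see where two extracts disagree without downloading the full EBD is to list the checklist IDs that appear in one table but not the other. This is a minimal sketch on made-up data frames; in practice ebd and sed would be the objects returned by read_ebd() and read_sampling(), and the IDs shown are hypothetical.

```r
# Hypothetical stand-ins for the imported EBD and sampling event data.
ebd <- data.frame(checklist_id = c("S1", "S2", "S2", "S9"),
                  stringsAsFactors = FALSE)
sed <- data.frame(checklist_id = c("S1", "S2", "S3"),
                  stringsAsFactors = FALSE)

# Checklists with observations but no matching sampling event record.
# Any ID listed here corresponds to the "missing from sampling event data" error.
missing_from_sed <- setdiff(ebd$checklist_id, sed$checklist_id)

# Checklists with no observations of the target species (candidate zeros).
only_in_sed <- setdiff(sed$checklist_id, ebd$checklist_id)

# Checklists present in both tables.
shared <- intersect(ebd$checklist_id, sed$checklist_id)

print(missing_from_sed)
print(only_in_sed)
print(length(shared))
```

Inspecting a few of the IDs in missing_from_sed (dates, locations, protocol) can hint at which filter or download option differs between the two files.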
Hi @gking-aug just wondering if you ever found a solution for your problem?
I am having an almost identical issue to you, and am having trouble troubleshooting it myself.
Did you end up needing to download the full EBD dataset? Or did you find a way to match up the custom download EBD and sampling event files for zero-filling?
Thanks!
Hi @BrittanyHBrown. This is a really good question -- the project was a directed reading and I haven't touched it in a while. Let me quickly investigate what I ended up doing and I will follow-up and post here.
Building off the initial question in this thread, I am also new to auk and getting the same error. In my case, I am trying to use auk_zerofill for multiple datasets independently. My code is working for all except one dataset, even though from what I can tell it's exactly the same. I have ensured that the months the data cover are consistent and that all species are reported. Here is my code:
# My code works for 2019 (in addition to 5 other years of data)
US2019sed <- "Acadian Flycatcher/US_2019/ebd_US_acafly_201905_201908_smp_relMay-2024_sampling.txt"
US2019check <- read_sampling(US2019sed)
US2019ebd <- "Acadian Flycatcher/US_2019/ebd_US_acafly_201905_201908_smp_relMay-2024.txt"
US2019obs <- read_sampling(US2019ebd)
US2019checksub <- subset(US2019check, all_species_reported == TRUE)
US2019obssub <- subset(US2019obs, all_species_reported == TRUE)
zfUS19 <- auk_zerofill(US2019obssub, US2019checksub, collapse = TRUE)
US2020sed <- "Acadian Flycatcher/US_2020/ebd_US_acafly_202005_202008_smp_relMay-2024_sampling.txt"
US2020check <- read_sampling(US2020sed)
US2020ebd <- "Acadian Flycatcher/US_2020/ebd_US_acafly_202005_202008_smp_relMay-2024.txt"
US2020obs <- read_sampling(US2020ebd)
US2020checksub <- subset(US2020check, all_species_reported == TRUE)
US2020obssub <- subset(US2020obs, all_species_reported == TRUE)
zfUS20 <- auk_zerofill(US2020obssub, US2020checksub, collapse = TRUE)
If anyone has any ideas of what might be going on, I'd really appreciate some feedback! I tried re-downloading the 2020 dataset a couple times now in case there was something wrong with the download, but get the same error.
First, you should be using read_ebd() to read in the observation data, so these lines:
US2019obs <- read_sampling(US2019ebd)
US2020obs <- read_sampling(US2020ebd)
Should be changed to
US2019obs <- read_ebd(US2019ebd)
US2020obs <- read_ebd(US2020ebd)
If you're still having problems after making that change, please post the error and we can try to troubleshoot it. Thanks!
Thanks for the catch on the read_ebd @mstrimas. I updated that portion of my code and am still getting the same error.
US2020sed <- "Acadian Flycatcher/US_2020/ebd_US_acafly_202005_202008_smp_relMay-2024_sampling.txt"
US2020check <- read_sampling(US2020sed)
US2020ebd <- "Acadian Flycatcher/US_2020/ebd_US_acafly_202005_202008_smp_relMay-2024.txt"
US2020obs <- read_ebd(US2020ebd)
US2020checksub <- subset(US2020check, all_species_reported == TRUE)
US2020obssub <- subset(US2020obs, all_species_reported == TRUE)
zfUS20 <- auk_zerofill(US2020obssub, US2020checksub, collapse = TRUE)
Error in auk_zerofill.data.frame(US2020obssub, US2020checksub, collapse = TRUE) :
Some checklists in EBD are missing from sampling event data.
I am stumped because the same code is working on other datasets. Thanks!
This is a rare bug that I've described here: https://github.com/CornellLabofOrnithology/auk/issues/79#issuecomment-1934555208
In your case, right before you call auk_zerofill(), add something like the following:
US2020obssub <- US2020obssub[US2020obssub$checklist_id %in% US2020checksub$checklist_id, ]
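Before applying this kind of filter to real data, it can help to count how many observation rows it removes, since a large number would point to a version or filter mismatch rather than the rare bug. The sketch below uses made-up data frames (obs and checklists are hypothetical stand-ins for the observation and sampling event tables) and base R only.

```r
# Hypothetical stand-ins for the observation and checklist tables.
obs <- data.frame(checklist_id = c("S1", "S2", "S9"),
                  observation_count = c("2", "1", "4"),
                  stringsAsFactors = FALSE)
checklists <- data.frame(checklist_id = c("S1", "S2", "S3"),
                         stringsAsFactors = FALSE)

# Flag observation rows whose checklist exists in the sampling event data.
keep <- obs$checklist_id %in% checklists$checklist_id

# Report how many rows the filter will drop before dropping them.
n_dropped <- sum(!keep)
message(n_dropped, " observation row(s) have no matching checklist")

# Keep only the matched rows; this is the table that would go to auk_zerofill().
obs_matched <- obs[keep, , drop = FALSE]

# Every remaining observation now has a matching checklist.
all_matched <- all(obs_matched$checklist_id %in% checklists$checklist_id)
print(all_matched)
```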
Hello,
I'm new to auk, and working with data for Great Green Macaws to estimate presence/absence in different seasons. I've filtered my EBD and sampling event data to Costa Rica and then attempted to zero-fill these. Since I'm working with a sensitive species, I'm using a customized EBD. However, I am getting an error that there are some checklists in the EBD that are missing from the sampling event data. I tried filtering on the last edited date to exclude checklists that were added after Mar-2020. Here is my code:
Wondering if anyone has any insight into why this may be the case, and how I could solve this considering that I can't download a more recent customized EBD file.
thanks, Sofia