Open steffilazerte opened 4 years ago
Thanks for the extra details.
It IS a database problem. A subset of PFW records is missing species_id in bmde_data. I'll look into it.
Ha. That's good to know as I was going to report confusion....if it is an API problem. (WT Heck?)
yup. looks like PFW changed the species codes they were using which broke the join to the lk_species table during imports to BMDE.
Denis, the PFW codes are closest to the EBIRD1.05 authority, but not exact as there are two codes missing; rustow and amegol.
I'm undecided if we should create a new PFW authority or just use EBIRD1.05. For now I've elected to use EBIRD1.05 and add the two missing codes. You can let me know if you prefer the other option.
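For anyone wanting to check which codes fall out of a join like this, here is a minimal sketch in base R. The two data frames are hypothetical stand-ins (the real lk_species layout is not shown in this thread, and the column name species_code and the id values are assumptions); only the species codes themselves come from the discussion above.

```r
# Hypothetical stand-in for the EBIRD1.05 rows of lk_species.
# The column names and species_id values are made up for illustration.
lk_species <- data.frame(
  species_code = c("rebnut", "whbnut", "dowwoo"),
  species_id   = c(101L, 102L, 103L),
  stringsAsFactors = FALSE
)

# Hypothetical PFW records, including the two codes reported as missing.
pfw <- data.frame(
  SpeciesCode = c("rebnut", "rustow", "whbnut", "amegol"),
  stringsAsFactors = FALSE
)

# Left join: records whose code has no match keep species_id = NA.
joined <- merge(pfw, lk_species,
                by.x = "SpeciesCode", by.y = "species_code",
                all.x = TRUE)

# Codes that failed to match the lookup table.
unmatched <- sort(unique(joined$SpeciesCode[is.na(joined$species_id)]))
print(unmatched)
```

Running the same kind of check against the real import would list every code that needs to be added to the authority before re-importing.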
I'm re-importing the PFW data with species_ids now. It will take a little while, but you should see it all there through the R package by tomorrow.
Should this be ready now? I'm actually getting no matches at all...
nc_data_dl(collection = "PFW", info = "testing bug",
username = "steffilazerte")
Using filters: collections (PFW); fields_set (BMDE2.00-min)
Collecting available records...
Error: These collections have no data that match these filters
In fact, replicating the request I made at the start of this issue, with the same request id, gave the confusing results of claiming to download 1299 records, but returning none:
nc_data_dl(request_id = 156239, fields_set = "extended", username = "steffilazerte") %>%
dplyr::select(species_id, SpeciesCode, ObservationDate) %>%
head()
Using filters: collections (PFW); fields_set (BMDE2.00-ext)
Collecting available records...
collection nrecords
1 PFW 1299
Total records: 1,299
Downloading records for each collection:
PFW
Records 1 to 1299 / 1299
[1] species_id SpeciesCode ObservationDate
<0 rows> (or 0-length row.names)
Not yet, it's been taking longer than I thought.
Now I'm getting data, but unfortunately the species_ids are still missing:
nc_data_dl(request_id = 156239, fields_set = "extended", username = "steffilazerte") %>%
dplyr::select(species_id, SpeciesCode, ObservationDate) %>%
head()
# Using filters: collections (PFW); fields_set (BMDE2.00-ext)
# Collecting available records...
# collection nrecords
# 1 PFW 1299
# Total records: 1,299
# Downloading records for each collection:
# PFW
# Records 1 to 1299 / 1299
# species_id SpeciesCode ObservationDate
# 1 NA rebnut Jan 1 2017 12:00AM
# 2 NA whbnut Jan 1 2017 12:00AM
# 3 NA dowwoo Jan 1 2017 12:00AM
# 4 NA haiwoo Jan 1 2017 12:00AM
# 5 NA whbnut Jan 1 2017 12:00AM
# 6 NA bkcchi Jan 1 2017 12:00AM
Just cleaning up some messages, but the latest eBird codes are here in this table: lk_species_ebird_taxon
I strongly suspect the old PFW data will need to be matched against the older codes, and the newer data with the new codes, unless we ask Cornell for a complete set again.
Unless this remains an unresolved issue, I would probably wait until next summer when they will be expected to send us the last year of data.
It might be worth checking if the codes you used match species_status = 1 in the lookup table (full species). I suspect some IDs have been deprecated and we should avoid using those (species_id = 0, particularly, but you may also have IDs that refer to split or lumped species).
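A rough sketch of that check in base R, assuming a lookup table with species_id and species_status columns. The table contents below are hypothetical, and any status value other than 1 is made up for illustration.

```r
# Hypothetical lookup table: species_status = 1 marks full species,
# other status values here are placeholders, not real codes.
lk_species <- data.frame(
  species_id     = c(0L, 101L, 102L, 103L),
  species_status = c(0L, 1L, 1L, 2L)
)

# Hypothetical species_ids assigned during the import.
used_ids <- c(101L, 0L, 103L, 102L)

# Ids that are safe to use: full species, and never species_id = 0.
valid_ids <- lk_species$species_id[
  lk_species$species_status == 1 & lk_species$species_id != 0
]

# Ids that were used but are not full species (candidates for review).
suspect_ids <- used_ids[!(used_ids %in% valid_ids)]
print(suspect_ids)
```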
A user reported a problem with downloading PFW data via naturecounts. In the process of dealing with this problem @steffilazerte noticed that species_id values were missing from the PFW data. In contrast, other collections do return species_id.
@cjardine-bsc asked @pmorrill for a link between API and database tables and used that to track it down to the API: "That suggests that the problem ... is rooted in the API not the DB (if it's pulling from bscakn.bmde_data directly)."
species_id should be non-NA unless there is no observation, as it's an important field.
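As a quick way to spot this in downloaded data, the share of missing species_id values can be summarised per collection. This is a sketch only: the records data frame is a hypothetical stand-in for what nc_data_dl() returns, the collection column and the "OTHER" collection are assumptions, and the non-NA id values are made up.

```r
# Hypothetical downloaded records; only the PFW species codes come
# from the output shown in this issue.
records <- data.frame(
  collection  = c("PFW", "PFW", "PFW", "OTHER", "OTHER"),
  species_id  = c(NA, NA, NA, 101L, 102L),
  SpeciesCode = c("rebnut", "whbnut", "dowwoo", "rebnut", "whbnut"),
  stringsAsFactors = FALSE
)

# Proportion of records with a missing species_id in each collection.
missing_by_collection <- tapply(is.na(records$species_id),
                                records$collection, mean)
print(missing_by_collection)

# Collections where every record lacks a species_id.
all_missing <- names(missing_by_collection)[missing_by_collection == 1]
print(all_missing)
```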