BirdsCanada / NatureCountsAPI

NatureCountsAPI
0 stars 1 forks source link

Avian Species Codes metadata #15

Closed steffilazerte closed 5 years ago

steffilazerte commented 5 years ago

@denislepage Internally for the R package, I collect all the avian species codes for each authority, and combine them into one table.

I also make species_id2 more informative by replace all NA values with species_id values (this is the approach you took in your R scripts).

The resulting data works well for searching by alphanumeric code with the species_code_search() function.

  species_id rank species_id2 ABATLAS2 ATOWLS  BBL  BBS  BBS2 BCATLAS1 BCMA BSCCHECK BSCDATA  CBC CMMN EBIRD1.05 EPOQ  IBA ...
1        440    1         440     ABDU   <NA> ABDU ABDU 01330     ABDU ABDU      535    ABDU 5579 ABDU    ambduc  240 ABDU ...
2      12140    1       12140     ACFL   <NA> ACFL ACFL 04650     ACFL ACFL     4354    ACFL 6774 ACFL    acafly 7107 ACFL ...
3       4100    1        4100     AGPL   <NA> AMGP AMGP 02720     AGPL AGPL     1530    AMGP 4922 AMGP    amgplo 2450 AMGP ...
4      12160    1       12160     ALFL   <NA> ALFL ALFL 04661     ALFL ALFL     4355    ALFL 6775 ALFL    aldfly 7109 ALFL ...
5       4300    1        4300     AMAV   <NA> AMAV AMAV 02250     AMAV AMAV     1617    AMAV 1022 AMAV    ameavo 2560 AMAV ...
...
...

1) Is it okay that I've replace missing species_id2 values with species_id values? Is this something better done on the server itself?

  1. Would it be acceptable to have this single table for species metadata in the SQLite databases? Otherwise we'll add 31 tables (one for each of the species authorities)

  2. If so, does it matter that there can be no primary key? (i.e. there is no unique column here).

denislepage commented 5 years ago
  1. species_id2 refers to subspecies. Whether they have a value for species that matches species_id is about the same as leaving them null. Either way, you'd have to adapt your code based on what you are trying to do. If you find it easier to change them to match species_id, that's fine.
authority   species_id  species_id2 species_code    rank
BSCDATA 16610   NULL    YRWA    1
BSCDATA 16610   16620   MYWA    2
BSCDATA 16610   16630   AUWA    3
  1. I am not convinced the code table needs to be transposed (changed from rows to columns). We certainly should not have 31 tables, but why not keep the original format provided by the API? I'd need to understand better what you are trying to do. If needed, you could always set up a utility function that can create a subset for a specific authority, but I would prefer to keep the original table with the authority column.

Worth noting: the primary key on species_codes is made of authority + species_code. In each authority, a alpha code is unique, but you can have multiple codes referring to the same species_id, as long as they have different ranks. When going from species_id to species_codes, you'd assume that the code with rank = 1 is the one to use.

steffilazerte commented 5 years ago

That's totally my oversight, when I call in the species codes from the API, I have been asking individually for each authority, I didn't realize that authority was an optional argument, and that by omitting it I'd get the whole table. I'll pull in the whole table as is, and use the transposed table for internal search functions.

Actually, never mind, I don't need the transposed table at all :)