AtlasOfLivingAustralia / galah-R

Query living atlases from R
https://galah.ala.org.au

Confusing column ID name in `search_all(lists)` #225

Closed by daxkellie 4 months ago

daxkellie commented 5 months ago

It's quite confusing that `search_all(lists)` returns `dataResourceUid` as the name of the column containing the unique list ID, yet this column can't be used to filter by a species list in `atlas_species()`.

library(galah)
galah_config(email = "your-email-here")

# The list ID is returned in the "dataResourceUid" column
search_all(lists, "New South Wales")
#> # A tibble: 2 × 19
#>   dataResourceUid listName         listType dateCreated lastUpdated lastUploaded
#>   <chr>           <chr>            <chr>    <chr>       <chr>       <chr>       
#> 1 dr650           New South Wales… CONSERV… 2015-04-04… 2023-06-07… 2023-06-07T…
#> 2 dr487           New South Wales… SENSITI… 2013-06-20… 2023-08-30… 2023-08-30T…
#> # ℹ 13 more variables: lastMatched <chr>, username <chr>, itemCount <int>,
#> #   region <chr>, isAuthoritative <lgl>, isInvasive <lgl>, isThreatened <lgl>,
#> #   wkt <chr>, category <chr>, generalisation <chr>, authority <chr>,
#> #   sdsType <chr>, looseSearch <lgl>

# Filtering on dataResourceUid doesn't work correctly (0 rows returned)
galah_call() |>
  filter(year == 2023,
         cl22 == "New South Wales",
         dataResourceUid == "dr650") |>
  atlas_species()
#> # A tibble: 0 × 11
#> # ℹ 11 variables: Species <chr>, Species Name <chr>,
#> #   Scientific Name Authorship <chr>, Taxon Rank <chr>, Kingdom <chr>,
#> #   Phylum <chr>, Class <chr>, Order <chr>, Family <chr>, Genus <chr>,
#> #   Vernacular Name <chr>

Instead, `species_list_uid` is the field that filters by a specific list ID:

# Filtering on species_list_uid does work correctly
galah_call() |>
  filter(year == 2023,
         cl22 == "New South Wales",
         species_list_uid == "dr650") |>
  atlas_species()
#> # A tibble: 553 × 10
#>    kingdom  phylum     class      order family genus species author species_guid
#>    <chr>    <chr>      <chr>      <chr> <chr>  <chr> <chr>   <chr>  <chr>       
#>  1 Plantae  Charophyta Equisetop… Alis… Junca… Maun… Maundi… F.Mue… https://id.…
#>  2 Animalia Chordata   Mammalia   Dipr… Phasc… Phas… Phasco… (Gold… https://bio…
#>  3 Plantae  Charophyta Equisetop… Eric… Erica… Leuc… Leucop… Maide… https://id.…
#>  4 Animalia Chordata   Aves       Acci… Accip… Hali… Haliae… (Gmel… https://bio…
#>  5 Plantae  Charophyta Equisetop… Poal… Cyper… Eleo… Eleoch… Nees   https://id.…
#>  6 Plantae  Charophyta Equisetop… Sapi… Rutac… Pheb… Phebal… P.H.W… https://id.…
#>  7 Animalia Chordata   Aves       Char… Haema… Haem… Haemat… Vieil… https://bio…
#>  8 Plantae  Charophyta Equisetop… Poal… Poace… Arth… Arthra… (Thun… https://id.…
#>  9 Plantae  Charophyta Equisetop… Sapi… Rutac… Zier… Zieria… R.Br.… https://id.…
#> 10 Animalia Chordata   Aves       Stri… Strig… Ninox Ninox … (Goul… https://bio…
#> # ℹ 543 more rows
#> # ℹ 1 more variable: vernacular_name <chr>

I think it would be more transparent to keep the ID names the same in both places, even if that isn't the true name of the column the API returns for list IDs. Otherwise, I'm not sure how users would figure this out themselves.
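In the meantime, a user-side stopgap might be to rename the column after searching. This is only a minimal sketch, not necessarily how a fix would be implemented in the package: `lists` below is a hand-built stand-in for the `search_all(lists, "New South Wales")` result (IDs copied from the output above, other columns omitted), so it runs without calling the API.

```r
# Stand-in for the ID column of search_all(lists, "New South Wales").
# Assumption: values are copied from the reprex above, not fetched live.
lists <- data.frame(
  dataResourceUid = c("dr650", "dr487"),
  stringsAsFactors = FALSE
)

# Rename the ID column to match the field name that filter() accepts
names(lists)[names(lists) == "dataResourceUid"] <- "species_list_uid"

# Pull out the ID of the first list
list_id <- lists$species_list_uid[1]
print(list_id)
```

The resulting value is the one that goes into `species_list_uid == "dr650"` in the working query above.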

mjwestgate commented 4 months ago

Implemented in v. 2.0.1