AtlasOfLivingAustralia / galah-R

Query living atlases from R
https://galah.ala.org.au

Investigate options for filtering by species list membership #127

Closed mjwestgate closed 1 year ago

mjwestgate commented 2 years ago

A useful feature would be to filter counts or occurrences by species threatened status, or by some other set of (potentially user-defined) species-level attributes or traits. One toolset that could support this functionality is the 'lists' tool (https://lists.ala.org.au/). Apparently some lists are designated as 'authoritative' and indexed in the biocache, but to my knowledge they don't appear in show_all_fields(). Working out how to pass something like:

galah_call() |>
  galah_filter(EPBCstatus == "endangered") |>
  atlas_counts()

...would be really helpful.

peggynewman commented 2 years ago

There was lists functionality in ALA4R. Might be worth revisiting that.

The lists that are "authoritative" are indeed indexed in biocache. I would like to clean them up because there are a lot of historical and possibly nonsensical artefacts in there. https://lists.ala.org.au/public/speciesLists?sort=itemsCount&order=desc&isAuthoritative=eq%3Atrue&max=100&q=

mjwestgate commented 2 years ago

Thanks for this Peggy, that's really helpful! Looking at the URLs, I note that each list is identified by an id, e.g. for the EPBC list:

https://lists.ala.org.au/speciesListItem/list/dr656

...from which point we can click on 'view occurrences' to go to:

https://biocache.ala.org.au/occurrences/search?q=species_list_uid%3Adr651#tab_mapView

...which suggests that the following should work:

galah_config(run_checks = FALSE) # needed as `species_list_uid` not valid according to `show_all_fields`
galah_call() |>
  galah_filter(species_list_uid == "dr656") |>
  atlas_counts()
# A tibble: 1 × 1
    count
    <int>
1 1614661

This count matches that given by the biocache. Another cool option is to subset to particular taxa, e.g. to get a list of all EPBC-listed mammals:

galah_call() |>
  galah_identify("Mammalia") |>
  galah_filter(species_list_uid == "dr656") |>
  atlas_species()
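
For reference, the biocache URL pattern noted above can be reproduced from a list id. This is only a minimal sketch: `list_occurrences_url()` is a hypothetical helper, not part of galah, and it assumes the biocache keeps accepting `species_list_uid` as a query field in the way the 'view occurrences' link suggests.

```r
# Hypothetical helper (not a galah function): build the biocache
# occurrence-search URL for a species list id such as "dr656".
list_occurrences_url <- function(list_uid) {
  # The query has the form `species_list_uid:<id>`; the colon is
  # percent-encoded as %3A, matching the 'view occurrences' link.
  query <- paste0("species_list_uid:", list_uid)
  paste0(
    "https://biocache.ala.org.au/occurrences/search?q=",
    utils::URLencode(query, reserved = TRUE)
  )
}

list_occurrences_url("dr656")
```

Opening the resulting URL in a browser gives a quick way to compare the biocache record count against the result of atlas_counts().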

I'm currently writing new search functions that would allow users to look up list identifiers (see #132), so the only point of ambiguity is how to tell users that the field name species_list_uid works. This is similar to the problem we have with using taxonConceptId for advanced taxonomic queries, so perhaps we just need a vignette or similar.

mjwestgate commented 2 years ago

Addendum to the above: in the dev branch you can now find the list identifier using:

search_datasets("Australia wide : Conservation Status : EPBC")
# A tibble: 1 × 3
  name                                        uri                                                  uid  
  <chr>                                       <chr>                                                <chr>
1 Australia wide : Conservation Status : EPBC https://collections.ala.org.au/ws/dataResource/dr656 dr656

Although this works, search_datasets is quite messy as it incorporates species lists and occurrence datasets. A neater option might be to add show_all_lists and search_lists using the API here: https://lists.ala.org.au/ws/speciesList
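
A rough sketch of what a lists-only search could look like follows. Everything here is an assumption for illustration: `search_lists_sketch()` is a hypothetical name, and the column names `listName` and `dataResourceUid` (as well as a `lists` element in the API response) are guesses about the structure returned by the lists API rather than confirmed fields. The second mock row is made up purely as a placeholder.

```r
# Hypothetical helper: search a table of species lists by name.
# Assumes `lists` is a data frame with `listName` and `dataResourceUid`
# columns (assumed names, not confirmed against the API).
search_lists_sketch <- function(lists, query) {
  hits <- grepl(query, lists$listName, ignore.case = TRUE)
  lists[hits, c("listName", "dataResourceUid"), drop = FALSE]
}

# Mock data standing in for the API response; the second row is a
# made-up placeholder, not a real list.
mock_lists <- data.frame(
  listName = c(
    "Australia wide : Conservation Status : EPBC",
    "Placeholder list (made up)"
  ),
  dataResourceUid = c("dr656", "dr0000"),
  stringsAsFactors = FALSE
)

search_lists_sketch(mock_lists, "EPBC")

# Against the live service (not run; the response structure is an assumption):
# response <- jsonlite::fromJSON("https://lists.ala.org.au/ws/speciesList")
# search_lists_sketch(response$lists, "EPBC")
```

Keeping lists separate from occurrence datasets in this way would mean a search only ever returns identifiers that are valid values for species_list_uid.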

mjwestgate commented 1 year ago

Lists can now be displayed and searched as of v1.5.0.