daxkellie closed this issue 2 months ago
This seems to relate to issue #146
This is actually weirder than I originally thought. It looks like the filter works sometimes, but not all the time: you can't specify a taxon and add an assertion filter in the same query. So I'm not certain whether the issue is with galah_filter() or somewhere else.
library(galah)
#>
#> Attaching package: 'galah'
#> The following object is masked from 'package:stats':
#>
#> filter
galah_call() |>
  galah_filter(assertions != "INVALID_SCIENTIFIC_NAME") |>
  galah_group_by(family) |>
  atlas_counts()
#> # A tibble: 6,563 × 2
#> family count
#> <chr> <int>
#> 1 Meliphagidae 8475943
#> 2 Artamidae 4467487
#> 3 Psittacidae 4263647
#> 4 Anatidae 3944557
#> 5 Columbidae 3235593
#> 6 Acanthizidae 3199128
#> 7 Cacatuidae 3035366
#> 8 Poaceae 2537315
#> 9 Rhipiduridae 2304547
#> 10 Fabaceae 2245147
#> # ℹ 6,553 more rows
galah_call() |>
  galah_identify("psittaciformes") |>
  galah_filter(assertions != "INVALID_SCIENTIFIC_NAME") |>
  galah_group_by(family) |>
  atlas_counts()
#> # A tibble: 1 × 1
#> count
#> <dbl>
#> 1 0
galah_call() |>
  galah_filter(order == "psittaciformes",
               assertions != "INVALID_SCIENTIFIC_NAME") |>
  galah_group_by(family) |>
  atlas_counts()
#> # A tibble: 1 × 1
#> count
#> <dbl>
#> 1 0
Created on 2023-06-29 with reprex v2.0.2
I love {galah} and would love to see this feature implemented!
I had some old code where galah_filter(assertions != "INVALID_SCIENTIFIC_NAME") was possible, and even galah_filter(assertions != c("INVALID_SCIENTIFIC_NAME", "COORDINATE_INVALID")).
Just wanted to upvote this one :D
As of version 2.0.2, this feature works experimentally when querying the ALA. There is still some nuance to work out because the API doesn't consistently handle all assertions the exact same way, but it's a start!
library(galah)
#> galah: version 2.0.2
#> ℹ Default node set to ALA (ala.org.au).
#> ℹ See all supported GBIF nodes with `show_all(atlases)`.
#> ℹ To change nodes, use e.g. `galah_config(atlas = "GBIF")`.
#> Attaching package: 'galah'
#>
#> The following object is masked from 'package:stats':
#>
#> filter
galah_call() |>
  identify("psittaciformes") |>
  galah_filter(assertions == "INVALID_SCIENTIFIC_NAME") |>
  galah_group_by(family) |>
  atlas_counts()
#> # A tibble: 2 × 2
#> family count
#> <chr> <int>
#> 1 Cacatuidae 10527
#> 2 Psittacidae 8023
galah_call() |>
  identify("psittaciformes") |>
  galah_filter(assertions != "INVALID_SCIENTIFIC_NAME") |>
  galah_group_by(family) |>
  atlas_counts()
#> # A tibble: 3 × 2
#> family count
#> <chr> <int>
#> 1 Psittacidae 4362018
#> 2 Cacatuidae 3102361
#> 3 Nestoridae 94
Created on 2024-04-12 with reprex v2.0.2
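One rough way to sanity-check the two results above is to line them up per family and see what share of records carries the assertion. The sketch below copies the counts from the reprex output by hand into plain data frames (it does not query the ALA), and it assumes the `==` and `!=` filters split each family's records into two non-overlapping groups, which may not hold for every assertion given the API inconsistencies mentioned above.

```r
# Counts copied by hand from the reprex output above (not fetched live).
flagged <- data.frame(
  family = c("Cacatuidae", "Psittacidae"),
  count  = c(10527L, 8023L),
  stringsAsFactors = FALSE
)
unflagged <- data.frame(
  family = c("Psittacidae", "Cacatuidae", "Nestoridae"),
  count  = c(4362018L, 3102361L, 94L),
  stringsAsFactors = FALSE
)

# Keep every family seen in the `!=` result; families with no flagged
# records (here Nestoridae) get NA from the join, which we treat as zero.
both <- merge(unflagged, flagged, by = "family", all.x = TRUE,
              suffixes = c("_unflagged", "_flagged"))
both$count_flagged[is.na(both$count_flagged)] <- 0L

# Assumption: flagged + unflagged = all records for that family.
both$total        <- both$count_unflagged + both$count_flagged
both$prop_flagged <- both$count_flagged / both$total

both
```

If the same totals came back from an unfiltered `atlas_counts()` call for the same taxon, that would be some evidence that the `!=` filter excludes exactly the flagged records.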
At the moment, you can return records tagged with a type of assertion (i.e. a data quality check) with galah_filter(), e.g.:
However, you cannot filter out these records using galah_filter(assertions != "RECORDED_DATE_INVALID"), because an assertion solr query would need to be built in a slightly different way to how galah_filter() builds solr queries normally.
For example, this is the correct solr query to filter out records with INVALID_SCIENTIFIC_NAME:
But this is how galah_filter() builds the query at the moment:
This very slight difference is enough to mean these queries don't work correctly.
I think it might be possible to support filtering out assertions by checking whether the assertions field has been used in galah_filter(), which will then use a separate, bespoke method to build the correct assertions solr queries?
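The dispatch idea could be sketched roughly as follows. This is an illustration only: the helper names (build_default_clause, build_assertion_clause, build_clause) are hypothetical and not part of galah, and both query-string formats are placeholders written in generic Solr-style syntax, since the exact strings the ALA API expects are not shown in this thread.

```r
# Hypothetical sketch: none of these helpers exist in galah, and the
# query-string formats below are assumptions, not the real ALA syntax.

# Stand-in for the way filter clauses are normally built.
build_default_clause <- function(field, op, value) {
  clause <- sprintf('%s:"%s"', field, value)
  if (op == "!=") {
    clause <- sprintf("-(%s)", clause)
  }
  clause
}

# Bespoke builder used only when the `assertions` field is filtered.
build_assertion_clause <- function(op, value) {
  prefix <- if (op == "!=") "-" else ""
  paste0(prefix, "assertions:", value)
}

# Check whether `assertions` is the field in use and dispatch accordingly.
build_clause <- function(field, op, value) {
  stopifnot(op %in% c("==", "!="))
  if (identical(field, "assertions")) {
    build_assertion_clause(op, value)
  } else {
    build_default_clause(field, op, value)
  }
}

build_clause("family", "==", "Psittacidae")
build_clause("assertions", "!=", "INVALID_SCIENTIFIC_NAME")
```

Keeping the assertion logic in its own function means the normal query builder stays untouched, and any per-assertion quirks in the API can be handled in one place.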