AtlasOfLivingAustralia / galah-R

Query living atlases from R
https://galah.ala.org.au

Retrieving a taxon by taxonConceptID returns subtaxa without my asking #139

Closed by DesiQuintans 1 year ago

DesiQuintans commented 2 years ago

Describe the bug

I am trying to download images for just a particular taxon, but galah also gives me its subtaxa, which I'm not interested in. This could potentially be a big deal since it could inflate my search by 3x or more, making me download thousands of extra images.

galah version

1.4.0

To Reproduce

library(galah)
library(magrittr)

galah_identify("Boronia deanei")

#> # A tibble: 1 x 1
#>   identifier                                      
#>   <chr>                                           
#> 1 https://id.biodiversity.org.au/node/apni/2909142

galah_call() %>% 
    galah_identify("https://id.biodiversity.org.au/node/apni/2909142", search = FALSE) %>% 
    galah_group_by(taxonConceptID, taxonRank, species) %>% 
    atlas_counts() 

#> # A tibble: 3 x 4
#>   species        taxonRank  taxonConceptID                                 count
#>   <chr>          <chr>      <chr>                                          <int>
#> 1 Boronia deanei species    https://id.biodiversity.org.au/node/apni/2909~  1856
#> 2 Boronia deanei subspecies https://id.biodiversity.org.au/node/apni/2888~    52
#> 3 Boronia deanei subspecies https://id.biodiversity.org.au/node/apni/2918~    35

Created on 2022-03-24 by the reprex package (v2.0.1)

Expected behaviour

I expect that if I provide a specific taxonConceptID, then only that ID will be selected. Searching recursively into subtaxa should be opt-in via a function argument, because it is a potentially expensive operation.

daxkellie commented 2 years ago

Thanks for getting in contact about your issue. We are happy to hear you are using {galah} and appreciate you letting us know about where you are encountering a problem.

The reason you are getting counts of both species and subspecies appears to result from how galah_group_by() is interpreting the request to group by taxonRank. Because Boronia deanei has several subspecies that are members of the same species rank, galah_group_by() is choosing to include them in the groupings (even though you were explicit about the taxonConceptID in galah_identify()). There are good reasons why this is the correct behaviour for galah_group_by(), but it seems we might need to make it so that specifying a taxonConceptID overrides galah_group_by()'s behaviour. This is something we can fix in the next update!

The good news is that by using galah_filter() and specifying the taxonRank, we can fix your issue of downloading more records than you wanted:

galah_call() %>% 
  galah_identify("https://id.biodiversity.org.au/node/apni/2909142", search = FALSE) %>%
  galah_filter(taxonRank == "species") %>%
  galah_group_by(taxonConceptID, taxonRank, species) %>% 
  atlas_counts() 

#> # A tibble: 1 x 4
#>   species        taxonRank taxonConceptID                                  count
#>   <chr>          <chr>     <chr>                                           <int>
#> 1 Boronia deanei species   https://id.biodiversity.org.au/node/apni/29091~  1856

Created on 2022-03-25 by the reprex package (v2.0.1)
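
To get a feel for how much the taxonRank filter trims, the counts from the two reprex outputs in this thread can be compared offline. This is only a rough sketch: the numbers are copied by hand from the output above (they are occurrence-record counts from March 2022, not image counts, and will have changed since), and the variable names are made up for illustration. The taxonConceptID column is left out because the IDs are truncated in the printed output.

```r
# Counts copied by hand from the grouped atlas_counts() output in this thread.
# Assumption: these are a 2022 snapshot; a live query will return other numbers.
counts <- data.frame(
  species   = rep("Boronia deanei", 3),
  taxonRank = c("species", "subspecies", "subspecies"),
  count     = c(1856L, 52L, 35L),
  stringsAsFactors = FALSE
)

# Local equivalent of galah_filter(taxonRank == "species"):
# keep only the rows recorded at species rank.
species_only <- counts[counts$taxonRank == "species", ]

total_all     <- sum(counts$count)        # every rank returned for the ID
total_species <- sum(species_only$count)  # species-rank records only
extra         <- total_all - total_species

cat("All ranks:", total_all, "\n")
cat("Species rank only:", total_species, "\n")
cat("Extra records without the filter:", extra, "\n")
```

For this taxon the difference is small (87 of 1943 records), but for a taxon with many well-sampled subtaxa the unfiltered query could be considerably larger.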

DesiQuintans commented 2 years ago

> The good news is that by using galah_filter() and specifying the taxonRank, we can fix your issue of downloading more records than you wanted

Thank you, this is what I ended up figuring out last night!

Not to dump more requests into the same Issue, but I think a fileSize filter or an imageWidth/Height filter would be fantastic for uses like mine.

daxkellie commented 2 years ago

Good to hear! I've added your suggestion as a new issue, #140.