AtlasOfLivingAustralia / galah-R

Query living atlases from R
https://galah.ala.org.au

Should there be a way to only apply a subset of the attributes in a profile? #146

Open shandiya opened 2 years ago

shandiya commented 2 years ago

Currently, there's no way to select a subset of the attributes of a profile within a call to galah_filter(). For instance, if I apply the ALA general profile, there's no way to also get absence records, since the profile excludes these. It would be nice to be able to modify the attributes included in these profiles.

mjwestgate commented 2 years ago

This is a good idea. Looking at the biocache, it appears that this is possible. For example, a biocache search for Litoria (available here) simply adds &qualityProfile=ALA to the URL to apply the default profile (galah does this already). But you can tailor this using &disableQualityFilter. To support absences while still leaving the rest of the filters 'on', the relevant query would be &disableQualityFilter=occurrence-status.
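As a rough illustration of the two query strings described above, here is a minimal base-R sketch. The base URL, the q=Litoria search term and the build_query() helper are assumptions made for this example; only the qualityProfile and disableQualityFilter parameters and their values come from the comment above. The code only builds the URLs and does not send a request.

```r
# Assumed endpoint for the biocache web service; not stated in this thread.
base_url <- "https://biocache-ws.ala.org.au/ws/occurrences/search"

# Hypothetical helper: URL-encode each value and join as key=value pairs.
build_query <- function(base, params) {
  encoded <- vapply(
    params,
    function(x) utils::URLencode(x, reserved = TRUE),
    character(1)
  )
  paste0(base, "?", paste0(names(params), "=", encoded, collapse = "&"))
}

# Default behaviour: apply the whole ALA profile.
default_url <- build_query(
  base_url,
  list(q = "Litoria", qualityProfile = "ALA")
)

# Same profile, but with the occurrence-status filter switched off,
# so that absence records are no longer excluded.
absences_url <- build_query(
  base_url,
  list(
    q = "Litoria",
    qualityProfile = "ALA",
    disableQualityFilter = "occurrence-status"
  )
)

print(default_url)
print(absences_url)
```

The only difference between the two URLs is the extra disableQualityFilter parameter, which is what a profile-subsetting feature in galah would presumably need to generate.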

One problem with this is that search_profile_attributes() doesn't currently return the field names needed to make this query work (occurrence-status in this case). Fortunately, however, this is easily fixed as this information is returned via the API. We could support this kind of functionality within the new galah_data_profile() function, currently on the development branch of galah (#130).

The final question is how this should appear to the user. Setting up NSE (non-standard evaluation) is pretty easy. I haven't checked, so I can't be sure, but something like this might work:

galah_call() |>
  galah_filter(year == 2022) |>
  galah_data_profile(ALA, -occurrence-status) |>
  atlas_counts()

Is that a good solution? Or is it a bit messy?
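One caveat worth checking with the sketch above: because occurrence-status contains a hyphen, R would parse an unquoted -occurrence-status as two subtractions rather than one negated name. The base-R snippet below shows the difference, and how backticks keep the name intact. The excluded_filters() helper is hypothetical, written only for this illustration, and is not part of galah.

```r
# Unquoted, R sees two separate symbols joined by a subtraction.
unquoted <- quote(-occurrence-status)
all.vars(unquoted)   # "occurrence" "status"

# Backticks keep the hyphenated name as a single symbol.
quoted <- quote(-`occurrence-status`)
all.vars(quoted)     # "occurrence-status"

# Hypothetical helper (not part of galah): capture the arguments unevaluated
# and return the names of those marked for exclusion with a leading minus.
excluded_filters <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1]
  is_negated <- vapply(
    exprs,
    function(e) {
      is.call(e) && identical(e[[1]], as.name("-")) && length(e) == 2
    },
    logical(1)
  )
  vapply(exprs[is_negated], function(e) as.character(e[[2]]), character(1))
}

excluded_filters(ALA, -`occurrence-status`)  # "occurrence-status"
```

If backticks feel too fiddly for users, accepting the filter name as a string would be another option, at the cost of departing from the dplyr::select() style.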

daxkellie commented 1 year ago

I think that this solution is pretty tidy as long as users find it intuitive to think of data profile filters like columns (e.g. dplyr::select(-col1))