AtlasOfLivingAustralia / galah-R

Query living atlases from R
https://galah.ala.org.au

DOI missing from attributes slot #182

Closed jbdorey closed 9 months ago

jbdorey commented 1 year ago

Hi there,

I'm having trouble with a script that previously worked. My script is:

ColsToKeep = c("scientificName","family", "subfamily","genus","subgenus","subspecies","species" ) # This list is longer but includes ALA and non-ALA columns
ALA_taxon = "Apiformes"

ALA_Occurence_download <- galah::galah_call() %>%
  galah::galah_identify(ALA_taxon) %>%
  galah::galah_select(tidyselect::any_of(ColsToKeep)) %>%
  galah::atlas_occurrences(mint_doi = TRUE)

attrs_ALA_Occurence_download <- attributes(ALA_Occurence_download)

However, the doi slot (attrs_ALA_Occurence_download$doi) is now empty and I'm not sure why. This stops me from downloading the file in a later line of my function.

Let me know if you need more context!

galah version: 1.5.1

daxkellie commented 1 year ago

Thanks for raising this issue. It looks like you caught a minor mistake on our end. In a nutshell, we made lots of behind-the-scenes changes to improve internal downloads in galah 1.5.1, and in the process we accidentally omitted the bit of code that attaches the DOI to the download when mint_doi = TRUE (even though the DOI was still generated along with the downloaded records!).

The latest commit to the dev branch has fixed this. If you install {galah} from the current GitHub dev branch, saving a DOI should now work!

remotes::install_github("AtlasOfLivingAustralia/galah@dev")

The DOI can be passed to collect_occurrences() to download the data again. Just be sure to supply it by name with doi =, because this function still has some bugs, and naming the argument seems to stop them from cropping up until we fix them all.
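Since an empty doi slot only showed up later in the script, a small guard that fails early may be useful. This is a minimal sketch, not part of galah: get_download_doi() and fake_download are hypothetical names, and the stand-in data frame only mimics a download so the snippet runs without querying the ALA.

```r
# Hypothetical helper (not a galah function): read the DOI attribute from a
# download and stop with a clear message if it is missing or empty.
get_download_doi <- function(x) {
  doi <- attr(x, "doi", exact = TRUE)
  if (is.null(doi) || length(doi) != 1 || is.na(doi) || !nzchar(doi)) {
    stop("No DOI found in attributes(x)$doi; was the download made with mint_doi = TRUE?",
         call. = FALSE)
  }
  doi
}

# Stand-in for a download result (assumption: real downloads carry the DOI
# in the same "doi" attribute, as described in this thread).
fake_download <- data.frame(scientificName = "APIDAE")
attr(fake_download, "doi") <- "https://doi.org/10.26197/ala.37450b00-40c1-4d55-8067-0f9e80d4d5f7"

doi <- get_download_doi(fake_download)

# With galah installed and configured, the DOI would then be passed by name:
# galah::collect_occurrences(doi = doi)
```

Failing at the point where the DOI is read makes the cause obvious, rather than surfacing as a download error further down the function.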

Here's a working example of the code you provided above:

# remotes::install_github("AtlasOfLivingAustralia/galah@dev")
library(galah)
library(dplyr)
library(tidyr)

galah_config(email = "dax.kellie@csiro.au")

ColsToKeep <- c("scientificName","family", "subfamily","genus","subgenus",
               "subspecies","species")
ALA_taxon <- "Apiformes"

ALA_Occurrence_download <- galah_call() %>%
  galah_identify(ALA_taxon) %>%
  galah_select(any_of(ColsToKeep)) %>%
  atlas_occurrences(mint_doi = TRUE)
#> This query will return 271,088 records
#> 
#> Checking queue
#> Current queue size: 1 inqueue . running ........

attributes(ALA_Occurrence_download)$doi # Returns DOI
#> [1] "https://doi.org/10.26197/ala.37450b00-40c1-4d55-8067-0f9e80d4d5f7"

# Redownloads records from DOI
collect_occurrences(doi = attributes(ALA_Occurrence_download)$doi) 
#> Downloading
#> # A tibble: 271,088 × 7
#>    scientificName family subfamily genus subgenus subspecies species
#>    <chr>          <chr>  <chr>     <chr> <chr>    <lgl>      <chr>  
#>  1 APIDAE         Apidae <NA>      <NA>  <NA>     NA         <NA>   
#>  2 APIDAE         Apidae <NA>      <NA>  <NA>     NA         <NA>   
#>  3 APIDAE         Apidae <NA>      <NA>  <NA>     NA         <NA>   
#>  4 APIDAE         Apidae <NA>      <NA>  <NA>     NA         <NA>   
#>  5 APIDAE         Apidae <NA>      <NA>  <NA>     NA         <NA>   
#>  6 APIDAE         Apidae Apinae    <NA>  <NA>     NA         <NA>   
#>  7 APIDAE         Apidae <NA>      <NA>  <NA>     NA         <NA>   
#>  8 APIDAE         Apidae Apinae    <NA>  <NA>     NA         <NA>   
#>  9 APIDAE         Apidae <NA>      <NA>  <NA>     NA         <NA>   
#> 10 APIDAE         Apidae Apinae    <NA>  <NA>     NA         <NA>   
#> # … with 271,078 more rows

jbdorey commented 1 year ago

Thank you very much for the quick fix! It looks like my function is once again functional ;)