WorldHealthOrganization / godata-r-reports

R scripts connected to Go.Data API
MIT License

R Reporting - API Scripts - functions to create loops for batched import of records #1

Open sarahollis opened 3 years ago

sarahollis commented 3 years ago

To avoid API timeouts, the scripts need to be adapted to loop through records in pre-defined chunk sizes. Example of the rationale below:

chunk_size <- 10000
base_url <- "url"
chunk_num <- 0
data_instance <- pull_data_func() # first chunk of records
data_out <- data_instance

while (nrow(data_instance) == chunk_size) {
  # a full chunk came back, so there may be more records; skip past what we already have
  chunk_num <- chunk_num + chunk_size
  temp_url <- paste0(base_url, "?$skip=", chunk_num, "&$limit=", chunk_size)
  data_instance <- get(temp_url) # placeholder for the actual GET call
  data_out <- bind_rows(data_out, data_instance)
}

jamesfuller-cdc commented 3 years ago

In the James branch, published 8/4/2021, the approach is as follows:

  1. get total number of records using the 'count' API endpoint
  2. create an empty data frame
  3. define your 'batch size'
  4. using a 'while' loop, iterate through and download records in batches, then append those new records to the existing data frame. The GET call must incorporate both limit and skip filters.

Remaining issues: this still times out with extremely large datasets. The test instance has over 1 million follow-up records, and the iterative process still times out after 800 or 900 thousand records. The resulting error code is 524 (a gateway timeout), and I've reached out to Clarisoft via JIRA to get some help.
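One possible workaround for the 524 timeouts, pending a fix from Clarisoft, is to retry each failed batch with exponential backoff. A rough sketch, assuming httr is loaded and that url, outbreak_id, and get_access_token() are defined as in the scripts below; the helper name and retry policy are illustrative, not part of the repo:

# Hypothetical helper: retry one batched GET with exponential backoff on server errors.
get_batch_with_retry <- function(endpoint, skip, limit, max_tries = 5) {
  for (try_n in seq_len(max_tries)) {
    resp <- GET(paste0(url, "api/outbreaks/", outbreak_id, "/", endpoint,
                       "/?filter={%22limit%22:", format(limit, scientific = FALSE),
                       ",%22skip%22:", format(skip, scientific = FALSE), "}"),
                add_headers(Authorization = paste("Bearer", get_access_token())))
    if (status_code(resp) < 500) return(resp) # success or client error: stop retrying
    message("Got ", status_code(resp), "; retrying in ", 2^try_n, " seconds")
    Sys.sleep(2^try_n) # back off before the next attempt
  }
  stop("Batch starting at ", skip, " failed after ", max_tries, " attempts")
}

Each batch would then be fetched via get_batch_with_retry("cases", skip, batch_size) instead of a bare GET, so a single 524 does not abort the whole import.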

###################################################################################################
# GET CASES
###################################################################################################

# required packages (assumed): httr for API calls, jsonlite for parsing, dplyr/tibble for data handling;
# get_access_token() is defined elsewhere in these scripts
library(httr)
library(jsonlite)
library(dplyr)

#get total number of cases
cases_n <- GET(paste0(url,"api/outbreaks/",outbreak_id,"/cases/count"), 
               add_headers(Authorization = paste("Bearer", get_access_token(), sep = " "))) %>%
  content(as="text") %>% fromJSON(flatten=TRUE) %>% unlist() %>% unname()

#Import cases in batches
cases <- tibble()
batch_size <- 50000 # number of records to import per iteration
skip <- 0
while (skip < cases_n) {
  message("********************************")
  message(paste0("Importing records ", format(skip+1, scientific = FALSE), " to ", format(skip+batch_size, scientific = FALSE)))
  cases.i <- GET(paste0(url,"api/outbreaks/",outbreak_id,"/cases",
                      "/?filter={%22limit%22:",format(batch_size, scientific = FALSE),",%22skip%22:",format(skip, scientific = FALSE),"}"), 
               add_headers(Authorization = paste("Bearer", get_access_token(), sep = " "))) %>%
    content(as='text') %>%
    fromJSON( flatten=TRUE) %>%
    as_tibble()
  message(paste0("Imported ", format(nrow(cases.i), scientific = FALSE)," records"))
  cases <- cases %>% bind_rows(cases.i)
  skip <- skip + batch_size
  message(paste0("Data Frame now has ", format(nrow(cases), scientific = FALSE), " records"))
  rm(cases.i)
}
rm(batch_size, skip, cases_n)
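Since the same loop is needed for contacts and follow-ups, the pattern above could be factored into one function parameterized by endpoint. A sketch under the same assumptions (url, outbreak_id, and get_access_token() in scope, httr/jsonlite/dplyr loaded); the function name import_batched is hypothetical:

# Sketch: generic batched importer for any outbreak sub-endpoint (cases, contacts, follow-ups)
import_batched <- function(endpoint, batch_size = 50000) {
  # total record count from the endpoint's /count route
  n <- GET(paste0(url, "api/outbreaks/", outbreak_id, "/", endpoint, "/count"),
           add_headers(Authorization = paste("Bearer", get_access_token()))) %>%
    content(as = "text") %>% fromJSON(flatten = TRUE) %>% unlist() %>% unname()
  out <- tibble()
  skip <- 0
  while (skip < n) {
    # request one batch using the limit/skip filter, then append it
    batch <- GET(paste0(url, "api/outbreaks/", outbreak_id, "/", endpoint,
                        "/?filter={%22limit%22:", format(batch_size, scientific = FALSE),
                        ",%22skip%22:", format(skip, scientific = FALSE), "}"),
                 add_headers(Authorization = paste("Bearer", get_access_token()))) %>%
      content(as = "text") %>% fromJSON(flatten = TRUE) %>% as_tibble()
    out <- bind_rows(out, batch)
    skip <- skip + batch_size
  }
  out
}

The cases block above would then reduce to cases <- import_batched("cases"), and the same call covers the other record types.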