arthur-shaw / susoapi

R interface for Survey Solutions' APIs
https://arthur-shaw.github.io/susoapi/

Getting data as data.frame #1

Closed ashwinikalantri closed 3 years ago

ashwinikalantri commented 3 years ago

The data export function saves the file to a directory. Would it be possible to export the data directly to a data frame? The SurveySolutionsAPI package used to do this.

arthur-shaw commented 3 years ago

Ashwini, thanks for your interest in this package!

From what I understand, I'm not sure I want to implement that in this package. The goal here is to have wrapper functions for every API endpoint. Also, I'd rather have functions that do one thing. This makes them easier to test and easier to troubleshoot if anything fails (e.g., whether the request failed, the download failed, data ingestion failed, etc.). If more than one thing needs to be done, it might be better to use a separate function for each step (e.g., one function to get data, another function to ingest it).

While this may sound like a "no", I have a two-part answer that's a potential "yes".

First, this idea might be better suited for another package I'm developing: susoflows, which aims to provide functions for common (or complex) survey workflows. This could be one of those workflows. But even then, I'd still want to think about how to handle cases where data from several versions of a questionnaire are downloaded and need to be combined. Thinking about it quickly, this strikes me as a task better suited to a user who knows the data (e.g., data from versions 1 and 2 cannot be merged, column A should be recast to a different type in version 1 before merging, etc.) than to a function that does not.
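As a rough illustration of that kind of data-aware step, here is a minimal base-R sketch. The data frames `hh_v1` and `hh_v2`, the column `a`, and the added `version` column are hypothetical stand-ins, not actual Survey Solutions export output. The point is only that the user, who knows the data, decides how to reconcile column types before stacking versions.

```r
# Hypothetical example: the same file exported from two questionnaire
# versions, where column `a` changed type between versions.
hh_v1 <- data.frame(
    interview__id = c("id1", "id2"),
    a = c("1", "2"),
    stringsAsFactors = FALSE
)
hh_v2 <- data.frame(
    interview__id = "id3",
    a = 3,
    stringsAsFactors = FALSE
)

# recast `a` in version 1 so both versions agree before combining
hh_v1$a <- as.numeric(hh_v1$a)

# tag each row with its source version, then stack the two data frames
hh_v1$version <- 1L
hh_v2$version <- 2L
hh_all <- rbind(hh_v1, hh_v2)

str(hh_all)
```

Recasting explicitly matters because base `rbind()` may silently coerce a mixed numeric/character column to character, while `dplyr::bind_rows()` refuses to combine incompatible types.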

Second, here are two functions that I developed for another project that might meet your needs:

To get a list of (Stata) files in a directory:

#' Get vector of file names matching a pattern
#' 
#' Returns a character vector of Stata (`.dta`) file names in the target directory that also match a regex pattern
#' 
#' @param dir Character. File path to directory
#' @param pattern Character. Regular expression describing files to return
#' 
#' @importFrom fs dir_ls path_file
#' @importFrom stringr str_subset
#' @importFrom magrittr %>%
#' 
#' @return Character vector. File names with `.dta` extension
get_matching_files <- function(
    dir, 
    pattern
) {

    file_names <- fs::dir_ls(
            path = dir,
            recurse = FALSE,
            regexp = "\\.dta$"  # anchored: only names ending in .dta
        ) %>%
        fs::path_file() %>%     # keep the file name, drop the directory
        stringr::str_subset(pattern = pattern)

    return(file_names)

}
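For anyone who would rather avoid the fs, stringr, and magrittr dependencies, the same idea can be sketched in base R. `get_matching_files_base()` is a hypothetical name for illustration; it is not part of susoapi.

```r
# Hypothetical base-R variant of get_matching_files() (not part of susoapi):
# same idea, without the fs, stringr, and magrittr dependencies.
get_matching_files_base <- function(dir, pattern) {
    # names (not full paths) of files in `dir` ending in .dta; not recursive
    file_names <- list.files(
        path = dir,
        pattern = "\\.dta$",
        full.names = FALSE
    )
    # keep only the names that also match the caller's regex
    grep(pattern = pattern, x = file_names, value = TRUE)
}
```

Usage mirrors the original, e.g. `get_matching_files_base(dir = my_dir, pattern = "^household")`, where `"^household"` is just an illustrative pattern.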

To ingest those files into data frames:

#' Read Stata file into memory
#' 
#' First, ingest data file. 
#' Then, write it to an object in the global environment, where the object name is the file name minus the `.dta` extension.
#' These names follow the same pattern as the input files in order to facilitate merging.
#' 
#' @param dir Character. File path to directory
#' @param file_name Character. File name with `.dta` extension
#' 
#' @importFrom stringr str_replace
#' @importFrom haven read_dta
#' 
#' @return Side-effect of creating an object in the global environment.
ingest_dta_file <- function(
    dir,
    file_name
) {

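    # object name = file name minus the .dta extension
    df_name <- stringr::str_replace(
            string = file_name,
            pattern = "\\.dta$",
            replacement = ""
        )

    # read the file and assign it in the global environment under that name;
    # file.path() adds the separator, so `dir` need not end with "/"
    assign(
        x = df_name,
        value = haven::read_dta(
            file = file.path(dir, file_name)
        ),
        envir = .GlobalEnv
    )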

}
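If writing into the global environment is not desirable, a variant could instead return a named list of data frames, which is closer to "export to a data frame directly" and easier to test. This is a minimal sketch; `read_dta_files()` and its `read_fun` argument are hypothetical additions (not part of susoapi), with `read_fun` defaulting to `haven::read_dta` so another reader, or a stub in tests, can be swapped in.

```r
# Hypothetical alternative to ingest_dta_file() (not part of susoapi):
# return a named list of data frames instead of assigning into .GlobalEnv.
read_dta_files <- function(dir, file_names, read_fun = haven::read_dta) {
    # read each file; file.path() supplies the separator between dir and name
    dfs <- lapply(file.path(dir, file_names), read_fun)
    # name each element after its file, minus the .dta extension
    names(dfs) <- sub(pattern = "\\.dta$", replacement = "", x = file_names)
    dfs
}
```

Usage might look like `dfs <- read_dta_files(dir = my_dir, file_names = files)`, after which each data frame is available as `dfs[["<file name without .dta>"]]` rather than as a loose object in the workspace.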

Combining the two:

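# provides %>%, used inside get_matching_files() when run outside a package
library(magrittr)

# path to the folder holding the exported .dta files (placeholder; replace with your own)
my_dir <- "path/to/export"

# get list of all .dta files in the directory (".*" matches any name;
# recent versions of stringr reject an empty pattern "")
files <- get_matching_files(
    dir = my_dir, 
    pattern = ".*"
)

# ingest all of those files
purrr::walk(
    .x = files,
    .f = ingest_dta_file,
    dir = my_dir
)

ashwinikalantri commented 3 years ago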

Thanks for the functions. They were extremely helpful. I love the idea of small functions, with susoflows to bind them together. I am looking forward to seeing these packages expand!