arthur-shaw / susoapi

R interface for Survey Solutions' APIs
https://arthur-shaw.github.io/susoapi/

Rewrite `get_questionnaires` to page through results #8

Open arthur-shaw opened 3 years ago

arthur-shaw commented 3 years ago

See #5

SamHarsha commented 5 months ago

Hello, I was trying to get a list of the questionnaires in our workspace (there are around 200), but the data frame returns only 100 rows. From the linked discussion, I gather that this limit exists so as not to overwhelm the servers. Can you suggest how I can get the remaining questionnaires? I specifically want questionnaires whose title contains the word "baseline", and I thought I could filter after getting the entire data frame. But if there is a way to incorporate that in the main GET request, that also works.

arthur-shaw commented 5 months ago

This requires writing some code to page through the results.

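For the GraphQL API, one gets filteredCount, that is, the total number of questionnaires (that meet a query). With this number in hand, one needs to make repeated requests, each skipping the entries already retrieved, until all filteredCount entries have been collected.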

If you're comfortable modifying my source code for get_questionnaires(), you can introduce a skip argument that takes the number of entries to skip. That way, you could get more than 100 entries through several requests. Here's a sketch of what that might look like. It might even work, but I've not tested it; I drafted it for illustrative purposes only.

#' Get the next batch of questionnaires
#'
#' Get the next 100 questionnaires after skipping the number in `skip`.
#' 
#' GraphQL implementation of the deprecated REST `GET /api/v1/questionnaires` endpoint.
#'
#' @param server Character. Full server web address (e.g., \code{https://demo.mysurvey.solutions}, \code{https://my.domain})
#' @param workspace Character. Name of the workspace whose questionnaires to get. In workspace list, value of `NAME`, not `DISPLAY NAME`, for the target workspace.
#' @param user Character. API user name
#' @param password Character. API password
#' @param skip Integer. Number of questionnaire entries to skip
#'
#' @return Data frame of questionnaires.
#' 
#' @importFrom assertthat assert_that
#' @import ghql
#' @importFrom jsonlite base64_enc fromJSON
#' @importFrom glue glue double_quote
#' @importFrom dplyr pull
#'
#' @export
get_next_questionnaires <- function(
    server = Sys.getenv("SUSO_SERVER"),     # full server address
    workspace = Sys.getenv("SUSO_WORKSPACE"),
    user = Sys.getenv("SUSO_USER"),         # API user name
    password = Sys.getenv("SUSO_PASSWORD"),  # API password  
    skip
) {

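    # check inputs
    # invalid name
    # workspace does not exist
    check_workspace_param(workspace = workspace)
    # skip must be a single non-negative number
    assertthat::assert_that(
        is.numeric(skip), length(skip) == 1, skip >= 0,
        msg = "`skip` must be a single non-negative number."
    )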

    # compose the GraphQL request client
    questionnaires_request <- ghql::GraphqlClient$new(
        url = paste0(server, "/graphql"), 
        headers = list(authorization = paste0(
            "Basic ", jsonlite::base64_enc(input = paste0(user, ":", password)))
        )
    )

    # compose the query for the next batch of questionnaires
    # use string interpolation to pipe double-quoted workspace name into query
    qry <- ghql::Query$new()
    qry$query("questionnaires", 
        glue::glue("{
            questionnaires (workspace: <glue::double_quote(workspace)>, skip: <skip>) {
                nodes {
                    id
                    questionnaireId
                    version
                    variable
                    title
                    defaultLanguageName
                    translations {
                        id
                        name
                    }
                }
                filteredCount   
            }
        }", .open = "<", .close = ">")
    )

    # send request
    questionnaires_result <- questionnaires_request$exec(qry$queries$questionnaires)

    # convert JSON payload into an R object
    qnrs <- jsonlite::fromJSON(questionnaires_result, flatten = TRUE)
    qnr_count <- qnrs$data$questionnaires$filteredCount

    if ("errors" %in% names(qnrs)) {

        # extract and display error(s)
        errors <- paste0(dplyr::pull(qnrs$errors), collapse = "\n")
        stop(errors)

    } else if (qnr_count == 0) {

        message(glue::glue(
            "No questionnaires found in workspace {glue::backtick(workspace)}.",
            "If this result is surprising, check the input in the `workspace` parameter.",
            .sep = "\n"
        ))

    } else if (qnr_count > 0) {

        # extract data frame from nested containers
        qnrs_df <- qnrs$data$questionnaires$nodes

        # correct class of defaultLanguageName, which may often be empty
        qnrs_df$defaultLanguageName <- as.character(qnrs_df$defaultLanguageName)

        # rename variables to names from REST ?

            # What REST CURRENTLY RETURNS:
            # "QuestionnaireIdentity": "string",
            # "QuestionnaireId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
            # "Version": 0,
            # "Title": "string",
            # "Variable": "string",
            # "LastEntryDate": "2021-06-01T13:41:59.328Z",
            # "IsAudioRecordingEnabled": true,
            # "WebModeEnabled": true

            # How to rename:
            # qnrs_df <- qnrs_df %>%
            #     rename(
            #         QuestionnaireIdentity = questionnaireId,
            #         QuestionnaireId = id,
            #         Version = version,
            #         Variable = variable,
            #         Title = title
            #     )

        return(qnrs_df)

    }

}
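To collect every questionnaire, the function above could be called repeatedly with a growing skip value. The following is a minimal base R sketch of that loop, not tested against a live server. The names `collect_pages` and `filter_by_title` are hypothetical and not part of susoapi. The loop assumes only that each call returns a data frame of the next batch, and that an empty or NULL result means nothing is left.

```r
# Hypothetical helper: call `fetch_page(skip)` until it returns no more rows.
# `fetch_page` is any function that takes the number of entries to skip
# and returns a data frame with the next batch.
collect_pages <- function(fetch_page, max_requests = 100) {
    pages <- list()
    skip <- 0
    # max_requests is a safety cap so a misbehaving server cannot loop forever
    for (i in seq_len(max_requests)) {
        batch <- fetch_page(skip)
        # treat NULL or a non-data-frame (e.g., an empty list) as "no more rows"
        n <- if (is.data.frame(batch)) nrow(batch) else 0
        if (n == 0) break
        pages[[length(pages) + 1]] <- batch
        # advance by the rows actually received, whatever the page size is
        skip <- skip + n
    }
    if (length(pages) == 0) return(data.frame())
    do.call(rbind, pages)
}

# Hypothetical helper: keep rows whose `title` matches a pattern,
# ignoring case
filter_by_title <- function(df, pattern) {
    df[grepl(pattern, df$title, ignore.case = TRUE), , drop = FALSE]
}

# Possible usage (requires a server and credentials, so left commented out):
# all_qnrs <- collect_pages(function(skip) get_next_questionnaires(skip = skip))
# baseline_qnrs <- filter_by_title(all_qnrs, "baseline")
```

Advancing by the number of rows received, rather than by a fixed 100, avoids relying on the server's page size, at the cost of one extra request that comes back empty. If the `translations` column arrives as a list column, `dplyr::bind_rows()` may be a more forgiving way to combine the batches than `rbind`.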

Here's how the query looks in GraphQL. In this example, the workspace has 16 questionnaires; the query skips the first 10 and returns the last 6.

image
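As a rough text stand-in for the screenshot, the snippet below renders a trimmed version of the query with the same glue delimiters used in the function. The workspace name `primary` is an assumption made for illustration; only the skip value of 10 comes from the example.

```r
library(glue)

# Hypothetical values mirroring the example: skip the first 10 entries
workspace <- "primary"   # assumed workspace name, for illustration only
skip <- 10

# `<` and `>` delimit R expressions so that GraphQL's own braces stay literal
query_text <- glue::glue("{
    questionnaires (workspace: <glue::double_quote(workspace)>, skip: <skip>) {
        nodes {
            id
            title
        }
        filteredCount
    }
}", .open = "<", .close = ">")

print(query_text)
```

The printed text can be pasted into the server's GraphQL interface to check the query before wiring it into R.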

If time allows, I'll see if I can implement something over the weekend that addresses this issue.

Sorry for the delayed response.

SamHarsha commented 5 months ago

Thank you for the code and example. I will try to incorporate the changes in the source code for get_questionnaires. No worries about the delay; I'll keep an eye out for an update if I'm not successful.