gergness / srvyr

R package to add 'dplyr'-like Syntax for Summary Statistics of Survey Data
214 stars 28 forks source link

Using as_survey_design inside dplyr::mutate with a call to purrr::map #165

Closed dcaldwellphd closed 1 year ago

dcaldwellphd commented 1 year ago

Hi -

Thanks for all your work on this really helpful package!

I am having some difficulty iterating as_survey_design over nested data in a function that I am writing. The following code throws an error saying no applicable method for 'as_survey_design' applied to an object of class "c('double', 'numeric')". Do you have any idea what the problem is?

library(srvyr)
library(dplyr)
library(purrr)

toydata <- tibble::tibble(
  att5val = sample(c(seq(1, 5, 1), NA), 1000, replace = TRUE),
  att10val = sample(c(seq(1, 10, 1), NA), 1000, replace = TRUE),
  weight = runif(1000, min = 0.1, max = 2),
) |>
  tidyr::pivot_longer(
    cols = c(contains("att")),
    names_to = "att_name",
    values_to = "att_val"
  )

myfunc <- function(
    data,
    value,
    by = NULL,
    # Arguments to set up survey design using as_survey_design from the srvyr package
    ids = NULL,
    probs = NULL,
    strata = NULL,
    fpc = NULL,
    weights = NULL,
    nest = FALSE
) {

  input <- data |>
    select(
      {{ value }},
      any_of(by),
      {{ ids }},
      {{ probs }},
      {{ strata }},
      {{ fpc }},
      {{ weights }}
    )

  # Creating a separate survey design object for each group level
  nested_distr <- input |>
    nest_by(across(any_of(by))) |>
    mutate(
      design_list = map(
        data,
        as_survey_design,
        ids = {{ ids }},
        probs = {{ probs }},
        strata = {{ strata }},
        fpc = {{ fpc }},
        weights = {{ weights }},
        nest = nest
      )
    )
  }

out <- toydata |>
  myfunc(
    value = att_val,
    by = "att_name",
    weights = weight
  )
bschneidr commented 1 year ago

Hi David,

It looks like a programming error where your code is iterating over columns of the data frame (whose name is ‘data’) rather than iterating over nested data frames. So I think the error message arises because you’re asking it to turn a single column of the data frame into a survey design object. Hope that helps!

Ben

dcaldwellphd commented 1 year ago

Thanks for this, Ben.

I'm not sure that is the issue. I get the same error if I give the column containing nested data frames a different key, like this:

...
nested_distr <- input |>
    nest_by(across(any_of(by)), .key = "nested_data") |>
    mutate(
      design_list = map(
        nested_data,
        as_survey_design,
        ids = {{ ids }},
        probs = {{ probs }},
        strata = {{ strata }},
        fpc = {{ fpc }},
        weights = {{ weights }},
        nest = nest
      )
    )
...

Or have I completely missed your point?

Thanks again for your help with this!

David

bschneidr commented 1 year ago

You might find it helpful to use the debug() function, which will show that in the code above, map() is operating over columns of a dataframe. So it's trying to use as_survey_design(.data = x), where x is a numeric vector rather than a data frame.

Here's a helpful video on how to use debug() within RStudio.

https://www.youtube.com/watch?v=QQxTf3o07NU&ab_channel=FreddyDrennan

Here's what I see when I use debug(as_survey_design) and then try to run the code you provided:

debug(as_survey_design)

out <- toydata |>
  myfunc(
    value = att_val,
    by = "att_name",
    weights = weight
  )

# debugging in: .f(.x[[i]], ...)
# debug: {
#   UseMethod("as_survey_design")
# }

# Browse[2]> class(.data)
# "numeric"

# Browse[2]> head(.data)
# [1]  3 NA  6  5  3 10

This shows us that map() is iterating over columns of a data frame, and using a given column for the .data argument of as_survey_design(), which is why as_survey_design() is throwing that error message: it's being given a numeric vector instead of a data frame.

I suspect the confusion arises because nest_by() returns a rowwise data frame, which is supposed to automatically vectorize functions over rows of the data frame (see: https://dplyr.tidyverse.org/articles/rowwise.html). You can either use the special vectorization caused by working with a rowwise data frame, or you can call ungroup() on your data frame (i.e., convert it from a rowwise data frame to a regular data frame) and then use map() to iterate over rows of your data frame.

library(srvyr)
library(dplyr)
library(purrr)

toydata <- tibble::tibble(
  att5val = sample(c(seq(1, 5, 1), NA), 1000, replace = TRUE),
  att10val = sample(c(seq(1, 10, 1), NA), 1000, replace = TRUE),
  weight = runif(1000, min = 0.1, max = 2),
) |>
  tidyr::pivot_longer(
    cols = c(contains("att")),
    names_to = "att_name",
    values_to = "att_val"
  )

nested_df <- toydata |> 
  nest_by(across(any_of(c("att_name"))),
          .key = "nested_data")

print(nested_df)
#> # A tibble: 2 × 2
#> # Rowwise:  att_name
#>   att_name        nested_data
#>   <chr>    <list<tibble[,2]>>
#> 1 att10val        [1,000 × 2]
#> 2 att5val         [1,000 × 2]

# Use rowwise operations
nested_designs <- nested_df |>
  mutate(
    design_object = as_survey_design(
      .data = nested_data,
      ids = 1,
      probs = NULL,
      strata = NULL,
      fpc = NULL,
      weights = 1
    ) |> list()
  ) |>
  ungroup()

print(nested_designs)
#> # A tibble: 2 × 3
#>   att_name        nested_data design_object
#>   <chr>    <list<tibble[,2]>> <list>       
#> 1 att10val        [1,000 × 2] <tbl_svy[,2]>
#> 2 att5val         [1,000 × 2] <tbl_svy[,2]>

# Or explicitly ungroup and then use map()

nested_df |>
  ungroup() |>
  mutate(
    design_object = map(
      .x = nested_data,
      .f = function(df) {

        as_survey_design(
          .data = df,
          ids = 1
        )

      }
    )
  )
#> # A tibble: 2 × 3
#>   att_name        nested_data design_object
#>   <chr>    <list<tibble[,2]>> <list>       
#> 1 att10val        [1,000 × 2] <tbl_svy[,2]>
#> 2 att5val         [1,000 × 2] <tbl_svy[,2]>

Created on 2023-08-21 with reprex v2.0.2

dcaldwellphd commented 1 year ago

Thanks for taking the time to explain this to me! That's all really helpful and solves the issue that I was having.