TimTeaFan / dplyover

Create columns by applying functions to vectors and/or columns in 'dplyr'.
https://timteafan.github.io/dplyover/

error across2x() taking in list column of doubles #20

Open brshallo opened 3 years ago

brshallo commented 3 years ago

I was trying to recreate this example I made with pwiser https://gist.github.com/brshallo/01496b68adef88a71de1fd44f3712b10 using dplyover but ran into this error:

library(tidyverse)
if (!require(dplyover) ) remotes::install_github("TimTeaFan/dplyover")

penguins <- palmerpenguins::penguins %>% na.omit()

t_test_statistic <- function(x, y){
  t.test(x, y) %>% 
    broom::tidy() %>% 
    pull(statistic)
}

penguins %>% 
  group_by(species, year) %>% 
  summarise(flipper_length_mm = list(flipper_length_mm)) %>% 
  pivot_wider(names_from = year,
              values_from = flipper_length_mm) %>% 
  rowwise() %>%
  mutate(
    dplyover::across2x(where(is.list), 
                       where(is.list),
                       t_test_statistic,
                       .comb = "minimal")
  ) %>% 
  select(species, contains("_"))
#> `summarise()` has grouped output by 'species'. You can override using the `.groups` argument.
#> Error: Problem with `mutate()` input `..1`.
#> i `..1 = dplyover::across2x(...)`.
#> i `..1` must be size 1, not 0.
#> i Did you mean: `..1 = list(dplyover::across2x(...))` ?
#> i The error occurred in row 1.

I wasn't sure what the issue was. If I do the same thing but pass in a list of data frames instead, it works, e.g.:

var_interest <- "flipper_length_mm"

penguins %>%
  group_nest(species, year) %>%
  pivot_wider(names_from = year,
              values_from = data) %>%
  rowwise() %>%
  mutate(dplyover::across2x(where(is.list),
                            where(is.list),
                            ~ t_test_statistic(.x[[var_interest]], .y[[var_interest]]),
                            .comb = "minimal")
  ) %>% 
  select(species, contains("_"))
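
For comparison, the pairwise statistics can also be computed without any column-selection helpers. The following is a minimal base R sketch on made-up data (the `samples` list and its values are hypothetical, not the penguins data). It assumes that `.comb = "minimal"` means every unordered pair of distinct columns, taken once:

```r
# Hypothetical stand-in for one species: one numeric vector per year.
set.seed(1)
samples <- list(
  `2007` = rnorm(20, mean = 190, sd = 6),
  `2008` = rnorm(20, mean = 192, sd = 6),
  `2009` = rnorm(20, mean = 195, sd = 6)
)

# All unordered pairs of distinct names (assumed to match .comb = "minimal").
pairs <- combn(names(samples), 2, simplify = FALSE)

# Welch t statistic for each pair, as t.test() computes by default.
stats <- vapply(
  pairs,
  function(p) unname(t.test(samples[[p[1]]], samples[[p[2]]])$statistic),
  numeric(1)
)

# Label each result with the two names it compares, e.g. "2007_2008".
names(stats) <- vapply(pairs, paste, character(1), collapse = "_")
stats
```

This avoids `where(is.list)` entirely, so it is not affected by how the columns are seen inside `rowwise()`.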

TimTeaFan commented 3 years ago

Thanks for spotting this. I was able to track down the issue to this line:

data <- dplyr::cur_data()[1, ]

This is the data used in across2x_setup. The problem here is that we are in a rowwise data.frame, and with cur_data() we get each list element as column content, but not the list itself. That's the reason why where(is.list) doesn't work here. The same holds true for across2 and crossover.
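
The underlying rowwise behavior can be seen with plain {dplyr}, independent of {dplyover}. This is only a sketch of the general mechanism, using a made-up tibble `df`; it does not reproduce the internals of across2x_setup:

```r
library(dplyr)

# Toy tibble with a list column of doubles (hypothetical data).
df <- tibble(id = 1:2, x = list(c(1, 2, 3), c(4, 5)))

# Without rowwise(), `x` is the whole list column.
plain <- df %>%
  mutate(is_list = is.list(x))

# With rowwise(), `x` is the single element for the current row,
# here a numeric vector, so is.list() no longer holds.
by_row <- df %>%
  rowwise() %>%
  mutate(is_list = is.list(x), n = length(x)) %>%
  ungroup()

plain$is_list   # TRUE for every row
by_row$is_list  # FALSE for every row
by_row$n        # element lengths: 3 and 2
```

A predicate such as is.list, evaluated against data seen this way, finds no list columns, which is consistent with the "must be size 1, not 0" error above.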

This is all rooted in the problem that we cannot access the complete underlying data from outside of {dplyr}. Within {dplyr} we could use dplyr:::cur_data_all(). I'm working on a fix for the next version of {dplyover}.