DiskFrame / disk.frame

Fast Disk-Based Parallelized Data Manipulation Framework for Larger-than-RAM Data
https://diskframe.com

dplyr::mutate on a disk frame ignores case_when #171

Closed caewok closed 5 years ago

caewok commented 5 years ago

If a case_when is used inside a mutate, the entire mutate function is ignored in a disk frame, and no error is thrown. Example:

library(disk.frame)
library(dplyr)

value <- as.disk.frame(tibble(char = LETTERS,
                              num = 1:26))
value %>%
  dplyr::mutate(b = case_when(
    char %in% c("A", "B", "C") ~ "1",
    TRUE ~ char)) %>%
  compute %>% head
# No b column at all

Compare to a tibble, which works fine:

value <- tibble(char = LETTERS,
                num = 1:26)
value %>%
  dplyr::mutate(b = case_when(
    char %in% c("A", "B", "C") ~ "1",
    TRUE ~ char)) %>%
  head

The workaround (for now) is to enclose the case_when in a function:

fn <- function(char) {
  case_when(
    char %in% c("A", "B", "C") ~ "1",
    TRUE ~ char)
}
value %>%
  dplyr::mutate(b = fn(char)) %>%
  compute %>% head
# b column correctly calculated
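
As a sanity check on the workaround, the sketch below compares the inline case_when with the wrapped version on a plain tibble, where both forms are known to work. The helper name recode_abc and the object names are hypothetical, chosen only for illustration. It assumes dplyr is installed and does not touch disk.frame at all.

```r
library(dplyr)

# Hypothetical helper: the same case_when logic, wrapped in an ordinary function.
recode_abc <- function(x) {
  case_when(
    x %in% c("A", "B", "C") ~ "1",
    TRUE ~ x
  )
}

reference <- tibble(char = LETTERS, num = 1:26)

# Inline form, as in the original report.
inline <- reference %>%
  mutate(b = case_when(
    char %in% c("A", "B", "C") ~ "1",
    TRUE ~ char))

# Wrapped form, as in the workaround.
wrapped <- reference %>%
  mutate(b = recode_abc(char))

# Both forms should produce the same b column on an in-memory tibble.
stopifnot(identical(inline$b, wrapped$b))
```

If the two columns match on a tibble, the wrapper is a faithful stand-in for the inline expression, so any difference seen on a disk frame comes from how the expression is evaluated there rather than from the recoding logic.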

But I suspect this bug portends some larger problem with how these types of dplyr calls are handled by disk frame.
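
Since the failure is silent, one way to catch it early is to assert that the expected columns exist on whatever the pipeline returns. The helper below is a hypothetical base R sketch, not part of disk.frame or dplyr. It is shown on a plain data.frame, and it could just as well be applied to the result of head on a disk frame pipeline.

```r
# Hypothetical helper: stop with an error if any expected column is missing.
assert_has_columns <- function(df, cols) {
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    stop("missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  invisible(df)
}

checked <- data.frame(char = LETTERS, num = 1:26)

# Passes silently because both columns exist.
assert_has_columns(checked, c("char", "num"))

# Capture the error message raised when the b column is absent.
res <- tryCatch(
  assert_has_columns(checked, "b"),
  error = function(e) conditionMessage(e)
)
```

Placing a check like this right after a mutate turns a silently dropped column into an immediate error.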

xiaodaigh commented 5 years ago

Thank you very much for the bug report. I will prioritize this. Thanks for your understanding that {disk.frame} is still at v0.1.1, so bugs like this are expected. Also, compute is meant to be run for its side effects only. Just calling head alone should be sufficient, e.g.

library(disk.frame)
value <- as.disk.frame(tibble(char = LETTERS,
                              num = 1:26))
value %>%
  dplyr::mutate(b = paste0(char, num)) %>%
  head

xiaodaigh commented 5 years ago

This is working now:

library(disk.frame)
setup_disk.frame() # this is useful if you want to take advantage of multi-core CPUs

value <- as.disk.frame(tibble(char = LETTERS,
                              num = 1:26))
value %>%
  dplyr::mutate(b = case_when(
    char %in% c("A", "B", "C") ~ "1",
    TRUE ~ char)) %>%
  head

xiaodaigh commented 5 years ago

This might be a bug in tidyeval, but I think I've found a better way to write my code anyway.

caewok commented 5 years ago

Great; thanks!