Closed joewilliams-yg closed 5 years ago
Hi Joe,
Thanks for the report. You should not expect the two as.data.frame calls to behave the same way. When you use force = TRUE you get a regular R data frame as a result, which means that everything in the tidyverse and other packages will work on the resulting object. When you use force = FALSE the result is a CrunchDataFrame, which has some data frame methods implemented, but not all of them. In this case we don't currently have a mutate method implemented in crplyr, so if you want to use that function you need to use force = TRUE to get the R data frame. One thing you can do to make this process a bit easier is to use the collect() function to bring the data down from Crunch. So you can do something like:
ds %>%
    select(var1, var2) %>%
    collect() %>%   # Data is brought down from Crunch at this point
    mutate(...)     # Continue on using dplyr/tidyr on a local data frame
Our intention for crplyr was basically to make it easier to work with large datasets before pulling the data onto your local machine for further analysis. If you want your work to be reflected on the server, it's better to use rcrunch tools for that job. The way to think about it is that crplyr is good if you want to get the data out of Crunch and do something with it, while rcrunch is better if you want to manipulate the server-based Crunch dataset.
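For the recoding in this issue, the local part of that workflow might look roughly like the sketch below. It assumes the data has already been pulled down as a regular data frame, either via collect() or as.data.frame(..., force = TRUE). The small dt built here is a made-up stand-in for that result so the example runs without a Crunch connection; only the column names and category labels are taken from the report.

```r
library(dplyr)

# Hypothetical stand-in for the local data frame you would get from
#   as.data.frame(ds[c("race4", "educ4")], force = TRUE)
# The values are invented for illustration only.
dt <- data.frame(
  race4 = factor(c("White", "Other", "Black", "Hispanic")),
  educ4 = factor(c("HS or less", "Some college", "College grad", "Postgrad"))
)

# dt is a plain data.frame, so dplyr's mutate() dispatches normally.
dt <- dt %>%
  mutate(
    race3 = recode_factor(race4,
                          'White' = 'White/Other', 'Other' = 'White/Other',
                          'Black' = 'Black', 'Hispanic' = 'Hispanic'),
    educ3 = recode_factor(educ4,
                          'HS or less' = 'HS or less',
                          'Some college' = 'Some college',
                          'College grad' = 'College degree',
                          'Postgrad' = 'College degree')
  )
```

None of this touches the dataset on the server; the new columns exist only in the local copy.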
Ah, now I understand. I got confused because it was suggested that I modify my workflow to remove force = TRUE. I will try implementing select() %>% collect() %>% mutate(). Thanks!
Great, we still want to implement mutate, but in the meantime we should at least throw a better error for this case.
This stems from a discussion about using as.data.frame(..., force=TRUE).
The use case here is external weighting. I need to be able to manipulate a data.frame object in order to use a raking script on the dataset. I don't need or want to create variables in the actual client-facing dataset. If I use as.data.frame(..., force = TRUE) I get a data.frame object that I can manipulate. If I use as.data.frame(..., force = FALSE) I cannot manipulate the data.frame to do common recodings.
Yet I have been told to use crplyr with as.data.frame(..., force = FALSE) to get the same functionality. That doesn't appear to be the case.
Should we expect as.data.frame(..., force=FALSE) to have the same level of functionality as force=TRUE?
dt <- as.data.frame(
    ds[c("identity", "gender", "age", "age4", "race4", "educ4", "presvote16x",
         "e14_presvote12", "pid3", "ideo3", "region", "votereg2", "app_dtrmp")],
    include.hidden = TRUE,
    force = FALSE
) %>%
    mutate(
        race3 = recode_factor(race4, 'White' = 'White/Other', 'Other' = 'White/Other',
                              'Black' = 'Black', 'Hispanic' = 'Hispanic'),
        educ3 = recode_factor(educ4, 'HS or less' = 'HS or less', 'Some college' = 'Some college',
                              'College grad' = 'College degree', 'Postgrad' = 'College degree'),
        educ2 = recode_factor(educ3, 'HS or less' = 'No degree', 'Some college' = 'No degree',
                              'College degree' = 'College grad')
    )
Produces the following error:
Error in UseMethod("mutate") : no applicable method for 'mutate' applied to an object of class "CrunchDataFrame"