decisionpatterns / r-dummies

Create dummy/indicator variables flexibly and efficiently

Make (more) compatible with tidyverse/tibbles, etc. #4

Open ctbrown opened 6 years ago

ctbrown commented 6 years ago

From Ryan Brellenthin (via email)

Hi Chris,

I ran across your dummies R package today, and I think it will be a great addition to my R toolkit. Thanks for your contribution!

One suggestion for continued improvement: it would be great if the dummy.data.frame function were compatible with tibbles as well as data frames. With the rise of the tidyverse and the tibble (a variation on the data frame), this would let dummy.data.frame work with pipes (%>%) and tidyverse functions without first explicitly converting a tibble to a data frame.

I've included an example below to show how the output of the dummy.data.frame function is different depending on whether the input is a tibble or a data frame.

Hope this helps. Thanks again for your contribution to the R community. Let me know if you have any questions or need additional information about this recommendation.

Take care, Ryan Brellenthin

# Load packages
library(dummies)
library(tibble)

# Create duplicate versions of the same data: one data frame, one tibble
df1 <- iris
df2 <- as_tibble(iris)  # as.tibble() is deprecated in favor of as_tibble()

# Compare the output of dummy.data.frame on the data frame and the tibble
head(dummy.data.frame(df1))
head(dummy.data.frame(df2))

ryanbrellenthin commented 6 years ago

@ctbrown A quick fix could be as simple as including the following as the first line within the function:

data <- as.data.frame(data)

If the tibble structure is purely about presentation/formatting when printing and there is nothing else that gets lost when converting to a data frame, then this solution should work fine.
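As a rough illustration of that idea, here is a minimal sketch that applies the same coercion from the outside, without touching the package. The names call_with_data_frame and column_class are hypothetical and not part of dummies; column_class merely stands in for legacy code that expects data[, nm] to return a vector.

```r
# Hypothetical helper, not part of the dummies package.
# `fun` is any function written for plain data frames (e.g. dummy.data.frame).
call_with_data_frame <- function(data, fun, ...) {
  data <- as.data.frame(data)  # strips the tbl_df/tbl classes from a tibble
  fun(data, ...)
}

# Stand-in for legacy code that relies on data[, nm] returning a vector.
column_class <- function(data, nm) class(data[, nm])

call_with_data_frame(iris, column_class, nm = "Species")  # "factor"

# Only runs if tibble is installed: compare the direct and the wrapped call.
if (requireNamespace("tibble", quietly = TRUE)) {
  tbl <- tibble::as_tibble(iris)
  print(column_class(tbl, "Species"))  # tibble classes, not "factor"
  print(call_with_data_frame(tbl, column_class, nm = "Species"))  # "factor"
}
```

Placing the coercion inside the function, as suggested above, would have the same effect for every caller.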

ctbrown commented 6 years ago

I haven't looked at this in depth, but I don't understand why it is failing since tibbles are a subclass of data.frame:

> data(iris)
> tbl <- as_tibble(iris)
> class(tbl)
[1] "tbl_df"     "tbl"        "data.frame"
> is.data.frame(tbl)
[1] TRUE

There might just be something else going on here.

ryanbrellenthin commented 6 years ago

I just ran across this RStudio blog post about tibbles. The "Tibbles vs. data frames" section seems relevant, especially the section on subsetting.

Line 31 of your code uses single-bracket subsetting, data[, nm]. On a data frame this returns a vector, but on a tibble it returns a one-column tibble (and, per the post, ignores drop = TRUE even if specified). I did a quick test, and changing class(data[,nm]) to class(data[[nm]]) seems to work just fine as an alternative to the fix I proposed.
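The difference can be reproduced in a few lines. This is only a sketch: col_classes is a hypothetical helper rather than code from the package, and the tibble part runs only if tibble is installed.

```r
nm <- "Species"

# Plain data frame: single-bracket column selection drops to a vector.
class(iris[, nm])                # "factor"

# drop = FALSE keeps a one-column data frame, which is the shape a tibble
# returns by default for the same call.
class(iris[, nm, drop = FALSE])  # "data.frame"

# Double brackets extract the column itself.
class(iris[[nm]])                # "factor"

# Hypothetical helper: first class of each column, using [[ ]] throughout
# so that data frames and tibbles give the same answer.
col_classes <- function(data) {
  vapply(names(data), function(nm) class(data[[nm]])[1], character(1))
}
col_classes(iris)

if (requireNamespace("tibble", quietly = TRUE)) {
  tbl <- tibble::as_tibble(iris)
  print(class(tbl[, nm]))   # "tbl_df" "tbl" "data.frame"
  print(class(tbl[[nm]]))   # "factor"
  print(identical(col_classes(tbl), col_classes(iris)))
}
```

Because [[ ]] behaves the same way for both classes, switching to it avoids the need for any conversion step.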

In the last section of the blog post ("Interacting with legacy code"), it's noted that turning a tibble back into a data frame using as.data.frame() works with legacy code. Either option should work here.