Open leungi opened 4 years ago
I like this idea! Definitely is a natural extension. If you want, feel free to do a pull request with this and I'll merge it and add you to the contributor list.
@TysonStanley The new version of dt_unnest() causes this feature to no longer work. Should this be reopened?
pacman::p_load(tidyfast, data.table, magrittr)
df1 <- data.table(a = "a", b = 1)
df2 <- data.table(a = rep("a", 3), b = 1:3, c = 1:3)
nested_df <- data.table(id = 1:2,
                        list_col = list(df1, df2))
nested_df %>%
  dt_unnest(list_col)
#> Error in `[.data.table`(dt_, , eval(col)[[1L]], by = others): j doesn't evaluate to the same number of columns for each group
That is interesting... That was one advantage to using rbindlist(), but if possible, I really want to use the [[ approach. Any ideas?
Maybe extract the list column and check if the nested data.tables have a consistent number of columns?
df1 <- data.table(a = "a", b = 1)
df2 <- data.table(a = rep("a", 3), b = 1:3, c = 1:3)
test_list <- list(df1, df2)
if (length(unique(lengths(test_list))) > 1) {
  "rbindlist code"
} else {
  "[[1]] code"
}
#> [1] "rbindlist code"
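For illustration, the "rbindlist code" branch could look roughly like the sketch below. `unnest_fill()` is a hypothetical helper written for this comment, not tidyfast's `dt_unnest()`, and it assumes a single id column and a list column that contains only data.tables.

```r
library(data.table)

# Hypothetical helper (not part of tidyfast): unnest a list column of
# data.tables whose column sets may differ, padding missing columns with NA.
unnest_fill <- function(dt, col, id) {
  nested <- dt[[col]]
  # Repeat each id value once per row of its nested table
  ids <- rep(dt[[id]], vapply(nested, nrow, integer(1)))
  # fill = TRUE lets rbindlist() bind tables with different columns
  out <- rbindlist(nested, fill = TRUE)
  set(out, j = id, value = ids)
  # Move the id column to the front
  setcolorder(out, id)
  out
}

df1 <- data.table(a = "a", b = 1)
df2 <- data.table(a = rep("a", 3), b = 1:3, c = 1:3)
nested_df <- data.table(id = 1:2, list_col = list(df1, df2))

res <- unnest_fill(nested_df, "list_col", "id")
print(res)
```

Because `fill = TRUE` also handles the case where every nested table has the same columns, this branch alone would work for both inputs; the check above only matters if the `[[` path is meaningfully faster.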
Yeah, I was thinking something similar. I can't find anything with the [[ in data.table that we could change. The issue with this approach is the additional cost of getting the lengths, especially if the data is really large... I wonder how often this comes up. @leungi is this something you encounter a lot?
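One way to gauge the cost of getting the lengths is to time the check on a synthetic list. This is only a sketch: the list size (100,000 entries) and its two alternating shapes are assumptions, and the timing will vary by machine.

```r
library(data.table)

df1 <- data.table(a = "a", b = 1)
df2 <- data.table(a = rep("a", 3), b = 1:3, c = 1:3)

# Synthetic list column: 100,000 entries alternating between the two shapes
big_list <- rep(list(df1, df2), 5e4)

# lengths() returns the number of columns of each nested table
timing <- system.time(n_cols <- lengths(big_list))
needs_fill <- length(unique(n_cols)) > 1

print(timing)
print(needs_fill)
```

Note that an equal column count does not guarantee equal column names, so this check can miss tables that differ in names only.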
@TysonStanley @markfairbanks: thanks for bringing this up again.
I do encounter this quite often as a result of a map_*() workflow for parsing large volumes of messy semi-tabular data, where column names and ncol vary. Being able to bind everything and then remove non-informative columns based on the amount of parsed data (post-binding) has been very effective.
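As a rough illustration of that bind-then-prune workflow (this is not the original reprex), the sketch below binds tables with differing columns and then drops sparsely populated ones. The `parsed` list, the `junk` column, and the 0.5 threshold are all made up for the example.

```r
library(data.table)

# Simulated output of a parsing step: tables with differing columns
parsed <- list(
  data.table(a = "x", b = 1),
  data.table(a = c("y", "z"), b = 2:3, c = c(NA, 5L)),
  data.table(a = "w", junk = NA_character_)
)

# Bind everything; columns missing from a table are filled with NA
bound <- rbindlist(parsed, fill = TRUE, idcol = "source")

# Share of non-missing values per column, computed after binding
coverage <- vapply(bound, function(x) mean(!is.na(x)), numeric(1))

# Keep columns that are at least half populated (threshold is an assumption)
min_coverage <- 0.5
kept <- bound[, names(coverage)[coverage >= min_coverage], with = FALSE]
print(kept)
```

The pruning step needs the fully bound table, which is why an unnest that tolerates a varying number of columns matters for this use case.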
Reprex and proposal below.
Created on 2020-04-05 by the reprex package (v0.3.0)