Open: statquant opened this issue 7 years ago
Also:
https://stackoverflow.com/questions/44336733/
https://stackoverflow.com/q/48831637/3576984
Suggestion: make such a function more flexible than unnest as well. Building on the example in the linked SO question, consider:
dt1 <- data.table(
  colA = c('A1','A2','A3'),
  colB = list('B1', c('B2a','B2b'), 'B3'),
  colC = list(c('C1a','C1b'), 'C2', 'C3'),
  colD = c('D1','D2','D3')
)
tidyr::unnest(dt1) is an error, but a "cross-join" at rows with mismatched lengths is probably appropriate.
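To make the suggested "cross-join" behaviour concrete, here is a minimal sketch of one way to get it with existing data.table functions. It is not a proposed implementation: `rid` is a hypothetical helper column introduced only so that each input row forms its own group, and the sketch assumes exactly two list columns named as in the example.

```r
library(data.table)

dt1 <- data.table(
  colA = c('A1', 'A2', 'A3'),
  colB = list('B1', c('B2a', 'B2b'), 'B3'),
  colC = list(c('C1a', 'C1b'), 'C2', 'C3'),
  colD = c('D1', 'D2', 'D3')
)

# rid is a hypothetical helper column: one group per input row
dt1[, rid := .I]

# within each row, cross-join the elements of the two list columns
res <- dt1[, CJ(colB = colB[[1L]], colC = colC[[1L]], sorted = FALSE),
           by = .(rid, colA, colD)]

# drop the helper and restore the original column order
res[, rid := NULL]
setcolorder(res, c("colA", "colB", "colC", "colD"))
res
```

Each input row contributes length(colB) * length(colC) output rows, so the example yields five rows in total.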
Yes, agreed, this would be useful. As I wrote under https://github.com/Rdatatable/data.table/issues/2159, I remember implementing unwrap() some time ago, and I prefer it to unnest, if that's all fine. Marking #2159 as a duplicate.
Some other possible extensions / convenience features from this SO q: https://stackoverflow.com/q/56981960
# example with a list of nested DFs
library(data.table)
DT <- data.table(
  colA = c('A1','A2','A3'),
  colB = list(data.frame(),
              data.frame(colsubB1=c('B2a','B2b'), colsubB2=c('B2c','B2d')),
              data.frame(colsubB1=c('A3a','A3b'), colsubB2=c('A3c','A3d'))),
  colC = c('C1','C2','C3'),
  colD = c('D1','D2','D3')
)
DT[, lens := sapply(colB, nrow)]
#    colA         colB colC colD lens
# 1:   A1 <data.frame>   C1   D1    0
# 2:   A2 <data.frame>   C2   D2    2
# 3:   A3 <data.frame>   C3   D3    2
# desired output...
# preserves length == 0 elements, filled with NA (instead of dropping them)
# puts the unnested columns in place of colB (instead of at the end)
#    colA colsubB1 colsubB2 colC colD lens
# 1:   A1     <NA>     <NA>   C1   D1    0
# 2:   A2      B2a      B2c   C2   D2    2
# 3:   A2      B2b      B2d   C2   D2    2
# 4:   A3      A3a      A3c   C3   D3    2
# 5:   A3      A3b      A3d   C3   D3    2
# versus tidyr::unnest
tidyr::unnest(DT)
#    colA colC colD lens colsubB1 colsubB2
# 1:   A2   C2   D2    2      B2a      B2c
# 2:   A2   C2   D2    2      B2b      B2d
# 3:   A3   C3   D3    2      A3a      A3c
# 4:   A3   C3   D3    2      A3b      A3d
As a modification of @franknarf1's code, the empty row can be preserved with keep_empty = TRUE:
tidyr::unnest(DT, colB, keep_empty = TRUE)
# # A tibble: 5 x 5
#   colA  colsubB1 colsubB2 colC  colD
#   <chr> <chr>    <chr>    <chr> <chr>
# 1 A1    <NA>     <NA>     C1    D1
# 2 A2    B2a      B2c      C2    D2
# 3 A2    B2b      B2d      C2    D2
# 4 A3    A3a      A3c      C3    D3
# 5 A3    A3b      A3d      C3    D3
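For comparison, the same shape (empty element kept as an NA row, unnested columns in place of colB) can be approximated with data.table alone. This is only a sketch under assumptions: `rid` is a hypothetical helper column, the column names are hard-coded to this example, and it relies on rbindlist() contributing no rows for the empty data.frame.

```r
library(data.table)

DT <- data.table(
  colA = c('A1', 'A2', 'A3'),
  colB = list(data.frame(),
              data.frame(colsubB1 = c('B2a', 'B2b'), colsubB2 = c('B2c', 'B2d')),
              data.frame(colsubB1 = c('A3a', 'A3b'), colsubB2 = c('A3c', 'A3d'))),
  colC = c('C1', 'C2', 'C3'),
  colD = c('D1', 'D2', 'D3')
)

# rid is a hypothetical helper column identifying each input row
DT[, rid := .I]

# stack the nested data.frames; idcol records which list element each row came from
sub <- rbindlist(DT$colB, idcol = "rid", fill = TRUE)

# join back on rid; rows of DT without a match (the empty element) get NA
res <- sub[DT[, .(rid, colA, colC, colD)], on = "rid"]

# drop the helper and move the unnested columns to where colB used to be
res[, rid := NULL]
setcolorder(res, c("colA", "colsubB1", "colsubB2", "colC", "colD"))
res
```

The join does the work of keep_empty = TRUE here: the default nomatch = NA keeps the row for A1 even though its nested data.frame has no rows.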
This seems to work quite efficiently (my test with 1 million rows of similar structure took about a minute):
dt <- data.table(A=c(1,2,3), B=list(c("A","B","C"),"D",c("E","F")))
dt.flat <- dt[, unlist(B), by=A]
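A variant of the same idea that avoids grouping is sketched below, in case it helps with timing comparisons. It repeats each value of A by the length of its list element instead of using by=. Whether it is actually faster on a given dataset is an assumption to be checked, not a measured result; the names `flat_by` and `flat_rep` are only illustrative.

```r
library(data.table)

dt <- data.table(A = c(1, 2, 3),
                 B = list(c("A", "B", "C"), "D", c("E", "F")))

# grouped version from the comment above; the unlisted column is auto-named V1
flat_by <- dt[, unlist(B), by = A]

# ungrouped variant: repeat each A by the length of its list element
flat_rep <- dt[, .(A = rep(A, lengths(B)), B = unlist(B))]
flat_rep
```

Note that the grouped version merges rows that share the same value of A into one group, while the ungrouped variant keeps rows in their original order regardless of duplicates in A.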
For reference, the unnest implementation in mlr3misc: https://mlr3misc.mlr-org.com/reference/unnest.html
Similarly to tidyr:::unnest, data.table would benefit from a fast unnest function. I found [...] Is there something canonical? If not, this is a FR!