Closed by epspi 4 years ago
In general, re-coding data is what our vtreat package is for ( https://github.com/WinVector/vtreat ). But to do this directly in rquery we can use methods from the rquery many columns vignette ( https://winvector.github.io/rquery/articles/rquery_many_columns.html ).
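library(wrapr)
library(rquery)
library(rqdatatable)
d <- data.frame(x = 0:7)
# one indicator per level 0..4, written as expression strings
codes <- paste0('x_eq_', 0:4) := paste0('as.numeric(x == ', 0:4, ')')
# one catch-all indicator for the right-censored tail (x >= 5)
codes <- c(codes, 'x_ge_5' := 'as.numeric(x >= 5)')
# add the indicator columns with extend_se
ops <- local_td(d) %.>%
  extend_se(., codes)
d %.>% ops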
#>    x x_eq_0 x_eq_1 x_eq_2 x_eq_3 x_eq_4 x_ge_5
#> 1: 0      1      0      0      0      0      0
#> 2: 1      0      1      0      0      0      0
#> 3: 2      0      0      1      0      0      0
#> 4: 3      0      0      0      1      0      0
#> 5: 4      0      0      0      0      1      0
#> 6: 5      0      0      0      0      0      1
#> 7: 6      0      0      0      0      0      1
#> 8: 7      0      0      0      0      0      1
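If the column name or the censoring point changes, the string-building step above can be wrapped in a small function. This is only a sketch: `censored_one_hot_codes` is a hypothetical helper name, not part of rquery or wrapr, and it assumes the levels are the non-negative integers 0, 1, ..., cutoff - 1 with cutoff >= 1.

```r
library(wrapr)
library(rquery)
library(rqdatatable)

# Hypothetical helper (not an rquery function): build the named vector of
# expression strings for levels 0..(cutoff - 1) plus one ">= cutoff" column.
# Assumes integer levels starting at 0 and cutoff >= 1.
censored_one_hot_codes <- function(col, cutoff) {
  lvls <- seq_len(cutoff) - 1L
  codes <- paste0(col, '_eq_', lvls) :=
    paste0('as.numeric(', col, ' == ', lvls, ')')
  c(codes,
    paste0(col, '_ge_', cutoff) :=
      paste0('as.numeric(', col, ' >= ', cutoff, ')'))
}

d <- data.frame(x = 0:7)
codes <- censored_one_hot_codes('x', 5)

# same pipeline shape as above, with the codes generated by the helper
ops <- local_td(d) %.>%
  extend_se(., codes)
res <- d %.>% ops
```

With `col = 'x'` and `cutoff = 5` this should generate the same `codes` vector as the hand-written version, so `res` is expected to match the table shown above.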
A pipeline needs a factor (actually an int) to be one-hot encoded with right censoring. E.g.

I see some possible manual avenues, but I am not sure if there is a direct way:
- extend with conditionals to manually specify new variables for the desired levels. Results in a wider table.
- complete_design?
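For checking results outside an rquery pipeline, the same right-censored encoding can be sketched in base R by capping the values first and then expanding the capped factor. This is an illustrative alternative, not an rquery method; the example data, the cutoff of 5, and the column names are assumptions chosen for illustration.

```r
d <- data.frame(x = 0:7)
cutoff <- 5L

# right-censor: every value >= cutoff collapses onto the cutoff level
x_cens <- pmin(d$x, cutoff)

# fix the level set explicitly so unseen levels still get a column
f <- factor(x_cens, levels = 0:cutoff)

# one indicator column per level, no intercept
onehot <- model.matrix(~ f - 1)
colnames(onehot) <- c(paste0('x_eq_', 0:(cutoff - 1)),
                      paste0('x_ge_', cutoff))

res <- cbind(d, onehot)
```

Capping before encoding keeps the number of new columns fixed at `cutoff + 1`, however large the raw values get.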