gdemin / maditr

Fast Data Aggregation, Modification, and Filtering

Could dt_select support selection by column names pattern? #9

Closed. hope-data-science closed this issue 4 years ago

hope-data-science commented 4 years ago

In dplyr, we have iris %>% select(starts_with("Se")). In maditr, the equivalent is iris %>% dt_select(startsWith(colnames(.),"Se")).

Is there a way to make this more convenient? For example: iris %>% dt_select("^Se"). When dt_select receives a character string, it would treat it as a regular expression.
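
To make the request concrete, here is a minimal base R sketch of the proposed behaviour. The helper name select_by_pattern is hypothetical and is not part of maditr; it only illustrates treating a character argument as a regular expression over column names.

```r
# Hypothetical helper (not part of maditr): keep the columns whose
# names match a regular expression.
select_by_pattern <- function(data, pattern) {
  keep <- grepl(pattern, colnames(data))  # one TRUE/FALSE per column
  data[, keep, drop = FALSE]              # drop = FALSE keeps a data.frame
}

# Columns of iris whose names start with "Se"
head(select_by_pattern(iris, "^Se"), 3)
```

With iris, the pattern "^Se" keeps Sepal.Length and Sepal.Width, the same columns that starts_with("Se") selects in dplyr.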

gdemin commented 4 years ago

Good idea, I will definitely implement this in the next release. If a character argument starts with ^ or ends with $, it will be expanded to the matching column names. Note, however, that parametric evaluation will not be available in this case:

# dplyr
my_vars = "Se"
iris %>% select(starts_with(my_vars)) # works

# maditr
my_vars = "^Se"
iris %>% dt_select(my_vars) # will not work
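
One plausible reason for this limitation, assumed here rather than taken from maditr's source, is that the selector inspects the unevaluated argument. The sketch below uses a hypothetical function, is_literal_pattern, to show that a literal string can be recognised this way while a variable holding the same string cannot.

```r
# Hypothetical illustration (not maditr's implementation): detect a
# pattern only when it is written literally in the call.
is_literal_pattern <- function(arg) {
  expr <- substitute(arg)  # the argument as written, not its value
  # A literal such as "^Se" is a character constant; a variable name is a symbol.
  is.character(expr) && grepl("^\\^|\\$$", expr)
}

my_vars <- "^Se"
literal_result  <- is_literal_pattern("^Se")    # argument is a string constant
variable_result <- is_literal_pattern(my_vars)  # argument is the symbol my_vars
print(c(literal = literal_result, variable = variable_result))
```

A selector written this way would need an explicit evaluation step before it could accept patterns stored in variables.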

hope-data-science commented 4 years ago

While ^ and $ are good, dplyr::contains might also be useful. Perhaps provide a new parameter such as "regex"?
Thanks.

gdemin commented 4 years ago

You can describe any name with "^" and "$", e.g. "^.*my_string" is equivalent to contains("my_string"). However, I will think about a general API for variable selectors.
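
As a rough illustration of these equivalences, the base R sketch below applies the three anchored patterns to the column names of iris. It uses only grepl and is not maditr code. One caveat: regular expressions are case-sensitive and treat characters such as "." as metacharacters, whereas dplyr's contains() matches literally and ignores case by default, so the equivalence holds only for simple strings.

```r
cols <- colnames(iris)

starts_se   <- cols[grepl("^Se", cols)]     # like starts_with("Se")
ends_width  <- cols[grepl("Width$", cols)]  # like ends_with("Width")
contains_al <- cols[grepl("^.*al", cols)]   # like contains("al")

print(starts_se)
print(ends_width)
print(contains_al)
```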

hope-data-science commented 4 years ago

I have tried this too, but so far I could not support Sepal.Length:Petal.Length for select(). See https://hope-data-science.github.io/tidydt/reference/select_dt.html.
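
For reference, a range such as Sepal.Length:Petal.Length can be resolved in base R by looking up the positions of the two endpoint names. The helper select_range below is hypothetical, takes the endpoints as quoted strings rather than unquoted names, and is not how tidydt or maditr implement selection.

```r
# Hypothetical helper: select all columns between two named endpoints.
select_range <- function(data, from, to) {
  idx <- match(c(from, to), colnames(data))  # positions of both endpoints
  if (anyNA(idx)) stop("unknown column name")
  data[, seq(idx[1], idx[2]), drop = FALSE]
}

range_cols <- colnames(select_range(iris, "Sepal.Length", "Petal.Length"))
print(range_cols)
```

Supporting the unquoted form would additionally require capturing the expression and mapping each name to its position before evaluating the colon operator.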