Status: Closed (hope-data-science closed 4 years ago)
By 'grouping', do you mean exactly sampling from each group?
I will add these functions in the next release.
In dplyr there is a group_by function, while data.table always combines grouping and summarising in one expression. dtplyr accomplishes this through lazy evaluation, translating everything to data.table code at the end. I wonder if there is a way to have a dt_group_by in maditr and let the user decide what to do next.
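To make the contrast concrete, here is a minimal sketch of the data.table style described above, where the grouping columns and the summary live in the same bracket call. It uses the built-in mtcars data; the object names dt and res are only illustrative.

```r
library(data.table)

dt <- as.data.table(mtcars)

# Grouping (by =) and summarising (j) are written in one expression;
# there is no separate "grouped" object in between.
res <- dt[, .(mpg = mean(mpg)), by = .(am, vs)]
print(res)
```

A standalone group_by step would need to remember the grouping columns somewhere until the next call, which is what the rest of this thread is about.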
I think I have nailed it down. If there are any mistakes, let me know. See https://hope-data-science.github.io/tidydt/reference/group_dt.html.
If you really want a dplyr-style 'group_by', I think it is better to use data.table's built-in key functionality:
data(mtcars)
library(data.table)
library(magrittr)
dt_mt = as.data.table(mtcars)

# "group" by storing the grouping columns as the data.table key
group_dt = function(data, ...){
  if(!is.data.table(data)) data = as.data.table(data)
  # setkey() sorts by reference and returns the data.table invisibly
  setkey(data, ..., verbose = FALSE)
}

# summarise by whatever key is currently set on the data
summarise_dt = function(data, ...){
  if(!is.data.table(data)) data = as.data.table(data)
  keys = key(data)
  args = substitute(list(...))
  res = data[, eval(args), by = keys]
  # in dplyr after summarizing we drop last grouping variable
  new_keys = keys[-length(keys)]
  if(length(new_keys)==0) new_keys = NULL
  setkeyv(res, cols = new_keys, verbose = FALSE)
  res
}

# first summary is by (am, vs); the second one is by am only
mtcars %>%
  group_dt(am, vs) %>%
  summarise_dt(mpg = mean(mpg)) %>%
  print() %>%
  summarise_dt(mpg = mean(mpg))
But, of course, it is up to you:)
I think you have handled this in the most correct way. All tidydt does is "translate", but your code is the most analogous to group_by. It should be implemented in maditr, but it may take more work to update every function to pick up the key.
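As a rough illustration of what "picking up the key" could mean for another verb, here is a hedged sketch of a counting helper that groups by the current key. The name count_by_key is hypothetical and is not part of maditr or tidydt; it only follows the same pattern as summarise_dt above.

```r
library(data.table)

# Hypothetical helper (not an existing maditr/tidydt function):
# counts rows per group, where the groups are the current key columns.
count_by_key <- function(data) {
  if (!is.data.table(data)) data <- as.data.table(data)
  keys <- key(data)              # NULL when no key has been set
  data[, .(N = .N), by = keys]   # by = NULL means "no grouping"
}

dt <- as.data.table(mtcars)
setkey(dt, am, vs)
counts <- count_by_key(dt)
print(counts)

# Without a key the helper falls back to a single overall count
total <- count_by_key(as.data.table(mtcars))
```

Every verb that should respect the grouping would need the same key lookup, which is the extra work mentioned above.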
Forgive me, my ability just does not allow me to write and maintain such advanced code. Please do add it to maditr if you have time, I would love it!
I am a loyal user of dplyr, but I turn to data.table when efficiency matters. maditr provides a very good approach for users like me, but grouping and nesting are very important methods too. Are there any plans to add these to maditr? Some references:
Grouping: https://cran.r-project.org/web/packages/rqdatatable/vignettes/GroupedSampling.html
Nesting: https://tysonbarrett.com/tidyfast/reference/dt_unnest.html
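For reference, both operations can already be written in plain data.table. The sketch below is not taken from the linked pages. It uses the built-in mtcars data, and the object names sampled and nested are only illustrative.

```r
library(data.table)

set.seed(1)
dt <- as.data.table(mtcars)

# Grouped sampling: draw up to 2 rows from each cyl group
sampled <- dt[, .SD[sample(.N, min(2L, .N))], by = cyl]

# Nesting: one row per cyl group, with the remaining columns
# stored as a data.table inside a list column
nested <- dt[, .(data = list(.SD)), by = cyl]

print(sampled)
print(nested)
```

Helper functions in a package would mostly wrap these idioms behind dplyr-like names.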