SebKrantz / collapse

Advanced and Fast Data Transformation in R
https://sebkrantz.github.io/collapse/

dplyr::slice_tail() grouped-collapse substitute #520

Closed Steviey closed 11 months ago

Steviey commented 11 months ago

For about two weeks I have been trying to reproduce dplyr's slice_tail() functionality in a grouped setup with collapse. I have read the documentation but found no hint about it. Is it possible with collapse?


library(dplyr)  # provides tibble(), as_tibble() and %>%

df <- tibble(
    col1 = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
    col2 = c("A", "B", "C", "D", "E", "A", "B", "C", "D", "E"),
    qRank = as.factor(c("Low", "Medium", "High", "Low", "Medium", "Low", "Medium", "High", "Low", "Medium"))
)

# Keep the last 3 rows of each qRank group
df <- df %>%
    dplyr::group_by(qRank) %>%
    dplyr::slice_tail(n = 3) %>%
    dplyr::ungroup()

cat('\n')
print(as_tibble(df), n = 100, max_extra_cols = 0, width = 110)
cat('\n')

output:

# A tibble: 8 × 3
   col1 col2  qRank 
  <dbl> <chr> <fct> 
1     3 C     High  
2     3 C     High  
3     4 D     Low   
4     1 A     Low   
5     4 D     Low   
6     5 E     Medium
7     2 B     Medium
8     5 E     Medium

SebKrantz commented 11 months ago

Currently not supported like this (and I don't think it ever will be). But you can do it like this:

options(fastverse.styling = FALSE)
library(fastverse)
#> -- Attaching packages --------------------------------------- fastverse 0.3.2 --
#> Warning: package 'data.table' was built under R version 4.3.1
#> v data.table 1.14.10     v kit        0.0.13 
#> v magrittr   2.0.3       v collapse   2.0.7

# Get first 2 rows of mtcars by vs and am
mtcars %>% fsubset(BY(seq_row(.), list(vs, am), extract, 1:2))
#>                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
#> Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
#> Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#> Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#> Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#> Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#> Fiat 128          32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1

# More efficient and does not require magrittr:
mtcars |> fsubset(vapply(gsplit(g = list(vs, am)), `[`, 1:2, 1:2))
#>                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
#> Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
#> Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#> Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#> Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#> Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#> Fiat 128          32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1

Created on 2024-01-01 with reprex v2.0.2
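
The two snippets above take the first rows of each group, while the original question asks for the last rows (`slice_tail()`). The following is a minimal sketch of one way to adapt the same index-splitting idea, not a method confirmed by the maintainer. It assumes the example data from the question; the names `df`, `idx`, `keep` and `res` are illustrative only. Because groups can have fewer than `n` rows (here `High` has only two), it uses `lapply()` with base `tail()` rather than `vapply()` with a fixed-length `[` index, which would yield `NA` for short groups.

```r
library(collapse)

# Example data from the question (illustrative names)
df <- data.frame(
  col1  = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
  col2  = c("A", "B", "C", "D", "E", "A", "B", "C", "D", "E"),
  qRank = factor(c("Low", "Medium", "High", "Low", "Medium",
                   "Low", "Medium", "High", "Low", "Medium"))
)

# Split the row indices by group; groups follow the factor level order
idx <- gsplit(seq_row(df), df$qRank)

# Keep the last (up to) 3 row indices of each group
keep <- unlist(lapply(idx, tail, 3), use.names = FALSE)

# Subset the rows; base df[keep, ] would work as well
res <- ss(df, keep)
print(res)
```

With this data, `keep` holds the rows 3, 8 (High), 4, 6, 9 (Low) and 5, 7, 10 (Medium), which is the same set and order of rows as the `slice_tail()` output in the question.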

Steviey commented 11 months ago

thx.

Steviey commented 11 months ago

Thank you for your advice, Mr. Krantz. Unfortunately, I can't reproduce the correct result (see the trials below). What am I doing wrong?

mode=5
if(mode==5){
    options(fastverse.styling = FALSE)
    library(fastverse)

    df1 <- tibble(
        col1 = c(1, 2, 3, 4, 5,1, 2, 3, 4, 5),
        col2 = c("A", "B", "C", "D", "E","A", "B", "C", "D", "E"),
        qRank = as.factor(c("Low", "Medium", "High", "Low", "Medium","Low", "Medium", "High", "Low", "Medium"))
    )

    df1 <-df1 %>% 
        dplyr::group_by(qRank) %>%
        dplyr::slice_tail(.,n=3) %>%
        dplyr::ungroup() 

    cat('\n')    
    print('Result 1:')
    cat('\n')
    print(as_tibble(df1),n=100,max_extra_cols=0,width=110)
    cat('\n') 

    ################################
    # Trial 1 to replicate result 1:
    df2 <-df1 %>% 
    {
        tryCatch(
            fsubset(df1,BY(seq_row(.), list(qRank), extract, 1:12))
            ,warning =function(w){print(w)}
            ,error   =function(e){print(e)}
        )
    }
    cat('\n')
    print('Result 2:')
    cat('\n')
    print(as_tibble(df2),n=100,max_extra_cols=0,width=110)
    cat('\n') 

    check1 <-collapse::all_obj_equal(df1,df2)
    ################################
    # Trial 2 to replicate result 1:
    df3 <-df1 %>% 
    {
        tryCatch(
            fsubset(df1,vapply(gsplit(g = list(qRank)), `[`, 1:3, 1:3))
            ,warning =function(w){print(w)}
            ,error   =function(e){print(e)}
        )
    }
    cat('\n')                
    print('Result 3:')
    cat('\n')
    print(as_tibble(df3),n=100,max_extra_cols=0,width=110)
    cat('\n') 

    check2 <-collapse::all_obj_equal(df1,df3)
    ##########################################
    ##########################################
    resTbl<-tibble(trial1=check1,trial2=check2)

    cat('\n')
    print('Trial Results:')
    cat('\n')
    print(as_tibble(resTbl),n=100,max_extra_cols=0,width=110)
    cat('\n') 

    stop()

}

SebKrantz commented 11 months ago

Sorry man, but I can't sort out your code all the time, and I can't have another non-issue from you every day. collapse is well documented and behaves in predictable ways. Issues are there to indicate problems with the software, not to teach people how to use it or how to rewrite their tidyverse code using it.

Steviey commented 11 months ago

Can I ask a question every 4 weeks?

SebKrantz commented 11 months ago

Please ask your questions on Stack Overflow: https://stackoverflow.com/. Issues are for definite problems with the software.