gdemin / expss

expss: Tables and Labels in R
https://cran.r-project.org/web/packages/expss/
84 stars 16 forks source link

Question about computing percentages with different bases #90

Closed robertogilsaura closed 2 years ago

robertogilsaura commented 3 years ago

Hello @gdemin, I hope you are well.

I have a question about how to get some information in tables. I want to get information for a sales funel. I have four variables, which are linked in the survey about the knowledge of a series of products. All variables are multi-response.

Q0 = Know the product Q1 = Was he ever interested in the product Q2 = Have you ever valued your purchase Q3 = You bought the product

My intention is to obtain the percentages of each product in three different tables, one case (1) normal cpct, other case (2) on the total sample and (3) other case with percentages over the base of product (back column in table). See manual compute in tables.xlsx. I'm not able in order to reproduce this compute directly. My problem is how to change the total for computing pct without operating in etable object.

library(expss)
library(openxlsx)
df <- read_spss('FUNEL.sav') 

tab1 <- df %>%
    tab_cells(mrset_f(Q0_)) %>% 
    tab_stat_cases(label='KNOW') %>% 
    tab_cells(mrset_f(Q1_)) %>% 
    tab_stat_cases(label='INTEREST') %>% 
    tab_last_hstack() %>% 
    tab_cells(mrset_f(Q2_)) %>% 
    tab_stat_cases(label='INTENTION') %>% 
    tab_last_hstack() %>% 
    tab_cells(mrset_f(Q3_)) %>% 
    tab_stat_cases(label='PURCHASE') %>% 
    tab_last_hstack() %>% 
    tab_pivot()

tab2 <- df %>%
    tab_cells(mrset_f(Q0_)) %>% 
    tab_stat_cases(label='KNOW') %>% 
    tab_cells(mrset_f(Q1_)) %>% 
    tab_stat_cases(label='INTEREST') %>% 
    tab_last_hstack() %>% 
    tab_cells(mrset_f(Q2_)) %>% 
    tab_stat_cases(label='INTENTION') %>% 
    tab_last_hstack() %>% 
    tab_cells(mrset_f(Q3_)) %>% 
    tab_stat_cases(label='PURCHASE') %>% 
    tab_last_hstack() %>% 
    tab_pivot()

tab3 <- df %>%
    tab_cells(mrset_f(Q0_)) %>% 
    tab_stat_cases(label='KNOW') %>% 
    tab_cells(mrset_f(Q1_)) %>% 
    tab_stat_cases(label='INTEREST') %>% 
    tab_last_hstack() %>% 
    tab_cells(mrset_f(Q2_)) %>% 
    tab_stat_cases(label='INTENTION') %>% 
    tab_last_hstack() %>% 
    tab_cells(mrset_f(Q3_)) %>% 
    tab_stat_cases(label='PURCHASE') %>% 
    tab_last_hstack() %>% 
    tab_pivot()

tabs <- list(tab1, tab2, tab3)

xl_write_file(tabs, 'tables.xlsx', sheetname = "Tables")

Thanks in advance ...

ts014.funel.zip

robertogilsaura commented 3 years ago

My compute, but I don't know if there is other more efficient way ...

tab1 <- df %>%
    tab_cells(na_if(mrset_f(Q0_)),gt(15)) %>% 
    tab_stat_cases(label='KNOW') %>% 
    tab_cells(mrset_f(Q1_)) %>% 
    tab_stat_cases(label='INTEREST') %>% 
    tab_last_hstack() %>% 
    tab_cells(mrset_f(Q2_)) %>% 
    tab_stat_cases(label='INTENTION') %>% 
    tab_last_hstack() %>% 
    tab_cells(mrset_f(Q3_)) %>% 
    tab_stat_cases(label='PURCHASE') %>% 
    tab_last_hstack() %>% 
    tab_pivot()

tab2 <- tab1

tab2$`#Total|KNOW` <- 100*(tab1$`#Total|KNOW`/nrow(df))
tab2$`#Total|INTEREST` <- 100*(tab1$`#Total|INTEREST`/nrow(df))
tab2$`#Total|INTENTION` <- 100*(tab1$`#Total|INTENTION`/nrow(df))
tab2$`#Total|PURCHASE` <- 100*(tab1$`#Total|PURCHASE`/nrow(df))
tab2

tab3 <- tab1
tab3$`#Total|INTEREST` <- 100*(tab1$`#Total|INTEREST`/tab1$`#Total|KNOW`)
tab3$`#Total|INTENTION` <- 100*(tab1$`#Total|INTENTION`/tab1$`#Total|INTEREST`)
tab3$`#Total|PURCHASE` <- 100*(tab1$`#Total|PURCHASE`/tab1$`#Total|INTENTION`)
tab3

tabs <- list(tab1, tab2, tab3)

tabs

Thanks in advance ...

gdemin commented 2 years ago

Hi, @robertogilsaura Sorry, but I don't know good solution for this task. I just can suggest to save several keystrokes:

tab2 <- tab1
tab2[,-1] <- 100 * tab2[,-1]/nrow(df)
tab3 <- tab1
tab3[, -(1:2)] <- 100*tab3[, -(1:2)]/tab3[, -c(1, ncol(tab3)] 
robertogilsaura commented 2 years ago

Hi @gdemin.

This is a very specific case, like others that may occur. It is simple to solve it with operations on etable. Your calculation is more appropriate than mine. Learning with each answer.

Thanks for your code. I close the issue.