gdemin / expss

expss: Tables and Labels in R
https://cran.r-project.org/web/packages/expss/

Mean of mentions in multiple variable #63

Closed: robertogilsaura closed this issue 4 years ago

robertogilsaura commented 4 years ago

Hi @gdemin. Would it be possible, in a future release, to have a tab_stat_mean_mset() function that calculates the mean number of mentions in a multiple-response variable (mrset or mdset)?

This figure is usually reported alongside the case counts for each category of the multiple-response set. Today it can be obtained by creating a variable that holds the number of responses in each set and calculating its mean, but that requires intermediate calculations.

Customers such as Kantar or Nielsen request it constantly. A simple example:

library(expss)

m1 <- c(1, 2, 1, 3, 2, 5, 1, 5, 4,NA)
m2 <- c(2,NA,NA, 1, 2, 2,NA, 3, 1,NA)
m3 <- c(9, 9,NA,NA,NA,NA,NA,NA, 9,NA)
# nm: number of mentions per respondent, counted by hand
nm <- c(3, 2, 1, 2, 2, 2, 1, 2, 3,NA)

data <- data.frame(m1,m2,m3,nm)

data %>% 
    tab_cols(total()) %>%
    tab_cells(mrset(m1,m2,m3)) %>% 
    tab_stat_cpct() %>% 
    tab_cells(nm) %>% 
    tab_stat_mean() %>% 
    tab_pivot()
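
For reference, the nm column above can also be derived from m1, m2 and m3 instead of being typed in by hand. The following is a minimal base R sketch, not an expss feature. The names answers, nm_derived and mean_mentions are illustrative, and it assumes that a respondent with no answers at all should be excluded from the mean, as the NA in nm does.

```r
# Same answers as in the example above
m1 <- c(1, 2, 1, 3, 2, 5, 1, 5, 4, NA)
m2 <- c(2, NA, NA, 1, 2, 2, NA, 3, 1, NA)
m3 <- c(9, 9, NA, NA, NA, NA, NA, NA, 9, NA)

# One row per respondent, one column per answer slot
answers <- cbind(m1, m2, m3)

# Number of non-missing answers per respondent
nm_derived <- rowSums(!is.na(answers))

# Assumption: respondents without any answer are set to NA,
# so they do not pull the mean down as zeros
nm_derived[nm_derived == 0] <- NA

# Mean number of mentions among respondents who answered
mean_mentions <- mean(nm_derived, na.rm = TRUE)
mean_mentions
```

Whether empty respondents count as zero or are dropped changes the result, so that choice should be made explicitly.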

Thanks in advance ...

robertogilsaura commented 4 years ago

The solution was already there!

library(expss)
v <- data.frame(
    a=c(1,1,1,1,2,2,1,1,1,2),
    v1=c(1,2,1,2,3,1,3,1,2,3),
    v2=c(1,2,NA,3,4,3,2,3,4,3),
    v3=c(1,NA,NA,NA,NA,1,3,4,1,2)
    )

# Number of mentions per respondent; with() makes the columns of v visible here
v$d = with(v, count_row_if(gt(0), mrset(v1,v2,v3)))

v %>% 
    tab_cols(a) %>% 
    tab_cells(mrset(..f(v))) %>% 
    tab_stat_cases() %>% 
    tab_cells(mentions1=count_row_if(gt(0), mrset(v1,v2,v3))) %>% 
    tab_stat_mean() %>% 
    tab_cells(mentions2=d) %>% 
    tab_stat_mean() %>% 
    tab_pivot()

Thanks.

gdemin commented 4 years ago

Hi @robertogilsaura. Perhaps it would be more convenient to turn your code into a custom function. Something like this:

library(expss)
v <- data.frame(
    a=c(1,1,1,1,2,2,1,1,1,2),
    v1=c(1,2,1,2,3,1,3,1,2,3),
    v2=c(1,2,NA,3,4,3,2,3,4,3),
    v3=c(1,NA,NA,NA,NA,1,3,4,1,2)
)

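# Mean number of mentions per respondent in a multiple-response set
mentions = function(mset){
    # count the values greater than zero in each row
    num_mentions = count_row_if(gt(0), mset)
    res = mean(num_mentions)
    # carry the variable label over as the name of the result, if there is one
    if(!is.null(var_lab(mset))) res = setNames(res, var_lab(mset))
    res
}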

tab_stat_mentions = . %>% tab_stat_fun_df(mentions, label = "Mentions")

v %>% 
    tab_cols(a) %>% 
    tab_cells(mrset_f(v)) %>% 
    tab_stat_cases() %>% 
    tab_stat_mentions() %>% 
    tab_pivot()

# |              |   a |     |
# |              |   1 |   2 |
# | ------------ | --- | --- |
# |            1 | 6.0 | 2.0 |
# |            2 | 5.0 | 1.0 |
# |            3 | 4.0 | 4.0 |
# |            4 | 2.0 | 1.0 |
# | #Total cases | 7.0 | 3.0 |
# |     Mentions | 2.4 | 2.7 |
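
As a sanity check, the Mentions row can be reproduced in base R without expss. This sketch assumes that count_row_if(gt(0), ...) simply counts the non-missing values greater than zero in each row. The names answers, num_mentions and mean_by_group are illustrative.

```r
v <- data.frame(
    a  = c(1, 1, 1, 1, 2, 2, 1, 1, 1, 2),
    v1 = c(1, 2, 1, 2, 3, 1, 3, 1, 2, 3),
    v2 = c(1, 2, NA, 3, 4, 3, 2, 3, 4, 3),
    v3 = c(1, NA, NA, NA, NA, 1, 3, 4, 1, 2)
)

# Columns that make up the multiple-response set
answers <- v[, c("v1", "v2", "v3")]

# Mentions per respondent: values greater than zero, ignoring NA
num_mentions <- rowSums(answers > 0, na.rm = TRUE)

# Mean number of mentions within each level of a
mean_by_group <- tapply(num_mentions, v$a, mean)
round(mean_by_group, 1)
# a = 1: 2.4, a = 2: 2.7
```

Here a respondent with no mentions would contribute a zero to the mean rather than being dropped.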

robertogilsaura commented 4 years ago

Many thanks @gdemin

My level as a programmer is very basic, but I really appreciate your advice and recommendations. I rarely think of writing my own functions, and I tend to shy away from them.

Using a function is clearly simpler and more efficient.

Thanks again.