gdemin / expss

expss: Tables and Labels in R
https://cran.r-project.org/web/packages/expss/

Column names as back up for variable labels in tables? #74

Open courtiol opened 4 years ago

courtiol commented 4 years ago

Dear @gdemin, I find {expss} a nice package for creating tables, even for those working with unlabelled variables or values! When a table covers multiple variables that have no variable labels, I was wondering whether falling back on column names could improve the display. Here is an example comparing the default behaviour with a small workaround that implements the idea:

library(expss)
#> 
#> Use 'expss_output_rnotebook()' to display tables inside R Notebooks.
#>  To return to the console output, use 'expss_output_default()'.
expss_output_rnotebook()
iris_small <- head(iris, n = 2)
calculate(iris_small, fre(list(Sepal.Length, Sepal.Width)))
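#> |        | Count | Valid percent | Percent | Responses, % | Cumulative responses, % |
#> | ------ | ----- | ------------- | ------- | ------------ | ----------------------- |
#> |    4.9 |     1 |            50 |      50 |           50 |                      50 |
#> |    5.1 |     1 |            50 |      50 |           50 |                     100 |
#> | #Total |     2 |           100 |     100 |          100 |                         |
#> |   <NA> |     0 |               |       0 |              |                         |
#> |      3 |     1 |            50 |      50 |           50 |                      50 |
#> |    3.5 |     1 |            50 |      50 |           50 |                     100 |
#> | #Total |     2 |           100 |     100 |          100 |                         |
#> |   <NA> |     0 |               |       0 |              |                         |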
for (i in seq_len(ncol(iris_small))) {
  attr(iris_small[[i]], "label") <- colnames(iris_small)[i]
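  # Prepend "labelled" instead of overwriting, so columns keep their original class (e.g. "factor")
  class(iris_small[[i]]) <- union("labelled", class(iris_small[[i]]))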
}

calculate(iris_small, fre(list(Sepal.Length, Sepal.Width)))
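#> |              |        | Count | Valid percent | Percent | Responses, % | Cumulative responses, % |
#> | ------------ | ------ | ----- | ------------- | ------- | ------------ | ----------------------- |
#> | Sepal.Length |    4.9 |     1 |            50 |      50 |           50 |                      50 |
#> |              |    5.1 |     1 |            50 |      50 |           50 |                     100 |
#> |              | #Total |     2 |           100 |     100 |          100 |                         |
#> |              |   <NA> |     0 |               |       0 |              |                         |
#> |  Sepal.Width |      3 |     1 |            50 |      50 |           50 |                      50 |
#> |              |    3.5 |     1 |            50 |      50 |           50 |                     100 |
#> |              | #Total |     2 |           100 |     100 |          100 |                         |
#> |              |   <NA> |     0 |               |       0 |              |                         |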

Created on 2020-11-17 by the reprex package (v0.3.0)

I think that doing this internally would spare users who do not otherwise work with labels the trouble of setting them. The documented way seems tedious for many columns, but perhaps I missed something:

library(expss)
expss_output_rnotebook()
iris_small <- head(iris, n = 2)

iris_small <- modify(iris_small, {var_lab(Sepal.Length) = "Sepal.Length"
                                  var_lab(Sepal.Width) = "Sepal.Width"})

calculate(iris_small, fre(list(Sepal.Length, Sepal.Width)))
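#> |              |        | Count | Valid percent | Percent | Responses, % | Cumulative responses, % |
#> | ------------ | ------ | ----- | ------------- | ------- | ------------ | ----------------------- |
#> | Sepal.Length |    4.9 |     1 |            50 |      50 |           50 |                      50 |
#> |              |    5.1 |     1 |            50 |      50 |           50 |                     100 |
#> |              | #Total |     2 |           100 |     100 |          100 |                         |
#> |              |   <NA> |     0 |               |       0 |              |                         |
#> |  Sepal.Width |      3 |     1 |            50 |      50 |           50 |                      50 |
#> |              |    3.5 |     1 |            50 |      50 |           50 |                     100 |
#> |              | #Total |     2 |           100 |     100 |          100 |                         |
#> |              |   <NA> |     0 |               |       0 |              |                         |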

Created on 2020-11-17 by the reprex package (v0.3.0)
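
The fallback idea can also be sketched in base R as a helper that labels only the columns that do not have a label yet, so existing labels are never overwritten. This is a rough sketch, not how expss does or would do it: the name label_from_names is hypothetical, and the "label" attribute and "labelled" class simply follow the loop-based workaround above.

```r
# Hypothetical helper (not part of expss): use each column name as the
# variable label, but only where no label has been set yet.
label_from_names <- function(df) {
  for (nm in names(df)) {
    if (is.null(attr(df[[nm]], "label", exact = TRUE))) {
      attr(df[[nm]], "label") <- nm
      # Prepend "labelled" while keeping any existing class (e.g. "factor")
      class(df[[nm]]) <- union("labelled", class(df[[nm]]))
    }
  }
  df
}

iris_small <- head(iris, n = 2)
attr(iris_small$Sepal.Width, "label") <- "Width of sepal"  # pre-existing label
iris_small <- label_from_names(iris_small)

attr(iris_small$Sepal.Length, "label")
#> [1] "Sepal.Length"
attr(iris_small$Sepal.Width, "label")
#> [1] "Width of sepal"
```

The result can then be passed to calculate() and fre() exactly as in the examples above.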

Thanks for considering this feature request.

Alex

gdemin commented 4 years ago

Dear @courtiol, thank you for your suggestion. I will definitely do something about this in the next release.

Unfortunately, we can't use list here, because a list loses the information about variable names. Recovering it would require complex parsing inside the function, which can lead to bugs. Most likely, fre will use data.frame names when there are no labels. Something like this:

calculate(iris_small, fre(data.frame(Sepal.Length, Sepal.Width)))
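
The difference between the two containers can be checked in base R alone. This is a minimal sketch in which with() stands in for calculate(), so that no package is needed:

```r
iris_small <- head(iris, n = 2)

# list() keeps only the values: unnamed arguments yield no names at all
as_list <- with(iris_small, list(Sepal.Length, Sepal.Width))
names(as_list)
#> NULL

# data.frame() deparses its unnamed arguments into column names
as_df <- with(iris_small, data.frame(Sepal.Length, Sepal.Width))
names(as_df)
#> [1] "Sepal.Length" "Sepal.Width"
```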

Currently fre treats a data.frame as a multiple-response variable, which was a bad design decision on my part and needs to be corrected.

As a temporary workaround, I can suggest the following general-purpose function:

library(expss)

names_as_labels = function(x){
    as.list(prepend_names(x))
}

iris_small <- head(iris, n = 2)
calculate(iris_small, fre(names_as_labels(data.frame(Sepal.Length, Sepal.Width))))

# |              |        | Count | Valid percent | Percent | Responses, % | Cumulative responses, % |
# | ------------ | ------ | ----- | ------------- | ------- | ------------ | ----------------------- |
# | Sepal.Length |    4.9 |     1 |            50 |      50 |           50 |                      50 |
# |              |    5.1 |     1 |            50 |      50 |           50 |                     100 |
# |              | #Total |     2 |           100 |     100 |          100 |                         |
# |              |   <NA> |     0 |               |       0 |              |                         |
# |  Sepal.Width |      3 |     1 |            50 |      50 |           50 |                      50 |
# |              |    3.5 |     1 |            50 |      50 |           50 |                     100 |
# |              | #Total |     2 |           100 |     100 |          100 |                         |
# |              |   <NA> |     0 |               |       0 |              |                         |