brad-cannell / codebookr

Create Codebooks From Data Frames
https://brad-cannell.github.io/codebookr/

Create a function to inject arbitrary summary tables into the codebook #53

Open · mbcann01 opened this issue 1 month ago

mbcann01 commented 1 month ago

Overview

On 2024-05-23, I received the following email:

I wonder if the following is possible. I want the summary statistics to leave out missing values, so that, for instance, the percentage summaries for categories do not include missing as a level. I also want to add my own number for the missing values. The reason is that I have a pre-post test in long format where, by design, a number of variables are missing in either the pre-test or the post-test, and this sometimes gives the impression of 50 percent missing.

My response was:

Thanks for using codebookr. I think I understand your use case. Unfortunately, it isn't currently possible to omit missing values from the summary stats table for each variable in a straightforward way. It was never my intention for the summary tables to be used as tables of results; they are simply meant to describe the data. In your case, the fact that 50% of the rows have a missing value isn't a problem (they are missing by design), but it's still a true reflection of the state of the data: 50% of the rows do have a missing value. Of course, you may want to ignore the missing values in your analysis, but I didn't approach the codebook as the place to present analysis results. I hope that makes sense. Having said that, I want to be helpful, and I can think of a few possible workarounds:

  1. You can fork the repository and edit the code on your end. I think you would want to modify R/cb_summary_stats_few_cats.R and R/cb_summary_stats_many_cats.R, adding some code to filter out missing values before calculating percentages.
  2. All of the tables are built with the flextable and officer packages. I think you can create your own summary tables and then "inject" them into the codebook before printing it to a Word document. I can't give you the exact code for this off the top of my head, but it should be theoretically possible. It may take some experimenting.
  3. Finally, you could manually edit the summary tables in Word, although that probably doesn't sound desirable, particularly if you have a large number of variables.

I'm sure that isn't the response you were hoping for, but I hope it's somewhat helpful.
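Workaround 1 amounts to dropping missing values before computing percentages, so the denominator is the number of observed values rather than the total number of rows. A minimal base R sketch of that idea follows; `summarise_cats_no_na()` is a hypothetical helper for illustration, not a function in codebookr, and the example data are made up to mimic the pre-post layout described in the email.

```r
# Hypothetical helper (not part of codebookr): category counts and percentages
# computed over observed values only, so NA is neither a level nor part of
# the denominator.
summarise_cats_no_na <- function(x) {
  observed <- x[!is.na(x)]     # keep only non-missing values
  counts   <- table(observed)  # frequency of each observed level
  data.frame(
    cat     = names(counts),
    n       = as.integer(counts),
    percent = round(100 * as.integer(counts) / length(observed), 1),
    stringsAsFactors = FALSE
  )
}

# Made-up long-format data: the item is only collected at the post-test,
# so every pre-test row is missing by design.
df <- data.frame(
  time = rep(c("pre", "post"), each = 4),
  item = c(NA, NA, NA, NA, "Yes", "No", "Yes", "Yes")
)

summarise_cats_no_na(df$item)
```

Here the item shows 25% "No" and 75% "Yes" instead of 50% missing. In a fork, similar filtering would go inside the summary functions named above before the percentages are calculated.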

For the purpose of this issue, I'm particularly interested in option 2 above. I wonder if it would be relatively easy to create a function that would allow users to inject arbitrary tables into the summary stats section of the codebook.

edambo commented 1 month ago

@mbcann01 I think I got this to work. I created a function, cb_custom_summary_stats_to_ft, in the file cb_custom_summary_stats_to_ft.R that applies the codebook's existing flextable formatting to custom tables, and I modified the codebook function so the user can inject a list of data frames containing arbitrary summary statistics for specified columns. Per Morri's suggestion, I also added a new argument, omit_na_columns, that prevents "Missing" from appearing as a category for the specified variables in the resulting codebook file.
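For the email's request to "add my own number for the missings", a user could compute a missing count restricted to the time point where a variable is collected by design and supply it as one of those data frames. The sketch below only builds a named list of data frames, one per column, in base R. The argument name and the exact list structure the modified codebook function expects aren't shown in this thread, so the layout here (list names matching column names, a two-column Statistic/Value data frame) is an assumption, and `custom_missing_summary()` is a hypothetical helper.

```r
# Hypothetical helper: missing-value summary for one column, counted only
# among rows at the time point where the variable is measured by design.
custom_missing_summary <- function(data, col, time_col, measured_at) {
  in_scope <- data[[time_col]] == measured_at  # rows where the item applies
  x <- data[[col]][in_scope]
  data.frame(
    Statistic = c("Rows in scope", "Missing (in scope)", "Percent missing"),
    Value     = c(length(x), sum(is.na(x)), round(100 * mean(is.na(x)), 1))
  )
}

# Made-up long-format data: one genuinely missing post-test answer.
df <- data.frame(
  time = rep(c("pre", "post"), each = 4),
  item = c(NA, NA, NA, NA, "Yes", "No", NA, "Yes")
)

# Assumed structure: a named list whose names are the columns being described.
custom_stats <- list(
  item = custom_missing_summary(df, "item", time_col = "time", measured_at = "post")
)
custom_stats$item
```

This reports 1 missing value out of 4 in-scope rows (25%) rather than 5 of 8, which is the distinction the original email was after.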

edambo commented 1 week ago

I've made the updates discussed on 6/14/2024 to address this issue: