FredHutch / VISCtemplates

Tools for writing reproducible reports at VISC

Improve docx table formatting, standardize and functionalize across pdf/docx #120

Open kelliemac opened 10 months ago

kelliemac commented 10 months ago

Standardize/functionalize table formatting with a function similar to visc_theme() which is used for figure formatting

kelliemac commented 5 months ago

The gt and flextable packages both seem to allow customizing table formatting in a way that applies to both PDF and Word documents. It's hard for me to tell which one would be better for VISCtemplates at this point. Does anyone have any thoughts? I've used flextable for Word tables and been pretty happy with it, but have so far only used kable and kableExtra for LaTeX/PDF, so I can't comment on how well flextable works for PDF outputs. It may also be useful to know that gt is developed by Posit.

kelliemac commented 3 months ago

I have done some experimenting with a function that generates tables using kable for PDF and flextable for docx, and have been happy with it! Here is the code I used for the G002 B-cell PT report, which we can build on:

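library(dplyr)      # provides %>%
library(knitr)      # kable()
library(kableExtra) # kable_styling(), collapse_rows()
library(flextable)  # flextable(), set_caption(), paginate(), merge_v()

# get_output_type() is assumed to come from VISCtemplates;
# "latex" selects the kable branch below, anything else the flextable branch
output_type <- get_output_type()

options(knitr.kable.NA = '') # NA's will be blank in tables

# Word tables take their font size and theme from these defaults
set_flextable_defaults(font.size = 8,
                       theme_fun = "theme_box")

# custom table function for consistent formatting: kable for PDF, flextable for Word
# (fontsize, digits, and latex_options only affect the PDF branch)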
make_visc_table <- function(df, 
                            caption, 
                            caption.short = NA, 
                            fontsize = 8, 
                            digits = 3, 
                            cols.to.collapse = NA, 
                            longtable = FALSE, 
                            latex_options = c("HOLD_position", "repeat_header"),
                            cwidth = 0.75) {

  if (output_type == 'latex') {

    tab <- df %>% 
      kable(format = output_type, 
            longtable = longtable, 
            booktabs = TRUE,
            linesep = "", 
            escape = FALSE,
            caption = caption,
            caption.short = ifelse(is.na(caption.short), caption, caption.short),
            digits = digits) %>%
      kable_styling(latex_options = latex_options,
                    font_size = fontsize)

    if (!any(is.na(cols.to.collapse))) {
      tab <- tab %>% 
        collapse_rows(columns = cols.to.collapse, valign = 'middle', latex_hline = "full")
    }

  } else {

    tab <- df %>% 
      flextable(cwidth = cwidth,
                cheight = 0.2) %>% 
      set_caption(caption = caption,
                  # extra space above table caption so that tables don't get too close
                  fp_p = officer::fp_par(padding.top=30, padding.bottom=3)) 

    if (!longtable) {
      # keep table on one page in word doc
      tab <- tab %>% paginate(init = TRUE, hdr_ftr = TRUE)
    }

    if (!any(is.na(cols.to.collapse))) {
      tab <- tab %>% merge_v(j = cols.to.collapse)
    }

  }

  tab

}

slager commented 1 month ago

@kelliemac I noticed that 4367cab in #183 didn't work on the statsrv runner and looked into it a bit.

Both gt and flextable require adding substantial new dependencies to VISCtemplates. These are the sizes of the dependency tree for each package:

> pak::pkg_deps('flextable') |> nrow()
[1] 59
> pak::pkg_deps('gt') |> nrow()
[1] 57

For the statsrv runner, installing the additional dependencies seems doable.

For actual statsrv, I was able to get gt to install, but not flextable because of a missing system dependency. I asked SCHARP TSS to see if they can add it.

kelliemac commented 1 month ago

Thanks for looking into that, @slager! Would we need to update the DESCRIPTION file or just the CI GitHub action? Dependencies still confuse me a bit.

So far I have been taking the approach of using flextable rather than gt, but if you think it will be better to switch to using gt, we could do that. I just haven't tested it out as much so it would take more time and thought.

slager commented 1 month ago

I haven't really used gt or flextable, but one attractive-looking thing about gt, if it works, is that I think you can build one gt object and then pass it at the end to gt's Word/PDF output functions. So if it has all the features we need, it might be simpler to implement and maintain than the mixed approach with kable/flextable. Of course, you've already implemented it with flextable, so there would be the additional cost of changing what you already have working, which might not be worth it.
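
To make the "one object, several outputs" idea concrete, here is a minimal sketch. The data frame, its column names, and the title are made up for illustration, and nothing here has been checked against the VISCtemplates templates; it only shows that a single gt object can be handed to gt's LaTeX and Word converters.

```r
library(gt)

# Hypothetical example data, not taken from any VISC report
df <- data.frame(
  group = c("A", "A", "B"),
  visit = c(1, 2, 1),
  value = c(0.1234, 0.4567, 0.7891)
)

# Build the table once (no pipes, so this also runs on older R versions)
tab <- gt(df)
tab <- tab_header(tab, title = "Example table")
tab <- fmt_number(tab, columns = "value", decimals = 3)

# Convert the same object for each output format
latex_code <- as_latex(tab) # LaTeX code for PDF output
word_xml <- as_word(tab)    # OOXML for Word output
```

Inside an R Markdown document, printing `tab` in a chunk should pick the converter for the active output format, so the explicit calls are mainly useful for inspecting what gt generates.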

One overall comment on the new table function is that it has a ton of arguments, which makes it flexible but will make it difficult to test. Maybe we don't really need to test every combination, or could start by testing it with just the defaults?

For dependencies, for most purposes we should only need to update the DESCRIPTION file. GitHub actions should then take care of installing packages for the standard runners. Some manual tweaks will be needed for installing packages on the statsrv runner, just because of some dependency issues with that old R version I had to work around that pak:: didn't seem to be able to handle.

kelliemac commented 1 month ago

Yeah. I think flextable should in theory be able to handle pdf as well, and it's not clear to me how good gt is at Word outputs - their documentation is pretty sparse when it comes to Word.

I think you're right that it would be better to use just one package rather than switching between kable and flextable/gt. Let me try using flextable for PDF and see how well it works. I think I tried this before and ran into some issue, but I can't remember what it was.
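
A minimal sketch of the single-package approach with flextable follows. The data frame and caption are hypothetical, and whether the PDF output works with the VISCtemplates LaTeX template is exactly what still needs testing; the sketch only shows the same flextable calls used in the Word branch above applied to one object.

```r
library(flextable)

# Hypothetical example data, not taken from any VISC report
df <- data.frame(
  group = c("A", "A", "B"),
  visit = c(1, 2, 1),
  value = c(0.123, 0.456, 0.789)
)

# Build one flextable object (no pipes, so this also runs on older R versions)
ft <- flextable(df)
ft <- merge_v(ft, j = "group")                  # collapse repeated group labels
ft <- theme_box(ft)                             # same theme as the defaults above
ft <- set_caption(ft, caption = "Example caption")
ft <- autofit(ft)

# In an R Markdown chunk, printing the object renders it
# for whichever output format is being knit
ft
```

If this holds up for PDF, the `output_type` check in `make_visc_table()` would no longer be needed.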

slager commented 1 month ago

With some pointers from SCHARP IT, I was able to install flextable on statsrv! More specifically, the problem was its tricky dependency textshaping, which required passing custom CPPFLAGS variables on a special admin server (which we already have access to) in order to install it. But it works! If we end up using it, we can add to the documentation how to install it on statsrv.

kelliemac commented 1 month ago

that's great!!!

kelliemac commented 1 month ago

Noting some issues I am currently seeing with gt:

For PDF documents:

For Word documents:

kelliemac commented 1 month ago

I think flextable is more promising, but I'm getting the following LaTeX error trying to knit to PDF:

! Undefined control sequence.
<argument> \Oldarrayrulewidth

kelliemac commented 2 weeks ago

Need to ensure flextable can do the following:

Also, add to a working example (possibly for inclusion in skeleton or vignette):

slager commented 2 weeks ago

I think the statsrv concern can be addressed by: