dcomtois / summarytools

R Package to Quickly and Neatly Summarize Data

How to get header elements into toc in rmarkdown document? #126

Closed ottadini closed 3 years ago

ottadini commented 3 years ago

I use lapply to run a dfSummary on many data frames. I have a simple function that reads in a table from file and runs print(dfSummary()), but I can't seem to get the right combination of options to get markdown produced for knitr to knit to html.

If I use results='asis' along with style='multiline' then I get closest, but the output of print is html, which knitr can't work with for creating a toc. I can't seem to get method='render' to work properly.

Also, style='grid' is very very slow, much slower than style='multiline'. Writing images to the temp dir takes ages.

Showing the basics below of my Rmd file.

---
output:
    rmarkdown::html_document:
        toc: true
---

```{r functions, echo=FALSE}
library(summarytools)
# options for summarytools
st_options(plain.ascii = FALSE, tmp.img.dir = "temp")

summarise_dfs <- function(fpath) {
    df <- read.table(fpath)

    print(
        dfSummary(df, valid.col = FALSE, varnumbers = FALSE, style = 'multiline'),
        method = 'render'
    )
}
```

# Summaries

```{r summaries, echo=FALSE, results='markdown'}
files_to_check <- list.files('.', pattern = "txt$", full.names = TRUE)
lapply(X = files_to_check, FUN = summarise_dfs)
```

ottadini commented 3 years ago

I have played with the options and have decided to go for a combination of print(dfSummary(df, style='multiline'), method='pander'), and results='asis'. The tables aren't styled at all and I lose the graphics, but I do get table of contents links.

However, in my table of contents I see a list of 20 links to "Data Frame Summary" because the summaries for each dfSummary are labelled with the header text "Data Frame Summary" in <h3> or ###. Can this also be an editable label?

dcomtois commented 3 years ago

Hi @ottadini,

For the "Data Frame Summary" title, you could change it using define_keywords(), but it would still be the same for every summary... What would you ideally like to show?

I wonder why the grid style is slow like that... Did you try installing the dev-current branch from github?

ottadini commented 3 years ago

Hi @dcomtois, thanks for the reply. I'd like the "Data Frame Summary" title to be assignable, like the optional Data.frame argument, so I can print the name of the data frame or text file that I have read in. Alternatively, the Data.frame or Data.frame.label argument could be promoted to a higher header level. Ideally I'd like to see the name of the data frame as the header instead of a generic label like "Data Frame Summary".

dcomtois commented 3 years ago

Hi @ottadini,

This makes sense. I'll think of something.

I'm still curious about the speed issue with style="grid". Could you tell me which version of summarytools you were working with when this happened (CRAN version, GitHub's master branch, or maybe the dev-current branch)?

ottadini commented 3 years ago

I've tried the newer github version now, but no change in the issues I'm facing.

With the grid style option in print(dfSummary()), in a results='asis' chunk, the layout is not correct. I have to use multiline to get correct layout.

Here's with multiline: image

And here's with grid: image

This should be a new issue shouldn't it?

dcomtois commented 3 years ago

Yes if you could open a new issue with a small reproducible example and the output using style='grid', that would be very helpful, thanks!

Did you notice any change in speed using the github version?

dcomtois commented 3 years ago

One thing I forgot to ask... Did you call st_css() in a chunk with option echo=FALSE, as described in the rmarkdown recommendations vignette?

ottadini commented 3 years ago

I did notice a bit of improvement yes! I should have timed it before I upgraded. I'm calling st_css() from a chunk that also has knitr::opts_chunk$set(echo = FALSE), which I think achieves the same thing?

Edit: I've changed the chunk options to echo=FALSE, include=TRUE and some other variants, and I can see how they affect the output. The style sheet is being incorporated properly, I think.

dcomtois commented 3 years ago

Ok, and did it fix the tables? One way to check if the css is there is opening the source of the html file. You should see a bunch of css classes starting with .st_

ottadini commented 3 years ago

Yes the tables are correct and I can see the st css in the source, though the tables are very very plain. I added some css to get banded rows and a couple of horizontal rules for the header row and bottom row.

dcomtois commented 3 years ago

Yes the tables being plain have to do with the theme... You could try st_css(bootstrap=TRUE)... but this will affect your whole page. To avoid that (if it's what you want), there is a way to create specific classes in a custom css file and use it with the print method as detailed in the docs.

dcomtois commented 3 years ago

Hi @ottadini,

I was thinking about the TOC issue. It didn't occur to me to mention it before, but there is already a way to work around this. It just involves one more step, plus using Data.frame = NULL to avoid redundant information:

```r
library(summarytools)
define_keywords(title.dfSummary = "Dataframe Summary for Iris")
print(dfSummary(iris), Data.frame = NULL, plain.ascii = FALSE)

##   ### Dataframe Summary for Iris  
##   **Dimensions:** 150 x 5  
##   **Duplicates:** 1  
##
```
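
Applied to the original lapply() use case, the same workaround might look like the sketch below. It is untested against any specific summarytools version and makes some assumptions: the helper name summarise_with_title and the "Summary of <file name>" title format are hypothetical, and the chunk is assumed to use results='asis' so that the "###" lines end up in the table of contents.

```r
library(summarytools)

# Hypothetical helper: print one dfSummary with a per-file heading.
summarise_with_title <- function(fpath) {
  df <- read.table(fpath)

  # Replace the generic "Data Frame Summary" title with the file name.
  define_keywords(title.dfSummary = paste("Summary of", basename(fpath)))

  print(
    dfSummary(df, valid.col = FALSE, varnumbers = FALSE, style = "multiline"),
    method = "pander",
    Data.frame = NULL,    # drop the now-redundant data frame name
    plain.ascii = FALSE   # emit markdown so "###" is rendered as a heading
  )
}

files_to_check <- list.files(".", pattern = "txt$", full.names = TRUE)

# invisible() keeps the list returned by lapply() out of the 'asis' output.
invisible(lapply(files_to_check, summarise_with_title))
```

Because define_keywords() changes a session-wide setting, the title stays at the last file's value until it is redefined.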

dcomtois commented 3 years ago

Closing for now, if this is still a problem just reply and I can reopen, Thx.