benmarwick / wordcountaddin

Word counts and readability statistics in R markdown documents

Exclude Figures/Tables inserted using LaTeX from word count #29

Closed. fschaffner closed this issue 5 years ago

fschaffner commented 5 years ago

I would like to propose excluding the LaTeX code used to insert figures and tables from the word count. This should be straightforward to implement. Here is an example using base R and stringr:

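# Backslashes are doubled inside R strings, so "\\begin" is the text \begin
figures <- "start \\begin{figure} latex code \\end{figure} middle \\begin{figure} latex code \\end{figure} finish"

# In the pattern, "\\\\" matches one literal backslash; ".*?" is lazy, so each
# environment is removed separately rather than everything between the first
# \begin and the last \end
gsub("\\\\begin\\{figure\\}(.*?)\\\\end\\{figure\\}", "", figures)
#> [1] "start  middle  finish"

stringr::str_remove_all(figures, "\\\\begin\\{figure\\}(.*?)\\\\end\\{figure\\}")
#> [1] "start  middle  finish"

tables <- "start \\begin{table} latex code \\end{table} middle \\begin{table} latex code \\end{table} finish"

gsub("\\\\begin\\{table\\}(.*?)\\\\end\\{table\\}", "", tables)
#> [1] "start  middle  finish"

stringr::str_remove_all(tables, "\\\\begin\\{table\\}(.*?)\\\\end\\{table\\}")
#> [1] "start  middle  finish"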
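
In a real document a figure or table environment usually spans several lines, so the same idea could be wrapped in a small helper. The following is only a sketch under stated assumptions: `strip_latex_envs` is a hypothetical name and is not part of wordcountaddin, the input is assumed to be a character vector of document lines, and environments of the same type are assumed not to be nested. It uses base R `gsub()` with `perl = TRUE` so that the `(?s)` flag can make `.` match newlines.

```r
# Hypothetical helper (not a wordcountaddin function): remove whole LaTeX
# environments such as figure and table from text before counting words.
strip_latex_envs <- function(text, envs = c("figure", "table")) {
  # Join the lines into one string so that an environment spanning several
  # lines can be matched as a single unit.
  text <- paste(text, collapse = "\n")
  for (env in envs) {
    # (?s) lets "." match newlines; "\\*?" also covers starred variants such
    # as figure*; ".*?" is lazy, so each environment is removed separately.
    pattern <- paste0(
      "(?s)\\\\begin\\{", env, "\\*?\\}.*?\\\\end\\{", env, "\\*?\\}"
    )
    text <- gsub(pattern, "", text, perl = TRUE)
  }
  text
}

# Made-up example document, one element per line
doc <- c(
  "Intro text.",
  "\\begin{figure}",
  "\\includegraphics{plot.png}",
  "\\caption{A plot}",
  "\\end{figure}",
  "Closing text."
)

strip_latex_envs(doc)
#> [1] "Intro text.\n\nClosing text."
```

Because the match is lazy and purely textual, a nested environment of the same name, or a `\begin{figure}` quoted inside a verbatim block, would not be handled correctly by this sketch.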

benmarwick commented 5 years ago

Thanks! Does the pkg not exclude those currently? Would you like to submit a pull request to the pkg to add this?

benmarwick commented 5 years ago

I'm curious to see a real-world document where LaTeX is mixed with R Markdown to insert figures. Can you link me to an example? I just want to see what else is going on in those kinds of docs that we might need to think about for word counting. Thanks!