davidgohel / rvg

https://davidgohel.github.io/rvg/

R markdown workflow for PPT #26

Open · noamross opened this issue 5 years ago

noamross commented 5 years ago

The development version of pandoc now allows passing raw OpenXML into PowerPoint output: https://github.com/jgm/pandoc/commit/5547cd6000a061a74aee22ec41b238b9a6c0f920

Using this, I've put together a quick proof of concept for creating editable PowerPoint slides through R Markdown:

````markdown
---
output:
  powerpoint_presentation:
    keep_md: true
    md_extensions: "+raw_attribute"
---

```{r setup, include=FALSE}
library(rvg)
library(knitr)
opts_chunk$set(echo = FALSE)
local({
  hook_plot = knit_hooks$get('plot')
  knit_hooks$set(plot = function(x, options) {
    x = paste(x, collapse = '.')
    # only intercept rvg (.dml) output; other plots go to the default hook
    if (!grepl('\\.dml', x)) return(hook_plot(x, options))
    xx <- xml2::read_xml(x)
    # read the content of the rvg image and write it out without <?xml ... ?>
    paste(c("```{=openxml}",
            strsplit(as.character(xx), "\n")[[1]][-1],
            "```"),
          collapse = '\n')
  })
})
```

## Slide 1

```{r, echo=FALSE, dev="dml_pptx", fig.ext="dml", dev.args=list(standalone=FALSE)}
plot(pressure)
```
````

I can probably put in a bit of work to make this a smooth workflow. There are several ways to approach it: a PR to **knitr** adding an option for an `rvg` device, a new rmarkdown format such as `rvg_powerpoint`, or an object type such as `rvg_plot` with `knit_print` methods that print raw `openxml` blocks. I figure the last is the most flexible and will require the least of the user. The actual rendering can be delayed until `knit_print` is called, so that the method can choose `dml_docx`/`dml_pptx` as appropriate and fall back to a different graphics device for older pandoc versions or other output formats. I think one could use **officer** to read the PPTX template to set the plot properties correctly.
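A minimal sketch of the `knit_print` route, assuming an S3 class called `rvg_plot` (the name floated above). `rvg_plot()`, `pick_device()` and `wrap_openxml()` are hypothetical helpers, not part of **rvg**. The plotting expression is captured unevaluated and only drawn inside `knit_print`, once `knitr::pandoc_to()` reports the target format. For `pptx` it calls `rvg::dml_pptx(standalone = FALSE)` as in the chunk above (check that your rvg version still exports it with these arguments); every other format falls back to a PNG. Word output and raster images inside the plot are not handled.

```r
# Hypothetical constructor: capture the plotting code without running it
rvg_plot <- function(expr) {
  structure(list(expr = substitute(expr), env = parent.frame()),
            class = "rvg_plot")
}

# Hypothetical helper: editable DrawingML for PowerPoint, PNG for everything else
pick_device <- function(fmt) {
  if (identical(fmt, "pptx")) "dml_pptx" else "png"
}

# Hypothetical helper: drop any <?xml ... ?> declaration and wrap the
# DrawingML in a pandoc raw block (needs the +raw_attribute extension)
wrap_openxml <- function(xml_lines) {
  body <- xml_lines[!grepl("^\\s*<\\?xml", xml_lines)]
  paste(c("```{=openxml}", body, "```"), collapse = "\n")
}

# S3 method knitr dispatches to when an rvg_plot is the visible chunk value
knit_print.rvg_plot <- function(x, options = list(), ...) {
  w <- if (is.null(options$fig.width)) 6 else options$fig.width
  h <- if (is.null(options$fig.height)) 6 else options$fig.height
  if (pick_device(knitr::pandoc_to()) == "dml_pptx") {
    file <- tempfile(fileext = ".dml")
    rvg::dml_pptx(file = file, width = w, height = h, standalone = FALSE)
    eval(x$expr, envir = x$env)
    grDevices::dev.off()
    # surrounding blank lines so pandoc sees a standalone fenced block
    return(knitr::asis_output(
      paste0("\n\n", wrap_openxml(readLines(file, warn = FALSE)), "\n\n")))
  }
  # fallback device for non-PowerPoint formats
  file <- tempfile(fileext = ".png")
  grDevices::png(file, width = w, height = h, units = "in", res = 96)
  eval(x$expr, envir = x$env)
  grDevices::dev.off()
  knitr::asis_output(sprintf("\n\n![](%s)\n\n", file))
}
```

In a chunk, `rvg_plot(plot(pressure))` would then come out as an editable chart in PowerPoint output and as an ordinary image elsewhere, with no chunk options needed from the user.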

Any thoughts on the best approach? Would you take a PR to **rvg** for `knit_print()` methods?

davidgohel commented 5 years ago

Hi @noamross

Sure, I would be happy to! I don't have much time available for that, but I will do my best to help you if necessary.

You may find this implementation helpful: https://github.com/davidgohel/flextable/blob/d1bbbf6729100d06e0b35574911372391148ab99/R/printers.R#L94

I agree with you. A `knit_print` method in rvg is a good option.

There may be a difficulty when the plot contains raster images, as the device generates PNG files that have to be registered in the ppt/media directory. I have made **worded**, which deals with that, but it forces the user to use a new R Markdown template. I used your work to do that ;). The relevant bit of code is there.

I'd like to avoid Word output. The Word device can never really be correct, as WordprocessingML does not allow real text boxes and the position is never quite what it should be: https://github.com/davidgohel/rvg/blob/master/R/dml_docx.R#L30 (I have not removed it from the package, though.)

noamross commented 5 years ago

Great. It'll take a couple of weeks, but I'll get to this. Avoiding Word output makes sense. I could see a system with a series of fallback devices, but for now I'll keep this simple.

I could actually see a system like the one you have in **worded**, along the lines of how htmlwidgets register JavaScript dependencies: Office-specific objects could register the post-processing functions they require after conversion. This would require some overarching Rmd output formats that could handle these extensions (maybe `officer_docx` and `officer_pptx`). But I'm just speculating now; I need to get a simple version of this done for an upcoming project!
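One way the registration idea could look, loosely modelled on how htmlwidgets dependencies are deduplicated by name: objects register a named post-processing function while the document is knitted, and the output format runs each one once on the converted file. Everything here (`register_postprocessor()`, `run_postprocessors()`, the `.office_postprocessors` store) is hypothetical. In a real format, `run_postprocessors()` would be called from the `post_processor` hook of `rmarkdown::output_format()`, for example in the `officer_pptx` format speculated about above.

```r
# Hypothetical store for post-processing functions, keyed by id
.office_postprocessors <- new.env(parent = emptyenv())
.office_postprocessors$funs <- list()

# Hypothetical: register a function that takes and returns the output file path.
# Re-registering an existing id replaces that entry, so it still runs only once.
register_postprocessor <- function(id, fun) {
  .office_postprocessors$funs[[id]] <- fun
  invisible(id)
}

# Hypothetical: run every registered function, in registration order,
# on the converted file and return the (possibly new) file path
run_postprocessors <- function(output_file) {
  for (fun in .office_postprocessors$funs) {
    output_file <- fun(output_file)
  }
  output_file
}
```

Keying by id is what would let several plots that all need the same fix-up (say, registering PNGs under `ppt/media`, as discussed above for rasters) request it without it being applied twice.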