davidgohel / officer

:cop: officer: office documents from R
https://ardata-fr.github.io/officeverse/
Other
615 stars 107 forks source link

master_name appear as NA in read_pptx, resulting in 'attempt to apply non-function' error on ph_with #488

Open csbu00 opened 1 year ago

csbu00 commented 1 year ago

I created a simple slide template with a master template and several layouts, along with their placeholder (attached). Oddly, when I try to read_pptx, it seems to not be picking up the master template's name (it's reading as a null value)? This seems to result in error when using ph_with - I'm guessing it's because it couldn't process a null value. Have I perhaps done something wrong, either in the template creation or code, or is this a bug? Thank you!

Slide Template_R.pptx

library('tidyverse')
library('officer')

presentation <- officer::read_pptx('Slide Template_R.pptx')
template_object_map <- presentation %>% officer::layout_properties() %>% select(c('master_name','name', 'type', 'id', 'ph_label')) 

my_pres <- presentation %>% 
            remove_slide(index = 1) %>% 
            add_slide(master=NA, layout='Style_leftright_a') %>%
            ph_with(value= 'Corridors', location = ph_location_label(ph_label = "Big Title")) 
davidgohel commented 1 year ago

thanks, I can reproduce

From presentation |> layout_summary(), I can see your master is named 'Custom Design', you can use the following code

library('tidyverse')
library('officer')

presentation <- officer::read_pptx('~/Downloads/Slide.Template_R (1).pptx')
template_object_map <- presentation %>% officer::layout_properties() %>% select(c('master_name','name', 'type', 'id', 'ph_label'))

my_pres <- presentation %>%
  remove_slide(index = 1) %>%
  add_slide(master='Custom Design', layout='Style_leftright_a') %>%
  ph_with(value= 'Corridors', location = ph_location_label(ph_label = "Big Title"))

print(my_pres, target = "Custom Design.pptx")

There is an issue with template_object_map <- presentation %>% officer::layout_properties() that should not gives you NA. Thanks for reporting

csbu00 commented 1 year ago

That worked! Thank you so much!

markheckmann commented 2 months ago

Analysis

The reason is that the master slide is empty. It has no objects (shapes, pics, images etc. on it). (NB: Also first the slide layout has not objects on it.) Now, when loading the PPTX via read_pptx(), the master_xfrm data frame in the following code has zero rows.

obj$slideLayouts <- dir_layout$new( package_dir,
    master_metadata = obj$masterLayouts$get_metadata(),
    master_xfrm = obj$masterLayouts$xfrm() )

The reason is, that the slide_master's xfrm() method searches all shapes, pics etc. nodes in the master's shape trees.

xfrm = function(){
  nodeset <- xml_find_all( self$get(), as_xpath_content_sel("p:cSld/p:spTree/") )
  read_xfrm(nodeset, self$file_name(), self$name())
}

However, as there are no objects in the master slide, nodeset is an empty set. read_xfrm() will then return a dataframe with zero rows as the length(nodeset) < 1) condition is met. So, while the master name "Custom Design" is passed correctly via the name arg to read_xfrm(), it is lost for further processing.

xfrmize() later uses the master_xfrm data frame with zero rows. As this is the only source used for the master layout's name, it gets lost.

xfrmize <- function(slide_xfrm, master_xfrm) {
  x <- as.data.frame(slide_xfrm)

  master_ref <- unique(data.frame(
    file = master_xfrm$file,
    master_name = master_xfrm$name,
    stringsAsFactors = FALSE
  ))
  master_xfrm <- fortify_master_xfrm(master_xfrm)

Suggestion

I tried catching the edge case by adding the following to read_xfrm().

read_xfrm <- function(nodeset, file, name){
  if( length(nodeset) < 1 && !( is.na(name) || is.null(name)) ) {
    return(data.frame(stringsAsFactors = FALSE,
                      type = NA_character_,
                      id = NA_character_,
                      ph_label = NA_character_,
                      ph = NA_character_,
                      file = basename(file),
                      offx = NA_integer_,
                      offy = NA_integer_,
                      cx = NA_integer_,
                      cy = NA_integer_,
                      rotation = NA_integer_,
                      name = name,
                      fld_id = NA_character_,
                      fld_type = NA_character_
    ))
  }

However, I do not really oversee the consequences of this and one test also fails.

I think having zero objects on a master is quite a rare edge case. The simplest would be simply to warn the user in this case.

@davidgohel What do you think?