Add utility functions for converting nested `xml_document` objects to lists/data frames

I started looking at this as an example: https://rud.is/rpubs/xml2power/

Here is the code I started working on based on that blog post:

xtrct_text <- function(doc, target) {
  xml2::xml_find_all(doc, target) |>
    xml2::xml_text() |>
    trimws()
}

xtrct_attr <- function(doc, target) {
  xml2::xml_find_all(doc, target) |>
    xml2::xml_attrs()
}

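xtrct_df <- function(doc, top, type = c("attr", "text")) {
  # Fail early on an unsupported `type` instead of silently returning NULL
  type <- match.arg(type)

  # Unique names of the direct children of the first `top` element;
  # each query below already collects every child with that name
  child_names <- xml2::xml_find_first(doc, sprintf(".//%s", top)) |>
    xml2::xml_children() |>
    xml2::xml_name() |>
    unique()

  child_names |>
    purrr::map(
      function(x) {
        target <- sprintf(".//%s/%s", top, x)
        content <- switch(type,
          attr = as.list(xtrct_attr(doc, target)),
          text = xtrct_text(doc, target)
        )

        content
        # rlang::set_names(
        #   list(content),
        #   tolower(x)
        # )
      }
    ) # |>
  # purrr::flatten_df() #|>
  # readr::type_convert()
}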

I need to review the IDML specs to figure out when information is stored in attributes or tag names and when it is stored in text. I also need to find out what flags I can use to determine how deeply nested the XML structure is for a given node.
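
Until I have gone through the specs, one way to explore this empirically is to summarise each child of a node: how many attributes it has, whether it contains any text, and how many levels of elements sit below it. This is only a rough sketch. `node_depth()` and `describe_children()` are hypothetical helper names, and the sample XML is made up for illustration rather than taken from an IDML file.

```r
library(xml2)

# Hypothetical helper: number of element levels below a node (0 for a leaf)
node_depth <- function(node) {
  kids <- xml2::xml_children(node)
  if (length(kids) == 0) {
    return(0L)
  }
  1L + max(vapply(kids, node_depth, integer(1)))
}

# Hypothetical helper: one row per direct child of `node`
describe_children <- function(node) {
  kids <- xml2::xml_children(node)
  data.frame(
    name = xml2::xml_name(kids),
    n_attrs = lengths(xml2::xml_attrs(kids)),
    # xml_text() includes the text of descendants, so this flags
    # "has text somewhere below", not "has text of its own"
    has_text = nzchar(trimws(xml2::xml_text(kids))),
    depth = vapply(kids, node_depth, integer(1)),
    stringsAsFactors = FALSE
  )
}

# Made-up document: <a> carries attributes only, <b> wraps nested text
doc <- xml2::read_xml(
  '<root id="r"><a x="1" y="2"/><b><c><d>text</d></c></b></root>'
)

info <- describe_children(doc)
info
# <a>: 2 attributes, no text, depth 0
# <b>: 0 attributes, has text, depth 2
```

A summary like this could help decide, per node, whether to use the `attr` or `text` branch of `xtrct_df()` and whether a node needs to be handled recursively.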

elipousson / idmlr
