JanMarvin / openxlsx2

openxlsx2 - read, write and modify xlsx files
https://janmarvin.github.io/openxlsx2/
Other
112 stars 11 forks source link

Is it possible to read and change existing form controls (e.g., checkboxes)? #1077

Closed eipi10 closed 1 month ago

eipi10 commented 1 month ago

I'm writing an R package to automate the process of filling in an Excel template workbook for the Common Data Set, which is a standardized set of data that most U.S. colleges and universities provide each year (download link for Excel template workbook).

Some of the requested information involves checkboxes or dropdowns. I'd like to be able to determine the locations of the checkboxes and then check them or leave them unchecked, as appropriate. Likewise, I'd like to identify the locations of the dropdowns and select the correct value in each one.

I see that openxlsx2 has the wb_add_form_control() function to create checkboxes and dropdowns. But I can't see a way to identify or change existing form controls. If this functionality already exists, how can I access it? If this functionality doesn't exist, please consider this a feature request.

To make this concrete, here's an example using the Excel workbook linked above. Let's say I want to fill in the checkboxes in worksheet "CDS-F."

wb = wb_load("CDS_UNL2_2023_2024.xlsx")
wb = wb_set_active_sheet(wb, "CDS-F")

If I look at the file, I can see that there's a checkbox in cell A27:

Screenshot 2024-07-11 at 9 27 49 AM

Just as a test, I added another checkbox on top of this one:

wb = wb %>% wb_add_form_control()
wb$add_form_control(sheet="CDS-F", dims="A27", type="Checkbox", 
                    text="Campus Ministries", checked=TRUE)
wb_save(wb, "CDS-update.xlsx") 

Which gives the following:

Screenshot 2024-07-11 at 9 28 17 AM

Although it's not optimal, I thought maybe I could at least erase the existing checkbox and add a new one, but wb = wb_clean_sheet(wb, dims="A27") didn't remove the existing checkbox and I didn't see another way of doing that. In any case, ideally, I'd be able to programmatically identify the locations of checkboxes along with their explanatory text and then add a check mark to them as appropriate. Likewise, I'd like to be able to identify dropdowns and select the appropriate values.

JanMarvin commented 1 month ago

Hi @eipi10 , right now this is not possible. It should be possible to select vml objects from R, but the effects might be different from what you see in spreadsheet software. E.g. we do not evaluate data, hence setting a drop down list to a specific value might only cause confusion for the spreadsheet file. But I will look into this. Shipping examples is also a big help, because I have something to toy around with.

eipi10 commented 1 month ago

Thanks. Let me know if I can help in any way. I don't really know much about the structure of Excel files, but I'm happy to test out options.

JanMarvin commented 1 month ago

You should see a couple of vml objects once you import the file. wb$vml, similar to how we create them here:

https://github.com/JanMarvin/openxlsx2/blob/a3c2f1dd1fb2467b2f7fb7bdc4475ceb9b192f59/R/class-workbook.R#L6185-L6306

JanMarvin commented 1 month ago

It should be possible to scrape the vml code for checked and unchecked boxes. But I’m not sure how this will work out. I’m currently traveling, will look into it next week

JanMarvin commented 1 month ago

@eipi10 find below a function that reads the values, shows the cells and checks the values.

The issue is a bit non trivial, because form control objects are basically overlays on the spreadsheet. In your case, the checkbox itself must not be modified, instead a value on the sheet "multi select" must be set. In the code below I read the text from the checkboxes, their link address, and their position on the sheet (this works, though I'm a little surprised why ...).

Can you give this a try if it fulfills your needs?

library(openxlsx2)

fl <- "https://commondataset.org/wp-content/uploads/2023/11/CDS_UNL2_2023_2024.xlsx"

wb <- wb_load(fl)

sheet_name <- "CDS-F"

wb$set_active_sheet(sheet_name)
wb$set_selected(sheet_name)

sheet <- wb$validate_sheet(sheet_name)
vmls  <- wb$worksheets[[sheet]]$relships$vmlDrawing

xml_nodes <- xml_node(wb$vml[[vmls]], "xml", "v:shape")

get_infos <- function(x) {

  z <- data.frame(
    txt    = vector("character", length(x)),
    # anchor = vector("character", length(x)),
    link   = vector("character", length(x)),
    x_loc  = vector("integer", length(x)),
    y_loc  = vector("integer", length(x)),
    dims   = vector("character", length(x))
  )

  for (i in seq_along(x)) {
    txt <- xml_node(x[i], "v:shape", "v:textbox")
    # text
    txt <- xml_value(txt, "v:textbox", "div", "font")
    z$txt[i] <- gsub("   ", " ", txt)

    loc <- xml_value(x[i], "v:shape", "x:ClientData", "x:Anchor")
    # z$anchor[i] <- loc

    link <- xml_value(x[i], "v:shape", "x:ClientData", "x:FmlaLink")
    z$link[i] <- link

    loc <- sapply(strsplit(loc, ","), "[")
    loc <- as.integer(loc)

    # right and bottom positions from anchor
    z$x_loc[i] <- loc[[5]] + 1L
    z$y_loc[i] <- loc[[7]]

    z$dims[i]  <- wb_dims(cols = z$x_loc[i], rows = z$y_loc[i])
  }

  z
}

infos <- xml_nodes %>% get_infos()
head(infos)
#>                                  txt                link x_loc y_loc dims
#> 1                  Campus Ministries 'multi select'!$X$2     1    27  A27
#> 2                      Choral groups 'multi select'!$X$3     1    29  A29
#> 3                       Concert band 'multi select'!$X$4     1    31  A31
#> 4                              Dance 'multi select'!$X$5     1    33  A33
#> 5                      Drama/theater 'multi select'!$X$6     1    35  A35
#> 6 International Student Organization 'multi select'!$X$7     1    37  A37

# example node
drama <- which(infos$txt == "Drama/theater")
xml_nodes[[drama]] %>% as_xml()
#> <v:shape id="_x0000_s14341" type="#_x0000_t201" style="position:absolute;  margin-left:86pt;margin-top:700pt;width:120pt;height:30pt;z-index:5;  mso-wrap-style:tight" filled="f" fillcolor="white [65]" stroked="f" strokecolor="black [64]" o:insetmode="auto">
#>  <v:path shadowok="t" strokeok="t" fillok="t" />
#>  <o:lock v:ext="edit" rotation="t" />
#>  <v:textbox o:singleclick="f">
#>   <div style="text-align:left">
#>    <font face="Calibri" size="240" color="#000000">Drama/theater</font>
#>   </div>
#>  </v:textbox>
#>  <x:ClientData ObjectType="Checkbox">
#>   <x:SizeWithCells />
#>   <x:Anchor>    0, 86, 33, 9, 0, 206, 35, 7</x:Anchor>
#>   <x:Locked>False</x:Locked>
#>   <x:AutoFill>False</x:AutoFill>
#>   <x:AutoLine>False</x:AutoLine>
#>   <x:TextVAlign>Center</x:TextVAlign>
#>   <x:FmlaLink>'multi select'!$X$6</x:FmlaLink>
#>   <x:NoThreeD />
#>  </x:ClientData>
#> </v:shape>

# begin little help from chatgpt
split_reference <- function(reference) {
  # Extract the sheet name
  sheet_name <- sub("^(.*)!.*$", "\\1", reference)
  sheet_name <- gsub("^'|'$", "", sheet_name)  # Remove surrounding single quotes if they exist

  # Extract the cell reference
  cell_ref <- sub("^.*!(.*)$", "\\1", reference)
  cell_ref <- gsub("\\$", "", cell_ref)  # Remove dollar signs

  return(list(sheet_name = sheet_name, cell_ref = cell_ref))
}
# end little help from chatgpt

to_modify <- infos[drama, "link"] %>% split_reference()

# set multiselect link to TRUE
wb$add_data(
  sheet = to_modify$sheet_name,
  dims = to_modify$cell_ref,
  x = TRUE,
  col_names = FALSE
)

if (interactive()) wb$open()
JanMarvin commented 1 month ago

Similar for drop downs in this file:

sheet_name <- "CDS-A"
sheet <- wb$validate_sheet(sheet_name)
drop_down_x14 <- wb$worksheets[[sheet]]$extLst %>% xml_node("ext", "x14:dataValidations", "x14:dataValidation")

# drop_down_x14 %>% as_xml()

get_drop_downs <- function(x) {

  z <- data.frame(
    refs  = vector("character", length(x)),
    link  = vector("character", length(x))
  )

  for (i in seq_along(x)) {
    z$refs[i] <- xml_value(x[i], "x14:dataValidation", "xm:sqref")
    z$link[i] <- xml_value(x[i], "x14:dataValidation", "x14:formula1", "xm:f")
  }

  z
}

drops <- drop_down_x14 %>% get_drop_downs()

to_modify <- drops[drops$refs == "D20", "link"] %>% split_reference()

wb_to_df(wb, sheet = to_modify$sheet_name, dims = to_modify$cell_ref, col_names = FALSE)
#>     A
#> 1 Yes
#> 2  No

if (interactive()) wb$open()
eipi10 commented 1 month ago

Wow, thanks! I will try these and get back to you.

eipi10 commented 1 month ago

I tested the checkbox code and it works well! As a first try at automation, I used your infos data frame to create a separate Excel worksheet listing all of the checkboxes in the CDS-F worksheet (checkbox-choices.xlsx). Then I added an "x" next to the ones I want "checked" in the CDS template workbook. Finally, I looped over these values to modify the CDS template workbook.

library(tidyverse)
library(openxlsx2)

# Run Jan Marvin's setup code
wb <- wb_load(fl)
sheet_name <- "CDS-F"

wb$set_active_sheet(sheet_name)
wb$set_selected(sheet_name)

sheet <- wb$validate_sheet(sheet_name)

vmls  <- wb$worksheets[[sheet]]$relships$vmlDrawing

xml_nodes <- xml_node(wb$vml[[vmls]], "xml", "v:shape")

infos <- xml_nodes %>% get_infos()

# eipi10's test code

# # Setup: Create a separate Excel file where we fill in checkbox choices
# # No need to run this. Instead, see attached file checkbox-choices.xlsx
# wb.choices = wb_workbook()
# wb.choices = wb_add_worksheet(wb.choices, "CDS-F")
# infos.to.write = infos %>% select(txt) %>% mutate(`Add 'x' if applicable`="")
# wb.choices = wb_add_data(wb=wb.choices, x=infos.to.write)
# wb_save(wb.choices, "checkbox-choices.xlsx")

# Load the checkbox choices and use them to determine which checkboxes to "check"
choices = wb_read("checkbox-choices.xlsx", sheet="CDS-F") %>% 
  filter(`Add 'x' if applicable` != "")
infos.selected = inner_join(infos, choices)

# Update the template spreadsheet to "check" the appropriate checkboxes
for(i in 1:nrow(infos.selected)) {

  to_modify = infos.selected[i, "link"] %>% split_reference()

  # set multiselect link to TRUE
  wb = wb_add_data(
    wb = wb,
    sheet = to_modify$sheet_name,
    dims = to_modify$cell_ref,
    x = TRUE,
    col_names = FALSE
  )

}

if (interactive()) wb$open()

I'm not sure exactly how I want this to work for users of the package (right now it's only for internal use, but if I can sufficiently generalize it, I will make it available for other institutions that might want to use it), but I need to provide some way to for them to select the boxes they want checked. Maybe that can be done interactively, but for now, a separate spreadsheet listing all the checkbox options seemed like an easy way to get started.

Also, I can see I'm going to need to add a bit more logic to wrap around the checkbox identifications to address redundancies. For example, section F3 in the CDS-F worksheet has three checkboxes with the same text ("At cooperating institution"), so I'll need to figure to disambiguate situations like that. The "multi select" worksheet uses unique names in column Z, so maybe I just need to get the names from there for this case.

Regarding the use of a separate sheet (the "multi select" worksheet in this case) for dealing with the checkboxes: I've never used this type of logic before in a spreadsheet, where clicking a checkbox on one sheet changes a cell value somewhere else and vice versa. Your approach of modifying the checkboxes by modifying the corresponding logical values in the "multi select" worksheet works fine. But I'm curious why you can't modify the checkboxes directly with R/vml code. On the spreadsheet itself, the logic works in both directions: If I change a logical value from FALSE to TRUE in "multi select" the corresponding checkbox becomes checked. On the other hand, I can also check a checkbox and the corresponding logical cell in "multi select" changes from FALSE to TRUE.

In any case, thanks very much for this code. I will also test out the dropdown code and report back.

JanMarvin commented 1 month ago

Also, I can see I'm going to need to add a bit more logic to wrap around the checkbox identifications to address redundancies. For example, section F3 in the CDS-F worksheet has three checkboxes with the same text ("At cooperating institution"), so I'll need to figure to disambiguate situations like that. The "multi select" worksheet uses unique names in column Z, so maybe I just need to get the names from there for this case.

Since the cells are empty in the sheet, you could create a duplicate of the workbook, fill the cells on the sheet with the corresponding checkbox texts and return the output of the new workbook with wb_to_df(). But obviously this is maybe not the best of all options :)

Regarding the use of a separate sheet (the "multi select" worksheet in this case) for dealing with the checkboxes: I've never used this type of logic before in a spreadsheet, where clicking a checkbox on one sheet changes a cell value somewhere else and vice versa.

These checkboxes are only objects reflecting the state of the value. If the value is TRUE the box is checked. There are two ways for checkboxes, those that have a state, but are not assigned to a value and, well this other type, where the value is . It modifies the value below.

In any case, thanks very much for this code. I will also test out the dropdown code and report back.

Your welcome, I didn't have much time tonight, let me know if you need further input. I'm going to convert this into a discussion, because I'm still not really sure how convert this into a general function, and I personally have no use case for this.