awalker89 / openxlsx

R package for .xlsx file reading and writing.
Other
364 stars 79 forks source link

data validation type 'list' fails when there are more than one on a sheet #266

Open PeterVermont opened 7 years ago

PeterVermont commented 7 years ago

[re-writing summary since it was too vague]

With version 4.08 from GitHub you can easily create a corrupted file with this code:

library(openxlsx)
wb <- createWorkbook()
addWorksheet(wb, "Sheet 1")
addWorksheet(wb, "Sheet 2")

writeDataTable(wb, sheet = 1, x = iris[1:30,])
writeData(wb, sheet = 2, x = sample(iris$Sepal.Length, 10))

## list validation pointing to sheet 2
#if you only call dataValidation once it works but if more than once gets Excel corrupt message
for (row in 2:3) {
  dataValidation(
    wb,
    1,
    col = 1,
    rows = row,
    type = "list",
    value = "'Sheet 2'!A1:A10"
  )
}

save <- FALSE
if (save) {
  my_file <- 'openxlsx_data_validation_output.xlsx'
  saveWorkbook(wb, my_file, overwrite = TRUE)
  openXL(my_file)
} else {
  openXL(wb)
}
awalker89 commented 7 years ago

This works for me running version 4.0.11 and Excel 2013. Can you please try updating to the latest dev version with devtools::install_github("awalker89/openxlsx"). What version of Excel are you using?

PeterVermont commented 7 years ago

If you copy the example above exactly you can see that the error still exists but it is due to the use of col instead of cols.

PeterVermont commented 7 years ago

Actually I was wrong -- it is weird that passing 'col' does not create an error but even if I pass 'cols' I get an error. This is with Excel 2016 on MacOS although my original bug was on Windows.

I have saved one 'good' file created by looping only once:

for (row in 2:2) {

And one 'bad' file created by looking twice:

for (row in 2:3) {

openxlsx_data_validation_output_bad.xlsx openxlsx_data_validation_output.xlsx

EagleR17 commented 7 years ago

I have run into this same issue and I've yet to figure out why after looking at the source code. I'm running version 4.0.17 on Excel 2016. My system set-up is a 64 bit windows machine running R 3.3.3.

EDIT: I have resolved this issue. I will create a pull request after some testing to confirm the solution. But initial tests of updated code have it working for me now.

tkunstek commented 6 years ago

I tripped on this bug again today. I could not find a pull request for #266 by @EagleR17.

When a second dataValidation statement is added to the same sheet, openxlsx generates corrupted XML. When viewing the recovery log you see the following:

`<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

Repair Result to PRAPHAEL@allstate.com1.xmlErrors were detected in file 'filename.xlsx'Replaced Part: /xl/worksheets/sheet1.xml with XML error. Load error. Line 2, column 458.

`

This is the XML in question after running through pretty print: screen shot 2017-11-13 at 2 33 10 am

tkunstek commented 6 years ago

If I unzip the xlsx, edit the above XML and reduce it to one <ext><x14:dataValidations count=2" ... </x14:dataValidations></ext> the file opens and is usable.

EagleR17 commented 6 years ago

I still need to create a pull request for this, I just need to clean my code up since I've added other features specific to a project I was working on that would be useless for the community at large.

However, in an effort to help you out, here is what I ended up doing. I had to update two scripts in the code, the WorkbookClass.R script and the worksheet_class.R script.

From WorkbookClass.R update the Workbook$methods for dataValidation_list function as follows:

Workbook$methods(dataValidation_list = function(sheet, startRow, endRow, startCol, endCol, value, allowBlank, showInputMsg, showErrorMsg){ sheet = validateSheet(sheet) sqref <- paste(getCellRefs(data.frame("x" = c(startRow, endRow), "y" = c(startCol, endCol))), collapse = ":") data_val <- sprintf('<x14:dataValidation type="list" allowBlank="%s" showInputMessage="%s" showErrorMessage="%s">', allowBlank, showInputMsg, showErrorMsg, sqref) formula <- sprintf("<x14:formula1><xm:f>%s</xm:f></x14:formula1>", value) sqref <- sprintf('<xm:sqref>%s</xm:sqref>', sqref) xmlData <- paste0(data_val, formula, sqref, '</x14:dataValidation>') worksheets[[sheet]]$extLst <<- c(worksheets[[sheet]]$extLst, xmlData) invisible(0) })

In worksheet_class.R script I updated the WorkSheet$methods function from:

if(length(extLst) > 0) xml <- paste0(xml, sprintf('<extLst>%s</extLst>', paste(extLst, collapse = "")), collapse = "") xml <- paste0(xml, "</worksheet>") return(xml)

to:

if(length(extLst) > 0) xml <- paste0(xml, sprintf('<extLst><ext uri="{CCE6A557-97BC-4b89-ADB6-D9C93CAAB3DF}" xmlns:x14="http://schemas.microsoft.com/office/spreadsheetml/2009/9/main"><x14:dataValidations count="%s" xmlns:xm="http://schemas.microsoft.com/office/excel/2006/main">', length(extLst)), paste0(pxml(extLst), '</x14:dataValidations></ext></extLst>'), collapse = "") xml <- paste0(xml, "</worksheet>") return(xml)

tkunstek commented 6 years ago

Thanks for the code. Tested and works great. Pull request above.

chicofish commented 6 years ago

I'm still getting this error on Excel 2016 for Windows. Has this pull request been incorporated into the CRAN build or the master build here?

stkrogs commented 6 years ago

Apparently the pull request from tkunstek has not been incorporated here yet. I have used tkunstek's fork directly by installing it from his github: >devtools::install_github("tkunstek/openxlsx", dependencies=TRUE) It solves my problems with using more than one data validation per sheet. Hope this helps, and thanks to tkunstek and EagleR17 for fixing this :-)

Benrath commented 5 years ago

Will this be integrated in the cran Version soon?

aavanesy commented 5 years ago

Hi, Same here. I am currently using the forked version. Any update on the merge?

eoziolor commented 4 years ago

Any updates on this merge?

ycphs commented 4 years ago

I included the code in the new repository.

I managed to take over the maintenance of the package. I am looking forward to new improvements to include in the package.

please use now the new repository to for pull requests.

bsleik commented 4 years ago

Can the binary be updated as well? Thanks. There is a binary version available but the source version is later: binary source needs_compilation openxlsx 4.1.0 4.1.3 TRUE

ycphs commented 4 years ago

The new repository for the maintenance of the package is new repo.

I am looking forward to your contributions.

daomeidan1234 commented 2 years ago

I am still having the issue with Openxlsx 4.2.5, R 4.0.3.

Strangely, the first time openxlsx::saveWorkbook() ok, the second time always make it corrupt, even if don't write and data into the file, just simply loadworkbook, then saveWorkbook, will corrupt the file.

Since the excel file is made by others, how I can check which cell may have the validation , which will corrupt the file

bsleik commented 2 years ago

There are a number of bugs that cause corruption beyond just this one when working with editing pre-existing workbooks. The function I developed below will open the workbook fixing those issues and should let you save it without corruption.

loadFixWorkbook <- function(importFile) { wb = loadWorkbook(importFile)

worksheets = names(wb)

 #** fix any broken autofilters

for (i in 1:length(worksheets)) { if (length(wb[["worksheets"]][i][[1]][["autoFilter"]]) != 0) { xx <- wb[["worksheets"]][i][[1]][["autoFilter"]]

  for (iii in 1:length(xx)) {
    x = xx[iii]
    if (!str_detect(x,'^<autoFilter ref="\\w{1,3}\\d{1,8}:\\w{1,3}\\d{1,8}"[\\w\\d\\=\\:\\-\\" \\{\\}]{0,1000}\\/>')) {
      tags = str_extract_all(x,'<\\w{1,}[\\w\\d\\=\\:\\-\\" \\/\\{\\}\\.]{0,1000}>')[[1]]
      if (length(tags) > 0) {
        for (ii in length(tags):1) {
          #showText(paste0("checking tag ",tags[ii]))
          if (!str_detect(tags[ii],"/>")) {
            tag <- paste0("</",substr(str_extract(tags[ii],'<\\w{1,}'),2,1000),">")
            if (!str_detect(x,tag)) {
              # add missing end tag
              x <- paste0(x,tag)
            }
          }
        }
      }
    }

    xx[iii] = x
    #showText(x)
  }

  wb[["worksheets"]][i][[1]][["autoFilter"]] <- unique(xx)

}

# tags to remove xr references
xrRemoveTags = c("autoFilter","dataValidations","extLst")
for (xrRemoveTag in xrRemoveTags) {
  if (length(wb[["worksheets"]][i][[1]][[xrRemoveTag]]) != 0) {
    xx <- wb[["worksheets"]][i][[1]][[xrRemoveTag]]

    for (iii in 1:length(xx)) {
      x = xx[iii]

      # remove any xr:uid references as they also corrupt it because missing schema definition in header
      if (str_detect(x,'xr\\:uid')) {
        x = str_replace_all(x,'xr\\:uid="\\{[\\d\\w\\-]{0,100}\\}"',"")
      }
      if (str_detect(x,'xmlns\\:xlrd2')) {
        x = str_replace_all(x,'xmlns:xlrd2="[\\d\\w\\./:]{0,100}"',"")
      }
      xx[iii] = x

    }

    wb[["worksheets"]][i][[1]][[xrRemoveTag]] <- unique(xx)
  }
}

# fix corrupt sheetPr
if (length(wb$worksheets[i][[1]]$sheetPr) != 0) {
  x <- wb$worksheets[i][[1]]$sheetPr

  if (str_detect(x,'^<sheetPr><sheetPr')) {
    #showText("removing sheetPr") #remove
    wb$worksheets[i][[1]]$sheetPr <- str_extract(x,"<sheetPr .*[/]>")
  }
}

}

return(wb) }

Gianbla commented 1 year ago

Hello everyone!

I am facing a similar error when trying multiple conditional formatting in the same sheet. I am trying for different cell in the same column to have different data validation list (what in excel would be called dynamic data validation).

The coding seem correct to me, and the file is not corrupt, but for some reason the difference between the two data validation works only once, and then only one of the two is kept.

options<- data.frame(
"agriculture_cultive_now" = c("oui", "non" ,"nesaitpas" ,"neseprononcepas" ),
 "agriculture_evolution"  = c("bien_meilleure", "un_peu_meilleure", "identique" ,"un_peu_pire"))

cleaning_log<- data.frame(
"uuid" = c(1:10), 
"variable" = c("agriculture_cultive_now","agriculture_evolution", "agriculture_cultive_now","agriculture_evolution","agriculture_evolution", "agriculture_cultive_now","agriculture_evolution", "agriculture_evolution", "agriculture_cultive_now","agriculture_evolution"),
 "options" = rep(NA,10) )

wb = createWorkbook()
# Add worksheet
addWorksheet(wb, "cleaning_log")
# Add dataframe 
writeData(wb, sheet = "cleaning_log", x = cleaning_log, startCol = 1)
# Add worksheet "Drop-down values" to the workbook
addWorksheet(wb, "Drop-down values")
# Add drop-down values dataframe to the sheet "Drop-down values"
writeData(wb, sheet = "Drop-down values", x = options, startCol =
            1)
# Add drop-downs
dataValidation(wb, "cleaning_log", col = 3, rows = (which(cleaning_log$variable =="agriculture_evolution")+1), type = "list", value =
                 "'Drop-down values'!$B$2:$B$5")
dataValidation(wb, "cleaning_log", col = 3, rows = (which(cleaning_log$variable =="agriculture_cultive_now")+1), type = "list", value =
                 "'Drop-down values'!$A$2:$A$5")

# Save workbook
saveWorkbook(wb, "try.xlsx", overwrite = TRUE)