awalker89 / openxlsx

R package for .xlsx file reading and writing.
Other
365 stars 78 forks source link

Bug: `openxlsx::loadWorkbook()` corrupts loaded xlsx file #464

Open rkrug opened 5 years ago

rkrug commented 5 years ago

Expected Behavior

wb <- openxlsx::loadWorkbook("./tmp.xlsx")
openxlsx::openXL(wb)

should open the Excel file

Actual Behavior

wb is corrupted and Excel can only merely fix it with lot's of missing data. LibreOffice and Pages can't fix it.

openxlsx::saveWorkbook(wb, "./tmp_new.xlsx, overwrite = TRUE)

results in attached corrupt excel file tmp_new.xlsx

Steps to Reproduce the Problem

tmp.xlsx

wb <- openxlsx::loadWorkbook("./tmp.xlsx")
openxlsx::openXL(wb)

sessionInfo()

> devtools::session_info()
─ Session info ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.5.3 (2019-03-11)
 os       macOS High Sierra 10.13.6   
 system   x86_64, darwin17.7.0        
 ui       RStudio                     
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       Europe/Zurich               
 date     2019-04-09                  

─ Packages ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package     * version date       lib source                             
 assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.5.3)                     
 backports     1.1.3   2018-12-14 [1] CRAN (R 3.5.1)                     
 callr         3.2.0   2019-03-15 [1] CRAN (R 3.5.3)                     
 cli           1.1.0   2019-03-19 [1] CRAN (R 3.5.3)                     
 crayon        1.3.4   2017-09-16 [1] CRAN (R 3.5.1)                     
 desc          1.2.0   2018-05-01 [1] CRAN (R 3.5.1)                     
 devtools      2.0.1   2018-10-26 [1] CRAN (R 3.5.1)                     
 digest        0.6.18  2018-10-10 [1] CRAN (R 3.5.1)                     
 fs            1.2.6   2018-08-23 [1] CRAN (R 3.5.1)                     
 glue          1.3.1   2019-03-12 [1] CRAN (R 3.5.3)                     
 magrittr      1.5     2014-11-22 [1] CRAN (R 3.5.1)                     
 memoise       1.1.0   2017-04-21 [1] CRAN (R 3.5.1)                     
 openxlsx      4.1.1   2019-04-09 [1] Github (awalker89/openxlsx@ead0038)
 packrat       0.5.0   2018-11-14 [1] CRAN (R 3.5.2)                     
 pkgbuild      1.0.2   2018-10-16 [1] CRAN (R 3.5.1)                     
 pkgload       1.0.2   2018-10-29 [1] CRAN (R 3.5.1)                     
 prettyunits   1.0.2   2015-07-13 [1] CRAN (R 3.5.1)                     
 processx      3.3.0   2019-03-10 [1] CRAN (R 3.5.2)                     
 ps            1.3.0   2018-12-21 [1] CRAN (R 3.5.2)                     
 R6            2.4.0   2019-02-14 [1] CRAN (R 3.5.2)                     
 Rcpp          1.0.1   2019-03-17 [1] CRAN (R 3.5.3)                     
 remotes       2.0.2   2018-10-30 [1] CRAN (R 3.5.1)                     
 rlang         0.3.2   2019-03-21 [1] CRAN (R 3.5.3)                     
 rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.5.1)                     
 rstudioapi    0.9.0   2019-01-09 [1] CRAN (R 3.5.2)                     
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.5.1)                     
 testthat      2.0.1   2018-10-13 [1] CRAN (R 3.5.1)                     
 usethis       1.4.0   2018-08-14 [1] CRAN (R 3.5.1)                     
 withr         2.1.2   2018-03-15 [1] CRAN (R 3.5.1)                     
 zip           2.0.1   2019-03-11 [1] CRAN (R 3.5.3)                     

[1] /usr/local/lib/R/3.5/site-library
[2] /usr/local/Cellar/r/3.5.3/lib/R/library
> 
SophiaLC commented 5 years ago

I believe that I am having the same or a similar problem. I am not able to export the workbook without getting Excel errors: "We found a problem with some content in 'filename.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes." When I click Yes: "Excel cannot open the file 'filename.xlsx' because the file format or the file extension is not valid. Verify that the file has not been corrupted and that the file extension matches the format of the file."

My work-around is to export the file without a file extension, then add the file extension. Excel is then able to open the file with no problem.

Thank you!

rkrug commented 5 years ago

Hi Sophia,

thanks for the tip, but I tried that, and still the same error.

HJAllen commented 5 years ago

Just made comment on https://github.com/awalker89/openxlsx/issues/348#issuecomment-492382870 but probably should have placed my comment here as the issue is more recent. Sorry for duplication.

I can report similar. When opening a workbook with simple text entries, adding data to it, and previewing, I get the "We found a problem..." dialogue. Click yes to repair and get: Removed Part: /xl/styles.xml part with XML error. (Styles) Undeclared prefix. Line 1, column 1541. Repaired Records: Cell information from /xl/worksheets/sheet1.xml part Used Range is A1:U14

dat <- data.frame(FldID = c(190022L, 190002L, 190011L, 190050L))
  rm(wb)
  xlF <- file.path(path$wd_docs,
                   paste0("TTEB_.xlsx"))
  wb <- openxlsx::loadWorkbook(xlF)

  openxlsx::writeData(wb, sheet = names(wb)[1], startRow = 10, startCol = 1, x = dat)
  openxlsx::openXL(wb)
- Session info ------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.5.3 (2019-03-11)
 os       Windows 10 x64              
 system   x86_64, mingw32             
 ui       RStudio                     
 language (EN)                        
 collate  English_United States.1252  
 ctype    English_United States.1252  
 tz       America/New_York            
 date     2019-05-14                  

- Packages ----------------------------------------------------------------------------------------
 package     * version    date       lib source                  
 assertthat    0.2.1      2019-03-21 [1] CRAN (R 3.5.3)          
 base64enc     0.1-3      2015-07-28 [1] CRAN (R 3.5.0)          
 bit           1.1-14     2018-05-29 [1] CRAN (R 3.5.0)          
 bit64         0.9-7      2017-05-08 [1] CRAN (R 3.5.0)          
 cli           1.1.0      2019-03-19 [1] CRAN (R 3.5.3)          
 crayon        1.3.4      2017-09-16 [1] CRAN (R 3.5.1)          
 data.table    1.12.2     2019-04-07 [1] CRAN (R 3.5.3)          
 DBI           1.0.0      2018-05-02 [1] CRAN (R 3.5.1)          
 dbplyr        1.3.0      2019-01-09 [1] CRAN (R 3.5.3)          
 digest        0.6.18     2018-10-10 [1] CRAN (R 3.5.3)          
 dplyr         0.8.0.1    2019-02-15 [1] CRAN (R 3.5.2)          
 evaluate      0.13       2019-02-12 [1] CRAN (R 3.5.3)          
 fansi         0.4.0      2018-10-05 [1] CRAN (R 3.5.2)          
 flextable     0.5.2      2019-04-02 [1] CRAN (R 3.5.3)          
 forcats       0.4.0      2019-02-17 [1] CRAN (R 3.5.3)          
 gdtools       0.1.8      2019-04-02 [1] CRAN (R 3.5.3)          
 gfuns       * 0.0.0.9000 2019-03-12 [1] local (HJAllen/gfuns@NA)
 glue          1.3.1      2019-03-12 [1] CRAN (R 3.5.3)          
 hms           0.4.2      2018-03-10 [1] CRAN (R 3.5.1)          
 htmltools     0.3.6      2017-04-28 [1] CRAN (R 3.5.1)          
 knitr         1.22       2019-03-08 [1] CRAN (R 3.5.3)          
 LHfuns      * 0.0.0.9000 2018-12-18 [1] local (@0.0.0.9)        
 lubridate     1.7.4      2018-04-11 [1] CRAN (R 3.5.1)          
 magrittr    * 1.5        2014-11-22 [1] CRAN (R 3.5.1)          
 officer       0.3.3      2019-03-01 [1] CRAN (R 3.5.2)          
 openxlsx    * 4.1.0      2018-05-26 [1] CRAN (R 3.5.3)          
 pander        0.6.3      2018-11-06 [1] CRAN (R 3.5.1)          
 pillar        1.3.1      2018-12-15 [1] CRAN (R 3.5.2)          
 pkgconfig     2.0.2      2018-08-16 [1] CRAN (R 3.5.1)          
 purrr         0.3.2      2019-03-15 [1] CRAN (R 3.5.3)          
 R6            2.4.0      2019-02-14 [1] CRAN (R 3.5.3)          
 Rcpp          1.0.1      2019-03-17 [1] CRAN (R 3.5.3)          
 readr         1.3.1      2018-12-21 [1] CRAN (R 3.5.3)          
 rlang         0.3.4      2019-04-07 [1] CRAN (R 3.5.3)          
 RMariaDB      1.0.6      2018-05-06 [1] CRAN (R 3.5.1)          
 rmarkdown     1.12       2019-03-14 [1] CRAN (R 3.5.3)          
 rstudioapi    0.10       2019-03-19 [1] CRAN (R 3.5.3)          
 sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.5.3)          
 stringi       1.4.3      2019-03-12 [1] CRAN (R 3.5.3)          
 stringr       1.4.0      2019-02-10 [1] CRAN (R 3.5.3)          
 tibble        2.1.1      2019-03-16 [1] CRAN (R 3.5.3)          
 tidyr         0.8.3      2019-03-01 [1] CRAN (R 3.5.3)          
 tidyselect    0.2.5      2018-10-11 [1] CRAN (R 3.5.1)          
 utf8          1.1.4      2018-05-24 [1] CRAN (R 3.5.1)          
 uuid          0.1-2      2015-07-28 [1] CRAN (R 3.5.0)          
 withr         2.1.2      2018-03-15 [1] CRAN (R 3.5.1)          
 xfun          0.6        2019-04-02 [1] CRAN (R 3.5.3)          
 xml2          1.2.0      2018-01-24 [1] CRAN (R 3.5.1)          
 zip           2.0.1      2019-03-11 [1] CRAN (R 3.5.3)  
rkrug commented 5 years ago

OK - I had the error on two Macs, but did not have it in Linux.

Any help appreciated, what is going on - this affects a nearly finished package!

HJAllen commented 5 years ago

Comparing the styles.xml between the original .xlsx and the openxlsx modified version, there are at least some differences in stylesheets. It appears, at least in my instance, stylesheet references are being dropped.

Original File

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac" xmlns:x16r2="http://schemas.microsoft.com/office/spreadsheetml/2015/02/main" xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision" mc:Ignorable="x14ac x16r2 xr">
  <numFmts count="1">
    <numFmt numFmtId="164" formatCode="mm/dd/yy;@"/>
  </numFmts>

Openxlsx modified file

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac" mc:Ignorable="x14ac">
  <numFmts count="1">
    <numFmt numFmtId="164" formatCode="mm/dd/yy;@"/>
  </numFmts>

See https://github.com/awalker89/openxlsx/issues/464#issuecomment-492719903 It appears the following tablestyle is added by openxlsx and is the possible offender

    <tableStyle name="MySqlDefault" pivot="0" table="0" count="2" xr9:uid="{E4D6D12A-12B2-45C8-91AA-1B61EA448CBE}">
      <tableStyleElement type="wholeTable" dxfId="1"/>
      <tableStyleElement type="headerRow" dxfId="0"/>
    </tableStyle>
  </tableStyles>
HJAllen commented 5 years ago

when I remove

<tableStyles count="1" defaultTableStyle="TableStyleMedium9" defaultPivotStyle="PivotStyleLight16"/>
    <tableStyle name="MySqlDefault" pivot="0" table="0" count="2" xr9:uid="{8049963C-4A48-4192-A808-6AD7E4D6BD5E}">
      <tableStyleElement type="wholeTable" dxfId="1"/>
      <tableStyleElement type="headerRow" dxfId="0"/>
    </tableStyle>
  </tableStyles>

and replace with

<tableStyles count="1" defaultTableStyle="TableStyleMedium9" defaultPivotStyle="PivotStyleLight16"/>

zip it up and rename .xlsx, it opens no problem.

rkrug commented 5 years ago

Thanks for looking into that. The strange thing is, that it worked under Linux.

I can try out a fix as soon as you have one available on github.

HJAllen commented 5 years ago

It appears openxlsx is not adding the offending tablestyle. I have MySQL for excel installed and am suspicious it is adding the style. Using repaired file as above:

  1. open in excel
  2. modify a cell
  3. save The tablestyle reappears with subsequent corruption. Solution for me: Don't save in excel again.
rkrug commented 5 years ago

The problem are validations (at least in the new version of Excel - 16.25). As soon as I enter one validation rule, it does not work anymore.

Jannickz commented 5 years ago

The problem are validations (at least in the new version of Excel - 16.25). As soon as I enter one validation rule, it does not work anymore.

I have the same error with validations in windows Excel 365, this error only pops out after I upgraded to office 365.

franciscoyira commented 4 years ago

I'm having the same problem, also in workbooks with validation rules :(

sambafall commented 3 years ago

Same here , and because I didn't initiate the excel file, I used the link below to find and delete data validation rules, after that everythings worked fine

https://www.ablebits.com/office-addins-blog/2017/08/16/data-validation-excel/

Coledavis00 commented 3 years ago

Hey all, a little late here, but I was experiencing the same problem until I removed special characters from my tab names.

tomtom776 commented 3 years ago

Hey all, a little late here, but I was experiencing the same problem until I removed special characters from my tab names.

This worked for me too! Just renamed the sheet that had an "&" symbol in it and everything worked perfectly! Thanks

melville1808 commented 3 years ago

Found another thing that triggers this error. Text boxes are fine, but text boxes within a chart object are not.

crisrengifo commented 3 years ago

https://github.com/awalker89/openxlsx/issues/464#issuecomment-492719903 pointed me in the right direction in my case. I had a custom table style save to my template workbook. Deleting the custom table style in the template fixed this for me.