awalker89 / openxlsx

R package for .xlsx file reading and writing.
Other
364 stars 79 forks source link

Corrupt file #22

Open ARParis opened 10 years ago

ARParis commented 10 years ago

Thanks for the package.

On my system, I can read a file, write it. But it is corrupted for excel. Even a simple file such as: write.xlsx(iris, file = "writeXLSX1.xlsx", colNames = TRUE, borders = "rows")

I am sending by email the corresponding saved file

lbraglia commented 10 years ago

Are you using Excel <= 2003? See #8

ARParis commented 10 years ago

R version 3.1.1 Excel 2010 openxlsx version 2.0.15

awalker89 commented 10 years ago

Some zip applications are skipping the "Content_type.xml" file. Can you either send me a corrupt xlsx file at Alexander.Walker1989@gmail.com or jsut unzip the corrupt xlsx file and tell me if the [Content_type].xml is in there (see pic)

image

tgwhite commented 9 years ago

I've had similar problems with overwriting data in an existing workbook. I'm at work and can't share code freely at the moment but something like the code below results in a corrupted file. I'm not sure how much is due to an existing bug or how much is due to an R version mismatch/package update (or all of the above). I have a difficult time having our IT crew keeping R up to date on work servers -- this problem was encountered with R 3.1.3 and openxlsx 2.0.15 (by the way, how many updates to openxlsx have there already been? I feel like I just had this installed!)


data_frame_with_stuff = data.frame(x = 1:10, y = state.name[1:10])

## this worked fine ##

# workbook with some BASIC existing formatting, e.g. blue background
out_wb = loadWorkbook(path_to_wb) 
addDataTable(out_wb, sheet = "Sheet1", x = data_frame_with_stuff, tableStyle = "none")
saveWorkbook(out_wb, new_wb_filename)

## then this fails ##
in_wb = loadWorkbook(new_wb_filename)
addDataTable(in_wb , sheet = "Sheet1", x = data_frame_with_stuff, tableStyle = "none")
saveWorkbook(in_wb , final_filename)
awalker89 commented 9 years ago

Hi,

Thanks for you email. Can you please update to the most recent version of openxlsx from github with

devtools::install_github("awalker89/openxlsx")

If you are still having issues I'll be happy to help.

Regards, Alex

On 28 April 2015 at 06:28, Taylor White notifications@github.com wrote:

I've had similar problems with overwriting data in an existing workbook. I'm at work and can't share code freely at the moment but something like the code below results in a corrupted file. I'm not sure how much is due to an existing bug or how much is due to an R version mismatch/package update (or all of the above). I have a difficult time having our IT crew keeping R up to date on work servers -- this problem was encountered with R 3.1.3 and openxlsx 2.0.15 (by the way, how many updates to openxlsx have there already been? I feel like I just had this installed!)

data_frame_with_stuff = data.frame(x = 1:10, y = state.name[1:10])

this worked fine

workbook with some BASIC existing formatting, e.g. blue backgroundout_wb = loadWorkbook(path_to_wb)

addDataTable(out_wb, sheet = "Sheet1", x = data_frame_with_stuff, tableStyle = "none") saveWorkbook(out_wb, new_wb_filename)

then this fails ##in_wb = loadWorkbook(new_wb_filename)

addDataTable(in_wb , sheet = "Sheet1", x = data_frame_with_stuff, tableStyle = "none") saveWorkbook(in_wb , final_filename)

— Reply to this email directly or view it on GitHub https://github.com/awalker89/openxlsx/issues/22#issuecomment-96808228.

vathymut commented 9 years ago

Hi Alex, I'm having the same issue. Installing the most recent version of openxlsx did not do it.

I am doing something very similar to @tgwhite.

# Load, add data and save worksheet
wb <- loadWorkbook( file =  xlsx_path )
addWorksheet( wb, sheetName = "a_name" )
writeData( wb, sheet = "a_name", x = a_data_frame, withFilter = TRUE, headerStyle = someStyle ) 
saveWorkbook( wb, file =  xlsx_path, overwrite = TRUE )

It exhibits this behavior on some excel workbooks but not others. When it does work -- aka the resulting .xlsx file is not corrupt -- it still warns that:

Warning messages:
1: In rm(dXML) : object 'dXML' not found
2: Overwriting existing cell data. 

Any workaround for this?

Best, V.

awalker89 commented 9 years ago

Yeah that's a bug. I've updated the dev version to remove that line. Please update with

devtools::install_github("awalker89/openxlsx")
merkliopas commented 9 years ago

hey, i'm still having the same problem with generating corrupt output file. any news on that front? just as @tgwhite noted, it happens when modifying existing worksheet with some formatting. thanks!

awalker89 commented 9 years ago

If possible can you please email the xlsx file? The bugs associated with loadWorkbook tend to be due to features openxlsx does not yet support (or only has basic support for). If you can email a file which is corrupted on load I can fix this.

merkliopas commented 9 years ago

thanks! i've emailed you those. let us know when you dissect what was causing the troubles

awalker89 commented 9 years ago

Thanks Sarunas. This particular bug should be fixed now.

merkliopas commented 9 years ago

it works! great job, thanks for your effort.

klittle314 commented 8 years ago

Wondering if you got any insight into the corrupt file issue flagged last year by ARParis. When using openxlsx to write files, the [Content_Types].xml file is not included thus Excel sees corrupted file.

I am using the dev version openxlsx 3.0.27, R version 3.2.2 on Windows 7 ($system: "x86_64, mingw32") and Excel 2010.

I have unloaded xlsx package which evidently can cause clash with openxlsx, restarted R, etc.

Thanks, example file .xlsx file attached that shows the problem.

My codeveloper can get openxlsx to behave, when I copy the [Content_Types].xml file from her output .xlsx and combine it with contents of my corrupted file, Excel is happy.

Test_of_TCP_Event_Follow-up_Template_November_19_2015_summaryresults_2015-12-12.xlsx

PS I have confirmed that Rtools\bin and Rtools\gcc-4.6.3 are in the PATH variable, using version Rtools33 for version 3.2.x and later.

awalker89 commented 8 years ago

Never solved this one - if you find anything please let me know.

klittle314 commented 8 years ago

OK, working with Office can be painful.

klittle314 commented 8 years ago

Resolution does not appear to be something internal to openxlsx package. Perhaps the best you can do is flag in documentation? As you indicate above, the problem appears to involve the zip.exe file that is supposed to run as part of RTools but doesn't.

To assure that openxlsx knows where to find the zip executable in Rtools we may have to force the association through a system function call: (1) verify that Sys.getenv("R_ZIPCMD","zip") returns "zip" (2) Then set explicitly the path to the zip.exe file in Rtools, e.g. Sys.setenv(R_ZIPCMD= "C:/Program Files/R/Rtools/bin/zip")

reference: http://stackoverflow.com/questions/27952451/error-zipping-up-workbook

tempfile commented 8 years ago

Hello everyone, I think I might have found a solution to the original problem (missing Content Types XML in written XLSX). I had the same issue and it may be due to an old version of ZIP!

After learning here that a ZIP application is crucial I went to see whether I have one on my system at all. Turns out, my zip.exe was an antique version 2.3 from 1999 from, surprisingly, a quite recent installation of UnxUtils. (The equivalent of "which zip" in Windows is "where zip", by the way)

I replaced that one with the zip.exe from Rtools (3.0 from 2008) and, bang, no problem at all! Because I had not set the PATH to my Rtools, the old zip was run by openxlsx.

For anyone who hasn't wrestled with pathes and Unix utilities in the past, I assume this issue would be completely unsolvable. Maybe contacting the folks who make UnxUtils, Cygwin and the like would be useful and notify them that their old versions are causing problems with other software.

debarros commented 8 years ago

If this issue is related to the version of zip.exe, is there a way to identify the version before using it? That way, a warning message could be generated if the version is too old.

hannofalkenberg commented 7 years ago

Hey,

I'm just encountering a similar corruption problem. Just opening an excel file with loadWorkbook and saving it again under a different name corrupts the file in a way that I cannot open it with excel. I checked the issues mentioned above, but I do have the [Content Types].xml and such.

However, I realised that some of the files in the unziped excel file are named differently. In the original file, chartsheets are named from 1 to 31, whereas in the newly created file they run from 32 to 62.

This problem only arises with an old excel file that I have, where the individual sheets have been moved and renamed and such. With a new excel file all files within the excel file are named equally in the original file and the file opened and saved with openxlsx. I can't enclose the file unfortunately, and tried to reproduce the problem with a new file, but the damn thing is working fine unless I use my old file.

Any hints? Many thanks!

awalker89 commented 7 years ago

@hannofalkenberg it will be a bug in openxlsx. If you can supply an xlsx file which is corrupted after loading with openxlsx I will have a look.

igor-misechko commented 6 years ago

Hi, I have also got corrupt excel file after saving. Excel try to restore it and said:

Replaced component: xl/worksheets/sheet1.xml with XML error. The name of the end tag of the element must match the element type in the start tag. Line 1, column 1499692.

AlexUsR commented 6 years ago

@igor-misechko I also had that same problem. It solved removing the active filters from the sheet.

linako111 commented 6 years ago

I have a similar issue. Here is the code: fileTemplate <- 'New01.xlsx' wbTemplate <- loadWorkbook(fileTemplate) addWorksheet(wbTemplate, "Sheet1") writeData(wbTemplate, "Sheet1", dataset) len <- NROW(dataset) dataValidation(wbTemplate, 2, col = 2, rows = 2:len, type = "list", value = "'Data Validation'!$A$2:$A$19") dataValidation(wbTemplate, 2, col = 3, rows = 2:len, type = "list", value = "'Data Validation'!$B$2:$B$501") dataValidation(wbTemplate, 2, col = 5, rows = 2:len, type = "list", value = "'Data Validation'!$C$2:$C$6") openXL(wbTemplate) if I just use one dataValidation it opens okey, if more than one it complains about the file being corrupt...

aviertio commented 5 years ago

@igor-misechko I also had that same problem. It solved removing the active filters from the sheet.

@AlexUsR Can you tell in detail how to fix this? I have the same problem as @igor-misechko

aviertio commented 5 years ago

@awalker89 I can also now confirm that corruption doesn't happen if I remove filtering from the file which is loaded. With filtering it saves as corrupt.

NaRuecker commented 5 years ago

I have the same problem. I updated Rtools, installed most recent version of openxlsx and I don't have a filter set. This is my code:

library(openxlsx) Sys.getenv("R_ZIPCMD", "zip")# This returns zip, I guess that is fine

read the summarydata

CountrySummary <- read.csv('CountrySummary.csv')

Countries <- unique(CountrySummary$Country)

write one xlsx file with all the Country data

wb <- createWorkbook("GC_CountrySummary") for (C in Countries){ df_c <- subset(CountrySummary, CountrySummary$Country==C) df_c <- df_c[order(df_c$Date),]

addWorksheet(wb,df_c, sheetName = C,gridLines = TRUE, tabColour = NULL, zoom = 100, header = NULL, footer = NULL, evenHeader = NULL, evenFooter = NULL, firstHeader = NULL, firstFooter = NULL, visible = TRUE) writeData(wb, sheet=C, df_c, colNames = TRUE) } saveWorkbook(wb, "GC_CountrySummary.xlsx", overwrite = TRUE)

AmyMikhail commented 5 years ago

I'm having the same issue, on Windows 10 with office 2016 and R 3.6.1, openxlsx 4.1.0.1.

I'm writing to a pre-formatted workbook - saveWorbook() works fine for the first sheet, but when I try to add another sheet to the same workbook (starting with loadWorkbook() again) it either fails (says a workbook of that name doesn't exist even though I can see it is there) or the workbook is corrupted. I can open the corrupted workbook and follow the MS Excel instructions to repair it, but then my whole system slows down and ultimately crashes.

Moreover, the corrupted file is 'locked for editing by another user' so I can't even delete it.

I don't know about the xml thing, but I did see a warning message while running my script saying 'zip is depracated' so it is possible that my version of zip is not compatible with openxlsx as for some other posters above.

The other thing that occurs to me is that there is no function in openxlsx to close the workbook (or unload it, or de-activate it, whichever term is most appropriate). Could this be why my computer thinks the file is still open?

I would try to attach an example but can't because my computer is freezing right now due to this issue. I was wondering if there was something about the way the workbook was formatted (not done by me, it is a template for uploading data to a website that has some conditional formatting and data checks built in) which is somehow incompatible with openxlsx - but then I can see from the above posts that others are having the same issue even with very simple data sets (and presumably blank unformatted workbooks).

Could adding a function such as closeWorkbook() help? I don't really understand how openxlsx works so no idea if this is relevant or not, but just a thought. I really hope others can adopt this package and inject new life into its maintenance because apart from the issue above it is a really great package (and much better than the Java dependent ones).

acvelozo commented 5 years ago

Hi @awalker89 ,

I have the same setup as @AmyMikhail and am running into the same problem of corrupting the Excel file when saving. This happens when the original file has filters. This is a sample file that gets corrupted in my PC just by loading and saving it and here is the code that I used:

pathExcel1 <- "C:/Users/Alexandre Velozo/Desktop/test file.xlsx"
pathExcel2 <- "C:/Users/Alexandre Velozo/Desktop/test file 2.xlsx"

wb <- loadWorkbook(pathExcel1)
saveWorkbook(wb,pathExcel2)

If I remove the filters, the file saves correctly.

Thank you in advance for any inputs. Best regards, Alexandre Velozo

t-morrison commented 5 years ago

I am working in a very similar setup as @AmyMikhail . For some reasons, certain .xlsx files I am receiving from others are getting corrupted when I load them with loadWorkbook() and then save that workbook object (slightly modified) with saveWorkbook().

When attempting to open the newly saved workbook, I would see this error: Replaced Part: /xl/worksheets/sheet1.xml part with XML error. Undeclared prefix. Line 1, column 523098.

Interestingly, I discovered that if I opened the original .xlsx file on my machine, saved it as a new file (w/o changing anything), and repeated the above steps (loading in this new wb and saving it w/o any changes with openxlsx), the newly openxlsx-created file was not corrupted.

Using these two files, wb1 (that would result in a corrupted output), and wb2 (the locally saved version that would not result in a corrupted output), I set out to determine what about them was different, by exploring the workbook objects loaded from loadWorkbook().

Through some trial and error, I discovered that replacing the worksheet from wb1 with the equivalent in wb2 resolved the corruption issue (it was a single worksheet workbook). I drilled down further, and found that the issue was with the dataValidations field of the Worksheet within the workbook object.

wb1 <- loadWorkbook("bad_file.xlsx")
wb2 <- loadWorkbook("good_file.xlsx")
saveWorkbook(wb1, "wb1_out.xlsx") #this is corrupt
saveWorkbook(wb2, "wb2_out.xlsx") #this is OK

wb1$worksheets[[1]] <- wb2$worksheets[[1]]

saveWorkbook(wb1, "wb1_out.xslx") #this is OK

wb1 <- loadWorkbook("bad_file.xlsx") #to overwrite the fixed wb1

wb1$worksheets[[1]]$dataValidations <- wb2$worksheets[[1]]$dataValidations
saveWorkbook(wb1, "wb1_out.xslx") #this is OK also

Here is what the dataValidations fields look like (corrupt followed by OK). In the Excel file, it is a dropdown with two options ("USD million", "LCU million").

<dataValidation type=\"list\" allowBlank=\"1\" showErrorMessage=\"1\" sqref=\"D57:D59 D35:D44 D46:D52 D54:D55 D61:D68\" xr:uid=\"{00000000-0002-0000-0000-000000000000}\">
    <formula1>\"USD million,LCU million\"</formula1>
</dataValidation>

<dataValidation type=\"list\" allowBlank=\"1\" showErrorMessage=\"1\" sqref=\"D57:D59 D35:D44 D46:D52 D54:D55 D61:D68\">
    <formula1>\"USD million,LCU million\"</formula1>
</dataValidation>

I did not diagnose what/where the issue with the dataValidation field is/is coming from, but to solve it I am just overwriting the field from every new Excel file with the field from a working template. Hope it can help some here solve their problems.

jingchenr commented 3 years ago

I encountered a similar issue: reading an Excel template into R by loadworkbook() and simply save workbook again would result in a corrupted file. The repair result says "...Xml parsing error line 1, column 47451". I don't have any filters in the template, only four data validations with lists; but removing the data validation the problem still exists so I'm not sure it's the problem with the data validation.

I'm not sure how to check the Zip version.I'm using a Mac. I did notice the corrupted file has a smaller size than the originial template. > Sys.getenv("R_ZIPCMD", "zip") [1] "/usr/bin/zip"

I attach the template file and the corrupted output, as well as a screenshot of the Repaired Result. Grateful for any suggestion. R_Template_name.xlsx R_generated_corrupt_file.xlsx

Screen Shot 2020-10-28 at 3 15 05 PM

Below are the commands I used. wb1=loadWorkbook(template_name) saveWorkbook(wb1, output_file1, TRUE)

This is my R session info: `R version 4.0.1 (2020-06-06) Platform: x86_64-apple-darwin17.0 (64-bit) Running under: macOS Catalina 10.15.7

Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

Random number generation: RNG: Mersenne-Twister Normal: Inversion Sample: Rounding

locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] data.table_1.12.8 zoo_1.8-8 lubridate_1.7.8 reshape2_1.4.4 dplyr_1.0.0 plyr_1.8.6 openxlsx_4.2.3

loaded via a namespace (and not attached): [1] Rcpp_1.0.4.6 rstudioapi_0.11 magrittr_1.5 tidyselect_1.1.0 lattice_0.20-41 R6_2.4.1 rlang_0.4.6 stringr_1.4.0
[9] tools_4.0.1 grid_4.0.1 xfun_0.14 tinytex_0.23 ellipsis_0.3.1 tibble_3.0.1 lifecycle_0.2.0 crayon_1.3.4
[17] zip_2.0.4 purrr_0.3.4 vctrs_0.3.1 glue_1.4.1 stringi_1.4.6 compiler_4.0.1 pillar_1.4.4 generics_0.0.2
[25] pkgconfig_2.0.3 `

daomeidan1234 commented 2 years ago

I still have this issue today in 2022, my R is 4.0.3, and openxlsx is 4.2.5.

If I loadworkbook, and saveworkbook, the excel file with dropdown validations will become corrupted.

Strange enough, it looks like the first time load/save is ok, the second time always corrupt the files.

Cannot believe this issue is still not solved in 4.2.5

FelipePKest commented 1 year ago

Same issue here...

FelipePKest commented 1 year ago

I have some pivot tables on some sheets on my workbook, I believe that maybe something is happening because of it. When I am reading and saving data on other workbooks I'm having no issue.

FelipePKest commented 1 year ago

After removing the pivot table and other plots the problem is fixed. However when there are any plot in the entire worksheet, the commands corrupt the worksheet.

NieznanyImANinja commented 1 year ago

It seems that openxlsx creates unnecesssary xml.rels, which can be seen by opening xlsx using 7z. Deleting these xml.rels has allowed me to open and save the file one more time. https://imgur.com/a/dJZoS4k

itsmevictor commented 4 months ago

Still no solution/workaround I assume?