awalker89 / openxlsx

R package for .xlsx file reading and writing.
Other
364 stars 78 forks source link

Control Character breaks xlsx file #360

Open Delta-Kappa opened 6 years ago

Delta-Kappa commented 6 years ago

I have a dataframe containing a comment field. The field can contain arbitrary text.

Expected Behavior

Special characters can be written to xlsx and handled by openxlsx to either work correctly (write with openxlsx, open in Exel, re-read with openxlsx) or be masked/escaped to not break the xlsx file.

Actual Behavior

The presence of ASCII Control characters (e.g. \032) seem to break the exported xlsx file.

When opening with Excel 2016, the file is detected as corrupted and repaired with parts removed from /xl/shared-strings.xml. Afterwards, the column header as well as the text field are gone.

Also re-reading with openxlsx cannot recover the broken string and sometimes even crashes my R session.

Steps to Reproduce the Problem

test <- data.frame(x = c("Test", "\032 Broken"),
                   y = seq(2),
                   stringsAsFactors = FALSE)
print(test)
openxlsx::write.xlsx(test, "test.xlsx")

test.read <- openxlsx::read.xlsx("test.xlsx")
print(test.read)

test.xlsx

sessionInfo()

tylerlittlefield commented 2 years ago

+1 on this

Davidxing1007 commented 2 years ago

I am getting this error too, I love all the features that openxlsx offers, but this control character issue really threw me off. I tried using xlsx::write.xlsx2(), even though it doesn't break the Excel file, all the control characters were replaced by "?".

The only thing that is working for me at the moment is using write.csv(), it actually preserves the control characters.

reinwald-idg-rlp commented 8 months ago

+1