awalker89 / openxlsx

R package for .xlsx file reading and writing.
Other
364 stars 79 forks source link

write.xlsx output has "content is unreadable" Excel error #185

Open alanault opened 8 years ago

alanault commented 8 years ago

Hi,

I've found that we often get an Excel error saying "some content is unreadable" when writing an XLSX. This is typically when we've posts from Twitter or Facebook that have foreign language characters.

Writing with the xlsx package works, so suspect it might be some sort of encoding error? If we chose "repair file", then the resulting XLSX has lots of missing data, and occasionally some columns with numbers in it, but the file is unusable.

Let me know if I can send you a XLSX example

Many thanks for the package, losing Java dependency for intput/output in Excel is super handy!

Best Alan

alanault commented 8 years ago

I still haven't found a way around this - if anybody has any suggestions, would be grateful!

kerry-ja commented 7 years ago

I have noticed this issue as well. I have found it is possible to identify the particular parts of the dataframe that are causing the issue by saving subsets of the data to narrow down where the issue is from, and then looking for unusual characters in the subset of data. As a minimally reproducable example, library(openxlsx) write.xlsx("\003","ExampleCorrupt.xlsx") produces an Excel file that when tried to open in Excel

We found a problem with some content in 'ExampleCorrupt.xlsx'. Do you want us to try to recover as much as we can?

As a couple of suggestions of workarounds:

  1. Replace the character or characters that cause an issue with something else.
  2. Save the file as a .csv file instead of using the openxlsx package.

Ideally the package could be updated to be able to handle unusual characters.

nicholaelaw commented 7 years ago

I'm seeing this everywhere too. For me, write.xlsx is fine. But the output of createWorkbook > addWorksheet > writeDataTable > saveWorkbook is corrupt. It looks like the character encoding is mismatched, all problematic files are located at /xl/tables/, which I think is the metadata for every table in the workbook.

I'm using Chinese sheetname and tablename throughout the workbook, it appears in the XML file, table name and table displayName are encoded in GBK (native.enc for Chinese on Windows), and the rest of the information are encoded in UTF8:

Opened with GBK/GB18030: image

Opened with UTF8: image

Hopefully this can be fixed.

jonathanztang commented 6 years ago

I had a similar issue using saveWorkbook with mixed foreign language text (Italian and Spanish, especially). Excel was alerting me to unreadable content and couldn't recover anything in the column when it tried. I took a pretty blunt approach to it and converted Latin to ASCII before removing escape characters (#308) and backslashes. I don't think this would work well with non-Latin characters though, as I expect it would break anything not converted to ASCII (not sure how stringi deals with those).

Example:

library(stringi)
library(magrittr) # For clearer code

badwords <- "utilizó" # (\u00f3, might've been the issue)
goodwords <- stri_trans_general(badwords, "Latin-ASCII") %>%
    stri_escape_unicode %>%
    gsub('(\\\\r)|(\\\\n)|(\\\\a)|(\\\\f)|(\\\\v)|(\\\\b)|(\\\\s)|(\\\\)', '', .)
ben519 commented 5 years ago

I recently discovered that this error occurs when loading a workbook that has auto filters and calling saveWorkbook() on that wb instance.

sertsedat commented 5 years ago

+1

wong-ziyi commented 5 years ago

+1

abbeythorpe commented 4 years ago

Having the same issue, with added weirdness that it works fine on my Mac but I get only numbers in output on my colleague's Windows machine.

Update: Worked around it by changing any NAs in headers to "".