Open colearendt opened 4 years ago
@RE-Chief I am not super familiar with .xls
objects (as they are quite old), however, I believe read.xlsx
, write.xlsx
and friends are not going to be super helpful (as is indicated by their being named after the xlsx
format). However, I know that Apache POI does support the XLS format (which it calls HSSF for "Horrible Spreadsheet Format" 😂 ).
There may be other libraries to work with this file format too.
You can interact with the documents directly using something like this (note I need to create one from scratch since I do not have any handy):
library(xlsx)
new_wb <- createWorkbook(type="xls")
new_wb
#> [1] "Java-Object{org.apache.poi.hssf.usermodel.HSSFWorkbook@65e98b1c}"
s1 <- createSheet(new_wb)
addDataFrame(iris, s1)
new_wb_path <- tempfile(fileext = ".xls")
saveWorkbook(new_wb, new_wb_path)
read_wb <- loadWorkbook(new_wb_path)
all_sheets <- getSheets(read_wb)
head(xlsx::readColumns(all_sheets[[1]], 1, 5, 1))
#> X. Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 1 1 5.1 3.5 1.4 0.2
#> 2 2 4.9 3.0 1.4 0.2
#> 3 3 4.7 3.2 1.3 0.2
#> 4 4 4.6 3.1 1.5 0.2
#> 5 5 5.0 3.6 1.4 0.2
#> 6 6 5.4 3.9 1.7 0.4
Created on 2020-04-19 by the reprex package (v0.3.0)
Many thanks for your reply Cole. My xls files must be pre-97 as I ran the code you kindly provided and I was able to open the created xls file with either read.xlsx or read.xlsx2. I cannot open my xls files with either, yet they will open with excel so I know they are valid files. I can pay for some software to do this for me but the money being asked for as crazy dollars. Thanks again for your efforts, stay well.
Kind regards Charles
@RE-Chief Did you try using loadWorkbook()
as I showed above? Can you share the results of your output, the errors, etc.? My expectation is that loadWorkbook()
along with readRows()
, readColumns()
or readRange()
should work for .xls
files. read.xlsx
or read.xlsx2
will not work.
Also, you might try readxl
, which seems to support xls
as well: https://readxl.tidyverse.org/
Thanks Cole, I have tried readxl previously.
read_wb <- loadWorkbook("C:/Users/Aquatic/Desktop/EP Snowy Model/OutputMS1/1.xls") Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, : java.lang.IllegalArgumentException: Your InputStream was neither an OLE2 stream, nor an OOXML stream
Kind regards Charles
Interesting!! Well that is a different one! Any chance these files are misnamed and their encoding is actually some other type of file?
I.e. you could try opening the file in a text editor like Notepad
or Wordpad
. If it is a CSV file (a text file with common separated values), opening it with xlsx
or readxl
in this manner will fail, but it will open just fine in Excel.
Hi Cole, I have attached an example file. It opens in notepad but as gobbledigook. It definitely opens in excel and I can save it as an xlsx file which will then open in R. But the whole purpose of this is to be able write a loop in R to open the xls files and save them as xlsx files.
Kind regards Charles
Hi Cole, I have attached an example file. It opens in notepad but as gobbledigook. It definitely opens in excel and I can save it as an xlsx file which will then open in R. But the whole purpose of this is to be able write a loop in R to open the xls files and save them as xlsx files.
Kind regards Charles
Hi Chief, I'm facing the same problem with .XLS files that I'm trying to process in R, I also tried readxl, XLConnect, xlsx and got no chance to open them in R. As you said, I can open them manually in excel and save them in xlsx format which will open in R but I have too many files that I need to process. any chances you solved this problem?
Thank you.
@AmmarAli92 this issue was not resolved through R. I did manage to find a solution using libreoffice. This problem has reared its head again for me and I am currently working on a better solution. It may be a couple of weeks before it is resolved but I will let you know if and when that happens.
Kind regards Charles
Finally got around to digging into this a smidge. It looks like this particular .xls
file example is "invalid" according to the (newer, at least) Apache POI library, so will not be parsable without some modifications.
I tested this by digging in a little bit lower level, and seeing that the binary data format is not what Apache POI is expecting. Docs here and here
library(xlsx)
library(rJava)
myf <- .jnew("java/io/File", normalizePath("~/Downloads/bad.xls"))
poif <- .jnew("org/apache/poi/poifs/filesystem/POIFSFileSystem", myf)
#> Error in .jnew("org/apache/poi/poifs/filesystem/POIFSFileSystem", myf): org.apache.poi.poifs.filesystem.NotOLE2FileException: Invalid header signature; read 0x000A000200040009, expected 0xE11AB1A1E011CFD0 - Your file appears not to be a valid OLE2 document
Created on 2022-04-29 by the reprex package (v0.3.0)
Unfortunately, it looks like the mechanism that is producing your .xls
files is either invalid or quite outdated (perhaps using OLE1
?)
If the .xls
file is produced by Excel, for instance, things seem to work just fine. (Looking at the binary diff, though - things do differ quite a bit)
library(xlsx)
loadWorkbook("~/Documents/bad2.xls")
#> [1] "Java-Object{org.apache.poi.hssf.usermodel.HSSFWorkbook@1951b871}"
Created on 2022-04-29 by the reprex package (v0.3.0)
Some possible solutions I unfortunately don't have the time to dig into directly:
Apache POI
library directly with rJava
- it's possible that support for this older file format existed and was dropped at some point. The mechanism to load and resave the files could be pretty straightforwardI hope that helps!! Sorry for the trouble!
Hi RE-Chief I'm working on xls file which is not produced by EXCEL, and also have the same problem, same error code, and try readxl Package which isn't working.
And I ressolve this problem, here is my analysis: Issue Analysis: when xls file is produced by other software which is not by EXCEL, it will occur message box and tell you file format and extension does not match, file maybe corrupted, and so on I think when R package,like readxl, xlsx , processing the xls file, it can't processing this message box. Solution: So I try another way First, you need install RDCOMClient, here is link https://stackoverflow.com/questions/73190036/how-to-convert-xls-to-xlsx-using-r and use function convert_XLS_File_To_XLSX as literally Attention: two parameters, the former one is old file,like "C:\oldfile.xls", the latter one like "C\newfile.xlsx" Second use read_xlsx or read.xlsx or readxl functions and it worked.
Thks Emmanuel Hamel
Many thanks Emmanuel, I will check it out.
Kind regards Charles
On Thu, Nov 16, 2023 at 2:13 PM jiangfenglyu @.***> wrote:
Hi RE-Chief https://github.com/RE-Chief I'm working on xls file which is not produced by EXCEL, and also have the same problem, same error code, and try readxl Package which isn't working.
And I ressolve this problem, here is my analysis: Issue Analysis: when xls file is produced by other software which is not by EXCEL, it will occur message box and tell you file format and extension does not match, file maybe corrupted, and so on I think when R package,like readxl, xlsx , processing the xls file, it can't processing this message box. Solution: So I try another way First, you need install RDCOMClient, here is link https://stackoverflow.com/questions/73190036/how-to-convert-xls-to-xlsx-using-r and use function convert_XLS_File_To_XLSX as literally Attention: two parameters, the former one is old file,like "C:\oldfile.xls", the latter one like "C\newfile.xlsx" Second use read_xlsx or read.xlsx or readxl functions and it worked.
Thks Emmanuel Hamel https://stackoverflow.com/users/9245623/emmanuel-hamel
— Reply to this email directly, view it on GitHub https://github.com/colearendt/xlsx/issues/160#issuecomment-1813724033, or unsubscribe https://github.com/notifications/unsubscribe-auth/APHQEQ4KBJBVPGFPKRSSIEDYEWAFNAVCNFSM4MMBVS52U5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TCOBRGM3TENBQGMZQ . You are receiving this because you were mentioned.Message ID: @.***>
@RE-Chief opening a separate thread to discuss since #159 is unrelated
Hi Cole,
I have older xls files that I am trying to read into R. I cannot read them into R using any other means so I am trying to use your read.xlsx2 to read them.
I get the following error which I do not understand are you able to help me out please?
Regards Charles
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, : java.lang.IllegalArgumentException: Your InputStream was neither an OLE2 stream, nor an OOXML stream