dhoerl / DHlibxls

Framework to read Excel xls spreadsheets

Fails if table has an empty cell #18

Open DLGill opened 9 years ago

DLGill commented 9 years ago

If an Excel file has a table with any empty cells, DHxlsReader will stop reading and return end of file (it won't read any subsequent rows or columns in the table, or anything else in the file). I can submit an example Excel file if someone can explain how to attach a file to an issue submission (it seems to allow only images). I have attached an image of a file that fails ("examplefail"). The reader stops after encountering a blank cell in row 3, column 1.

dhoerl commented 9 years ago

The way to "send" a file is to put it on Dropbox or some other public location, or, if you do not want to do that, you can email it to me directly at dhoerl at mac dot com

dhoerl commented 9 years ago

The problem is password protection. All of the data is unavailable in this case, and there is nothing the library can do. These are the relevant records:

libxls : BOF ID: 0012h PROTECT (Protection Flag) Size: 2

Not Processed in parseWoorkBook(): BOF=0x0012 size=2

libxls : BOF ID: 0013h PASSWORD (Protection Password) Size: 2

Not Processed in parseWoorkBook(): BOF=0x0013 size=2

libxls : BOF ID: 01AFh PROT4REV (Shared Workbook Protection Flag) Size: 2

Not Processed in parseWoorkBook(): BOF=0x01AF size=2

libxls : BOF ID: 01BCh PROT4REVPASS (Shared Workbook Protection Password) Size: 2

Not Processed in parseWoorkBook(): BOF=0x01BC size=2
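For anyone who wants to check these records by hand, here is a minimal sketch (not libxls code) that walks a BIFF record stream and reports the four protection records listed above along with their 2-byte values. It assumes the usual BIFF layout of a 2-byte little-endian record ID, a 2-byte size, and then the payload, and it assumes the input is the Workbook stream already extracted from the OLE container rather than the raw .xls file. The names `find_protection_records` and `make_record` are hypothetical. As far as I understand the format, a value of 0 in PROTECT or PASSWORD means the lock is off or no password is set, so the mere presence of a record may not prove the file is protected.

```python
import struct

# Record IDs copied from the libxls log output quoted above.
PROTECTION_RECORDS = {
    0x0012: "PROTECT",
    0x0013: "PASSWORD",
    0x01AF: "PROT4REV",
    0x01BC: "PROT4REVPASS",
}


def find_protection_records(stream: bytes):
    """Return (name, value) pairs for each 2-byte protection record found.

    Assumes `stream` is a BIFF record stream: each record is a 2-byte ID,
    a 2-byte payload size (both little-endian), then the payload itself.
    """
    found = []
    offset = 0
    while offset + 4 <= len(stream):
        record_id, size = struct.unpack_from("<HH", stream, offset)
        payload = stream[offset + 4 : offset + 4 + size]
        if len(payload) < size:
            break  # truncated record, stop scanning
        if record_id in PROTECTION_RECORDS and size == 2:
            (value,) = struct.unpack("<H", payload)
            found.append((PROTECTION_RECORDS[record_id], value))
        offset += 4 + size
    return found


def make_record(record_id: int, payload: bytes = b"") -> bytes:
    """Hypothetical helper: build one BIFF record for the demo below."""
    return struct.pack("<HH", record_id, len(payload)) + payload


# Synthetic demo stream (not taken from the file in this issue):
# four protection records with value 0, plus one unrelated record.
sample = (
    make_record(0x0012, struct.pack("<H", 0))
    + make_record(0x0013, struct.pack("<H", 0))
    + make_record(0x0042, struct.pack("<H", 0x04B0))  # unrelated record, skipped
    + make_record(0x01AF, struct.pack("<H", 0))
    + make_record(0x01BC, struct.pack("<H", 0))
)

for name, value in find_protection_records(sample):
    print(f"{name}: value={value:#06x}")
```

Running this against a real file's Workbook stream would show whether the flag and password values are actually non-zero, which might help tell a genuinely protected file apart from one that merely carries the records.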

DLGill commented 9 years ago

My file doesn't appear to be password protected. Can you open the file from Excel? I can without any password. I don't know how it would have been protected. The file was generated by scanning it.

Kind Regards,

David Gill President

IntegrityWare, Inc. (http://www.IntegrityWare.com) nPower Software (http://www.nPowerSoftware.com/) DGILL@nPowerSoftware.com Phone: 858-592-8866 Fax: 858-592-8844

DLGill commented 9 years ago

Hi David,

Also, how am I able to read the first 2 rows (but not the 3rd) if it is protected? Can a portion of an Excel file be protected? I can edit the file manually without entering any password.

Could it be flagging the table as protected accidentally?

dhoerl commented 9 years ago

Really, the libxls library should just abort when it sees these protection flags. I don't know the ins and outs of how Excel implements these, but the issue of passwords and protections blocking the library from reading a file is longstanding. It's sort of funny, but I don't have access to a copy of Excel - mine all ran on PowerPC Macs. When I tried to open your file with Numbers (the Apple spreadsheet app), it failed. Normally it opens xls files without a problem.

DLGill commented 9 years ago

Hi David,

It's strange because there don't seem to be any protections set on the file (see attached dialog). I can't figure out how to turn off the protections, and DHxlsReader seems to be able to read everything until after it finds the empty cell in the table (it reads the first cell of the table as empty). After that, it returns "(null)" for each cell. I don't have the Numbers app, but Excel opens the file fine (that's where I got the screen capture).

DLGill commented 9 years ago

Hi David,

So I tried something interesting. I simply put the word "junk" into that first empty cell of the table (see attached). Not only did it get past that cell, but DHxlsReader is now able to read the entire rest of the file. So I believe that the problem has something specific to do with an empty cell in the first cell of a table. The other empty cells in the table don't cause DHxlsReader any problems.

Could you try with the attached file? Can you open this one?

dhoerl commented 9 years ago

There was no image attached to your post. What cell did you add junk into - row 3 column 1? The root issue here is libxls's ability to decode the spreadsheet, not the Objective-C wrapper framework. I am testing with libxls directly, and that is where the strings seem to disappear. It must have something to do with the shared string table... The library has been shown to work with lots of empty cells - you can do a small test yourself by adding a few strings strewn across random cells.

DLGill commented 9 years ago

Hi David,

It wasn't an image, it was just a file. I am attaching it again, but I don't know if you can see it.

I put "junk" into cell B3 (it is the first one of the "table"). Actually, that cell goes through columns B, C, and D, but I just typed it into the empty cell.
