PydPiper / pylightxl

A light weight, zero dependency, minimal functionality excel read/writer python library
https://pylightxl.readthedocs.io
MIT License

Read Failure When Worksheet Missing #82

Open jmcpheters opened 1 year ago

jmcpheters commented 1 year ago

Pylightxl Version: pylightxl-1.61
Python Version: Python 2.7.18

Summary of Bug/Feature: Read failure for missing worksheets.

We process a spreadsheet provided by a vendor. We read in the spreadsheet as follows:

xlsx = xl.readxl(file)

We always get the exception:

"There is no item named 'xl/../xl/worksheets/sheet3.xml' in the archive"

I realize this is an issue with the spreadsheet itself, but the provider has not been responsive about fixing it. Moreover, I can't include the spreadsheet as an example because it contains sensitive information. If I open the spreadsheet (e.g. in Calc or Excel) and save it (edited or unedited), the problem goes away. However, this introduces a manual step that is incompatible with our automated process, which securely transfers the spreadsheet to our server, where it is then processed without human intervention.

I googled and saw that some other spreadsheet packages had the same problem, and it looks like they may have a workaround. Maybe pylightxl already has a workaround that I was not able to find? If so, please advise. I did try adding ws='sheetname', where sheetname is a sheet known to exist, but the same exception is raised.

Traceback:
  File "/home/tasks/bin/CheckSpreadsheets.py", line 397, in main
    xlsx = xl.readxl(file, ws=uicaudit_sheets)
  File "/usr/local/lib/python2.7/site-packages/pylightxl/pylightxl.py", line 164, in readxl
    data = readxl_scrape(fn, fn_ws, sharedString, styles, comments)
  File "/usr/local/lib/python2.7/site-packages/pylightxl/pylightxl.py", line 489, in readxl_scrape
    with f_zip.open('xl/' + fn_ws, 'r') as file:
  File "/usr/lib64/python2.7/zipfile.py", line 984, in open
    zinfo = self.getinfo(name)
  File "/usr/lib64/python2.7/zipfile.py", line 932, in getinfo
    'There is no item named %r in the archive' % name)
KeyError: "There is no item named 'xl/../xl/worksheets/sheet3.xml' in the archive"
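The member name in the error contains a `..` segment ('xl/../xl/worksheets/sheet3.xml'). Zip member names are matched literally, so it may be worth checking whether the archive holds the part under its normalized name before concluding that the sheet is missing. Below is a minimal diagnostic sketch using only the standard library. The helper name `find_member` is hypothetical and not part of pylightxl, and the in-memory archive merely stands in for the vendor file:

```python
import io
import posixpath
import zipfile


def find_member(archive_names, raw_name):
    """Return the archive member matching raw_name, or None.

    Tries the literal name first, then the name with '.' and '..'
    segments collapsed (zip member names always use '/').
    """
    names = set(archive_names)
    if raw_name in names:
        return raw_name
    normalized = posixpath.normpath(raw_name)
    if normalized in names:
        return normalized
    return None


# Build a tiny in-memory archive standing in for the vendor spreadsheet.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("xl/worksheets/sheet1.xml", "<worksheet/>")

with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()

# Present once the '..' segment is collapsed.
print(find_member(names, "xl/../xl/worksheets/sheet1.xml"))
# Absent under either spelling, so the part really is missing.
print(find_member(names, "xl/../xl/worksheets/sheet3.xml"))
```

If the normalized name is found, the problem would be how the path is written in the file rather than a missing worksheet. If it is not found, the part is absent from the archive.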

Suggestion for fix: Ignore the missing worksheet and continue reading in the rest of the spreadsheet.
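The suggested behaviour can be sketched outside pylightxl with the standard library alone. This is an illustration of the idea, not pylightxl's implementation. The function name `read_existing_parts` and the list of requested member paths are hypothetical:

```python
import io
import zipfile


def read_existing_parts(zip_file, wanted):
    """Read each wanted member; collect missing ones instead of raising."""
    found, missing = {}, []
    for name in wanted:
        try:
            found[name] = zip_file.read(name)
        except KeyError:
            # zipfile raises KeyError when the member is not in the archive.
            missing.append(name)
    return found, missing


# In-memory archive with two worksheet parts; sheet3 is deliberately absent.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("xl/worksheets/sheet1.xml", "<worksheet/>")
    zf.writestr("xl/worksheets/sheet2.xml", "<worksheet/>")

wanted = [
    "xl/worksheets/sheet1.xml",
    "xl/worksheets/sheet2.xml",
    "xl/worksheets/sheet3.xml",
]
with zipfile.ZipFile(buf) as zf:
    found, missing = read_existing_parts(zf, wanted)

print(sorted(found))  # parts that were read
print(missing)        # parts that were skipped
```

Returning the skipped names lets the caller log or report them, so a missing worksheet is not dropped silently.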

PydPiper commented 1 year ago

Hi @jmcpheters, thanks for posting this. I am held up by school work at the moment, but after this weekend I should be able to take a look and will get back to you with a fix!