Summary of Bug/Feature: Read failure for missing worksheets.
We process a spreadsheet provided by a vendor. We read in the spreadsheet as follows:
xlsx = xl.readxl(file)
We always get the exception:
"There is no item named 'xl/../xl/worksheets/sheet3.xml' in the archive"
I realize this is an issue with the spreadsheet, but the provider has not been responsive about fixing it. Moreover, I can't include the spreadsheet as an example because it contains sensitive information. If I open the spreadsheet (e.g. in Calc or Excel) and save it (either edited or unedited), the problem goes away. However, this introduces a manual step and is not compatible with our automated process, which securely transfers the spreadsheet to our server, where the contents are then processed without human intervention.
I googled and saw that some other spreadsheet packages had the same problem, and it looks like they may have a workaround. Maybe pylightxl already has a workaround that I was not able to find? If so, please advise. I did try adding ws='sheetname', where sheetname is known to exist, but the same exception is raised.
Traceback:
  File "/home/tasks/bin/CheckSpreadsheets.py", line 397, in main
    xlsx = xl.readxl(file, ws=uicaudit_sheets)
  File "/usr/local/lib/python2.7/site-packages/pylightxl/pylightxl.py", line 164, in readxl
    data = readxl_scrape(fn, fn_ws, sharedString, styles, comments)
  File "/usr/local/lib/python2.7/site-packages/pylightxl/pylightxl.py", line 489, in readxl_scrape
    with f_zip.open('xl/' + fn_ws, 'r') as file:
  File "/usr/lib64/python2.7/zipfile.py", line 984, in open
    zinfo = self.getinfo(name)
  File "/usr/lib64/python2.7/zipfile.py", line 932, in getinfo
    'There is no item named %r in the archive' % name)
KeyError: "There is no item named 'xl/../xl/worksheets/sheet3.xml' in the archive"
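One detail in the KeyError may matter here: the requested name contains a `..` segment, and `zipfile` looks up member names as literal strings, so it never collapses `xl/../xl/` to `xl/`. The sketch below illustrates that behaviour on a tiny in-memory archive that stands in for the vendor file. It assumes the sheet really is stored under the normalized name, which may or may not be true for this spreadsheet; `normalized_member` is a hypothetical helper and not part of pylightxl.

```python
import io
import posixpath
import zipfile


def normalized_member(base_dir, target):
    """Join a relationship target onto base_dir and collapse '..' segments.

    Hypothetical helper for illustration only; not a pylightxl function.
    """
    joined = posixpath.join(base_dir, target)
    # Zip member names carry no leading slash, so strip one if present.
    return posixpath.normpath(joined).lstrip('/')


# Build a small in-memory archive standing in for the vendor spreadsheet.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('xl/worksheets/sheet3.xml', '<worksheet/>')

with zipfile.ZipFile(buf) as zf:
    # The literal name from the traceback is not resolved by zipfile.
    raw_name = 'xl/' + '../xl/worksheets/sheet3.xml'
    try:
        zf.getinfo(raw_name)
        raw_found = True
    except KeyError:
        raw_found = False

    # The normalized name matches the stored member in this toy archive.
    fixed_name = normalized_member('xl', '../xl/worksheets/sheet3.xml')
    fixed_found = fixed_name in zf.namelist()

print(raw_found, fixed_name, fixed_found)
```

If the real file behaves like this toy archive, the sheet is not actually missing and normalizing the path before the lookup would be enough. Listing `zipfile.ZipFile(file).namelist()` on the vendor file would show which case applies.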
Suggestion for fix:
Ignore the missing worksheet and continue reading in the rest of the spreadsheet.
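To make the suggestion concrete, here is a minimal sketch of the "skip and continue" behaviour, written against the standard `zipfile` module only. The function name `read_available_sheets` and the example paths are hypothetical, and this is not how pylightxl is structured internally; it only shows the idea of checking the archive listing before opening each sheet.

```python
import io
import zipfile


def read_available_sheets(zf, sheet_paths):
    """Read the sheets that exist in the archive and report the rest.

    Hypothetical helper for illustration; returns (found, missing) where
    found maps member name to raw bytes and missing lists skipped names.
    """
    names = set(zf.namelist())
    found = {}
    missing = []
    for path in sheet_paths:
        if path in names:
            found[path] = zf.read(path)
        else:
            # Skip instead of raising, so the remaining sheets still load.
            missing.append(path)
    return found, missing


# Toy archive with two sheets; a third one is referenced but absent.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('xl/worksheets/sheet1.xml', '<worksheet/>')
    zf.writestr('xl/worksheets/sheet2.xml', '<worksheet/>')

wanted = [
    'xl/worksheets/sheet1.xml',
    'xl/worksheets/sheet2.xml',
    'xl/worksheets/sheet3.xml',
]
with zipfile.ZipFile(buf) as zf:
    found, missing = read_available_sheets(zf, wanted)

print(sorted(found), missing)
```

Returning the list of skipped names, rather than silently dropping them, would let an automated job log a warning when a worksheet could not be read.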
Hi @jmcpheters, thanks for posting this. I am held up by school work at the moment, but after this weekend I should be able to take a look, and I will get back to you with a fix!
Pylightxl Version: pylightxl-1.61
Python Version: Python 2.7.18