PydPiper / pylightxl

A light weight, zero dependency, minimal functionality excel read/writer python library
https://pylightxl.readthedocs.io
MIT License

Cannot open excel xlsx file #44

Closed: redsigma closed this 3 years ago

redsigma commented 3 years ago

Pylightxl Version: 1.55, 1.54
Python Version: 3.7

Summary of Bug/Feature: Cannot open an .xlsx Excel file using the following code:

  import pathlib
  import urllib.request
  import pylightxl
  file, _ = urllib.request.urlretrieve("file.xlsx")
  path = pathlib.Path.cwd() / file
  db = pylightxl.readxl(path)

Traceback:

Traceback (most recent call last):
  File "open_excel.py", line 310, in <module>
    main()
  File "open_excel.py", line 303, in main
    read_from_the_excel_file()
  File "open_excel.py", line 245, in read_from_the_excel_file
    pylightxl.readxl(path)
  File "/opt/python/3.7/lib/python3.7/site-packages/pylightxl/pylightxl.py", line 118, in readxl
    wb_rels = readxl_get_workbook(fn)
  File "/opt/python/3.7/lib/python3.7/site-packages/pylightxl/pylightxl.py", line 227, in readxl_get_workbook
    for tag_sheet in root.findall('./default:sheets/default:sheet', ns):
  File "/opt/python/3.7/lib/python3.7/xml/etree/ElementPath.py", line 313, in findall
    return list(iterfind(elem, path, namespaces))
  File "/opt/python/3.7/lib/python3.7/xml/etree/ElementPath.py", line 292, in iterfind
    token = next()
  File "/opt/python/3.7/lib/python3.7/xml/etree/ElementPath.py", line 83, in xpath_tokenizer
    raise SyntaxError("prefix %r not found in prefix map" % prefix) from None
SyntaxError: prefix 'default' not found in prefix map

redsigma commented 3 years ago

Could be similar to #23
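
For anyone hitting the same traceback, the error itself can be reproduced with the standard library alone. This is a minimal sketch, not pylightxl's actual code: the XML snippet is an illustrative stand-in for a workbook part, and the empty namespace map is an assumption about what the lookup ends up with when the file declares no default namespace.

```python
import xml.etree.ElementTree as ET

MAIN_URI = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"

# Illustrative workbook XML that uses an "x:" prefix rather than a default namespace.
WORKBOOK_XML = (
    '<x:workbook xmlns:x="' + MAIN_URI + '">'
    '<x:sheets><x:sheet name="Sheet1" sheetId="1"/></x:sheets>'
    '</x:workbook>'
)


def find_sheets(xml_text, ns):
    """Run the same XPath as in the traceback against the given prefix map."""
    root = ET.fromstring(xml_text)
    return root.findall('./default:sheets/default:sheet', ns)


def reproduce():
    """Return the error message raised when the map has no 'default' entry."""
    try:
        find_sheets(WORKBOOK_XML, {})  # assumption: no "default" prefix was registered
    except SyntaxError as exc:
        return str(exc)
    return None


print(reproduce())
```

Passing `{"default": MAIN_URI}` instead of the empty map lets the same path match, because ElementTree compares namespace URIs and not the prefixes written in the file.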

PydPiper commented 3 years ago

Hey @redsigma, thanks for posting this. I will take a look at it today and get back to you. I suspect you are correct: the file was probably generated by another program and lacks the namespace declarations that Excel writes by default. If that's the case for this one as well, I'll go through and add a try/except to all the namespace parsing and see if that helps.

PydPiper commented 3 years ago

It looks like the spreadsheet format differs from what current Excel versions and openpyxl produce. I can't tell whether this spreadsheet was produced by a tool or by an older version of Excel, but the issue appears to be an extra prefix on the XML tags: <x:sheets>, whereas openpyxl and current Excel versions write <sheets>. I will take a look at adding support for this. Thanks again for posting this.
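
One way to handle both spellings is to read the namespace URI from the root element instead of relying on a particular prefix. The following is a hedged sketch of that idea, not the change that went into pylightxl: the `sheet_names` helper and both XML strings are hypothetical, and it assumes the sheets live in the same namespace as the root element.

```python
import xml.etree.ElementTree as ET

MAIN_URI = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"

# Hypothetical samples: the same workbook written with and without an "x:" prefix.
PREFIXED = (
    f'<x:workbook xmlns:x="{MAIN_URI}">'
    '<x:sheets><x:sheet name="Sheet1" sheetId="1"/></x:sheets>'
    '</x:workbook>'
)
UNPREFIXED = (
    f'<workbook xmlns="{MAIN_URI}">'
    '<sheets><sheet name="Sheet1" sheetId="1"/></sheets>'
    '</workbook>'
)


def sheet_names(xml_text):
    """Return sheet names regardless of which prefix the file uses."""
    root = ET.fromstring(xml_text)
    # ElementTree expands qualified tags to "{uri}localname".
    if root.tag.startswith("{"):
        uri = root.tag[1:root.tag.index("}")]
        ns = {"main": uri}  # "main" is our own label, independent of the file's prefix
        sheets = root.findall("./main:sheets/main:sheet", ns)
    else:
        # No namespace at all: fall back to plain tag names.
        sheets = root.findall("./sheets/sheet")
    return [sheet.get("name") for sheet in sheets]


print(sheet_names(PREFIXED))
print(sheet_names(UNPREFIXED))
```

Both inputs resolve to the same expanded tag names once parsed, so a single lookup path covers the `<x:sheets>` and `<sheets>` variants.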

PydPiper commented 3 years ago

Hi @redsigma, I pushed an update to the master branch that can parse the tool-generated spreadsheet you linked. I am working on a few other features, so this update will be rolled out with them in the coming week or so. Until then, please use the master branch. Thanks for posting this, and let me know if you have any more questions or trouble. As always, thanks for considering pylightxl for your project! :)