dilshod / xlsx2csv

Convert xlsx to csv; it is fast and works for huge xlsx files
MIT License
1.68k stars, 303 forks

Fix bug when missing workbook relationship #277

Closed (tmiller closed 8 months ago)

tmiller commented 8 months ago

Fix bug when missing workbook relationship

Add a check to see if the list is empty before trying to access its contents. If an Excel file has an overridden relationship without the word "book" in its name, the code attempts to grab the first item of an empty list when looking up workbook relationships.

IndexError: list index out of range
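
The guard described above can be sketched roughly as follows. This is a minimal illustration, not the actual xlsx2csv code: the helper `first_or_none` and the sample list are hypothetical, and falling back to the conventional `/xl/_rels/workbook.xml.rels` location is an assumption about what a caller might want when nothing matches.

```python
def first_or_none(items):
    """Return the first element of items, or None when the list is empty."""
    return items[0] if items else None


# Hypothetical stand-in for the filtered list of workbook relationship parts.
# It is empty here, which is the situation that triggered the IndexError.
workbook_relationships = []

rels_path = first_or_none(workbook_relationships)
if rels_path is None:
    # Assumed fallback: the conventional location of the relationships
    # part that belongs to xl/workbook.xml.
    rels_path = "/xl/_rels/workbook.xml.rels"

print(rels_path)
```

Indexing with `[0]` only after checking that the list is non-empty avoids the exception; what to do in the empty case is a design choice left to the caller.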

There could be a better fix for this issue; I'm not well enough versed in the xlsx specification to say. The following xlsx file caused the issue.

$ unzip -l some_file.xlsx
Archive:  some_file.xlsx
  Length      Date    Time    Name
---------  ---------- -----   ----
      142  02-06-2024 13:28   xl/worksheets/_rels/sheet1.xml.rels
 65968555  02-06-2024 13:28   xl/worksheets/sheet1.xml
  2078037  02-06-2024 13:28   xl/sharedStrings.xml
     9867  02-06-2024 13:28   xl/styles.xml
      566  02-06-2024 13:28   xl/_rels/workbook.xml.rels
      388  02-06-2024 13:28   xl/workbook.xml
      297  02-06-2024 13:28   _rels/.rels
     1122  02-06-2024 13:28   [Content_Types].xml
---------                     -------
 68058974                     8 files

In [Content_Types].xml, the relationships override points at _rels/.rels rather than xl/_rels/workbook.xml.rels. This leaves the workbook_relationships list empty, which causes the error mentioned above. The file does have a workbook relationship; it is just being overridden.

[Content_Types].xml:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="png" ContentType="image/png"/>
  <Default Extension="jpeg" ContentType="image/jpeg"/>
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Default Extension="vml" ContentType="application/vnd.openxmlformats-officedocument.vmlDrawing"/>
  <Override PartName="/xl/worksheets/sheet1.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/>
  <Override PartName="/xl/sharedStrings.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml"/>
  <Override PartName="/xl/styles.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.styles+xml"/>
  <Override PartName="/xl/workbook.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/>
  <Override PartName="/_rels/.rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
</Types>
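
To see why the lookup comes back empty for this manifest, here is a small sketch that mimics the filtering described in this PR: collect the Override entries with the relationships content type, then keep only those whose file name contains "book". The function name `relationship_overrides` and the abbreviated manifest are illustrative assumptions, and the real xlsx2csv parsing code may differ.

```python
import xml.etree.ElementTree as ET

CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"
RELS_TYPE = "application/vnd.openxmlformats-package.relationships+xml"

# Abbreviated copy of the manifest above; only these entries matter here.
CONTENT_TYPES = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Override PartName="/xl/workbook.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/>
  <Override PartName="/_rels/.rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
</Types>"""


def relationship_overrides(content_types_xml):
    """Return the PartName of every Override typed as a relationships part."""
    root = ET.fromstring(content_types_xml.encode("utf-8"))
    return [
        el.get("PartName")
        for el in root.iter(CT_NS + "Override")
        if el.get("ContentType") == RELS_TYPE
    ]


overrides = relationship_overrides(CONTENT_TYPES)
# Keep only parts whose file name contains "book", as the PR describes.
workbook_relationships = [p for p in overrides if "book" in p.split("/")[-1]]

print(overrides)
print(workbook_relationships)
```

The only relationships Override is `/_rels/.rels`, whose file name does not contain "book", so the filtered list is empty and indexing its first element would fail.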

xl/_rels/workbook.xml.rels:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet1.xml"/>
  <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/sharedStrings" Target="sharedStrings.xml"/>
  <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="styles.xml"/>
</Relationships>

tanji commented 8 months ago

@dilshod could you please tag this fix? Thank you kindly