dilshod / xlsx2csv

Convert xlsx to csv, it is fast, and works for huge xlsx files
MIT License
1.68k stars 302 forks

handleStartElement matching fails and leads to AttributeError #65

Closed fnl closed 9 years ago

fnl commented 9 years ago

Getting "null-pointer exceptions" from xlsx2csv when parsing some files:

Traceback (most recent call last):
  File "/usr/local/bin/xlsx2csv", line 847, in <module>
    xlsx2csv.convert(outfile, sheetid)
  File "/usr/local/bin/xlsx2csv", line 178, in convert
    self._convert(sheetid, outfile)
  File "/usr/local/bin/xlsx2csv", line 247, in _convert
    sheet.to_csv(writer)
  File "/usr/local/bin/xlsx2csv", line 558, in to_csv
    self.parser.ParseFile(self.filehandle)
  File "/usr/local/bin/xlsx2csv", line 660, in handleStartElement
    startCol = start.group(1)
AttributeError: 'NoneType' object has no attribute 'group'

An example where this problem occurs is here.

According to pip, my installed version is the latest, but xlsx2csv -v only outputs the script's name (xslx2csv - want me to open a separate ticket for that?). I do see that the --merge-cells option is available in the script.

fnl commented 9 years ago

It seems a possible fix is to replace the regex ^([A-Z]+)(\d+)$ on line 658 (it is repeated on the following line) with ^([A-Z]*)(\d+)$ (note the Kleene star). With that change, extraction of this file works fine, but I have no idea whether it breaks anything elsewhere...
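
To illustrate the difference between the two patterns, here is a minimal standalone sketch. It does not use xlsx2csv's actual code: the helper name split_ref and the sample strings are made up for illustration, and it assumes the failing file contains a reference made of digits only, with no column letters. That is what the Kleene-star change working would suggest, but I have not confirmed it.

```python
import re

# Pattern quoted from the script: at least one column letter is required.
STRICT = re.compile(r"^([A-Z]+)(\d+)$")
# Proposed change: the column letters become optional (Kleene star).
LENIENT = re.compile(r"^([A-Z]*)(\d+)$")


def split_ref(ref, pattern):
    """Hypothetical helper: split a reference into (column, row) strings.

    Returns None instead of raising when the pattern does not match,
    which is the case that produced the AttributeError in the traceback.
    """
    match = pattern.match(ref)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(split_ref("B12", STRICT))   # ('B', '12')
print(split_ref("12", STRICT))    # None: calling .group() here would raise
print(split_ref("12", LENIENT))   # ('', '12')
```

Note that with the lenient pattern the column part comes back as an empty string, so any code that later converts the column letters to an index would have to cope with that.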

dilshod commented 9 years ago

Can you try the latest version from GitHub?

fnl commented 9 years ago

:+1: works! (sorry for being so late, having too much work coming at me...)