dilshod / xlsx2csv

Convert xlsx to csv; it is fast and works for huge xlsx files
MIT License
1.66k stars, 302 forks

Parsing Numberformat when chk_exists() returns None #207

Open carvitup opened 3 years ago

carvitup commented 3 years ago

Background: the xlsx file in question is exported from a system and likely has some ugly non-standard formats on cells. If I open the file and save it again then xlsx2csv works fine. However, this isn't the most automated solution. The file in question is throwing the following error:

Traceback (most recent call last):
  File "xlsx2csv.py", line 1171, in <module>
  File "xlsx2csv.py", line 201, in __init__
  File "xlsx2csv.py", line 361, in _parse
  File "xlsx2csv.py", line 531, in parse
ValueError: invalid literal for int() with base 10: 'true'

In this particular case, int(cellXfs._attrs['numFmtId'].value) is 8, which is not in STANDARD_FORMATS, so chk_exists returns None. That case is then handled by:

numFmtId = int(cellXfs._attrs['applyNumberFormat'].value)

However, cellXfs._attrs['applyNumberFormat'].value holds the literal string 'true' or 'false', and calling int() on that string raises the ValueError shown in the traceback. I believe the code below solves this:

numFmtId = int(cellXfs._attrs['applyNumberFormat'].value == 'true')

I will submit a pull request for this update. I don't think it impacts anything else, and I believe it matches the original intent for handling this rare case.
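
For reference, here is a minimal standalone sketch of the failure and of a slightly more defensive conversion. It relies on one fact outside this report: applyNumberFormat is an XML Schema boolean, so a file may store it as 'true'/'false' or as '1'/'0'. A plain comparison with 'true' would turn '1' into 0. The helper name parse_xsd_boolean is hypothetical and not part of xlsx2csv, and accepting mixed case is an extra leniency that the schema itself does not require.

```python
def parse_xsd_boolean(text):
    """Map an xsd:boolean attribute string to 0 or 1.

    Hypothetical helper for illustration; not part of xlsx2csv.
    """
    # Lenient about surrounding whitespace and letter case (an assumption,
    # the schema itself only lists 'true', 'false', '1' and '0').
    normalized = text.strip().lower()
    if normalized in ("true", "1"):
        return 1
    if normalized in ("false", "0"):
        return 0
    raise ValueError("not an xsd:boolean literal: %r" % (text,))


# int() alone fails on the literal form seen in the problem file.
try:
    int("true")
except ValueError as exc:
    print("int('true') failed:", exc)

# Both spellings of the boolean are handled by the helper.
for raw in ("true", "false", "1", "0"):
    print(raw, "->", parse_xsd_boolean(raw))
```

With a helper like this, the failing line could read numFmtId = parse_xsd_boolean(cellXfs._attrs['applyNumberFormat'].value), which keeps the existing behaviour for files that already use '1' or '0'.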

Thanks!