dilshod / xlsx2csv

Convert xslx to csv, it is fast, and works for huge xlsx files
MIT License
1.68k stars 303 forks source link

Add flag to suppress invalid formulas #244

Open khsu2000 opened 2 years ago

khsu2000 commented 2 years ago

When excel files have invalid formulas/data, this can cause xlsx2csv to produce an error like the following below (traceback truncated to only include the xlsx2csv errors):

Traceback (most recent call last):
  File "/shared/ceph/homes/khsu/data_warehouse/integration/penv/lib/python3.7/site-packages/xlsx2csv.py", line 860, in handleCharData
    self.data = ("%f" % (float(self.data))).rstrip('0').rstrip('.')
ValueError: could not convert string to float: '#N/A'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
...
  File "/shared/ceph/homes/khsu/data_warehouse/tools/src/preproc/read.py", line 155, in read_excel
    xlsx2csv.Xlsx2csv(infile).convert(outfile, sheetid=sheet_id)
  File "/shared/ceph/homes/khsu/data_warehouse/integration/penv/lib/python3.7/site-packages/xlsx2csv.py", line 235, in convert
    self._convert(sheetid, outfile)
  File "/shared/ceph/homes/khsu/data_warehouse/integration/penv/lib/python3.7/site-packages/xlsx2csv.py", line 353, in _convert
    sheet.to_csv(writer)
  File "/shared/ceph/homes/khsu/data_warehouse/integration/penv/lib/python3.7/site-packages/xlsx2csv.py", line 782, in to_csv
    self.parser.ParseFile(self.filehandle)
  File "/home/ebyrne/rpmbuild/BUILD/Python-3.7.4/Modules/pyexpat.c", line 282, in CharacterData
  File "/shared/ceph/homes/khsu/data_warehouse/integration/penv/lib/python3.7/site-packages/xlsx2csv.py", line 874, in handleCharData
    raise XlsxValueError("Error: potential invalid date format.")
xlsx2csv.XlsxValueError: Error: potential invalid date format.

This occurs with invalid formulas like #N/A and #VALUE!. Because xlsx2csv doesn't know how to handle these, this breaks the conversion. This PR aims to fix these issues by removing the faulty data instead of trying to force a conversion onto it when an invalid formula is encountered.