When excel files have invalid formulas/data, this can cause xlsx2csv to produce an error like the following below (traceback truncated to only include the xlsx2csv errors):
Traceback (most recent call last):
File "/shared/ceph/homes/khsu/data_warehouse/integration/penv/lib/python3.7/site-packages/xlsx2csv.py", line 860, in handleCharData
self.data = ("%f" % (float(self.data))).rstrip('0').rstrip('.')
ValueError: could not convert string to float: '#N/A'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
...
File "/shared/ceph/homes/khsu/data_warehouse/tools/src/preproc/read.py", line 155, in read_excel
xlsx2csv.Xlsx2csv(infile).convert(outfile, sheetid=sheet_id)
File "/shared/ceph/homes/khsu/data_warehouse/integration/penv/lib/python3.7/site-packages/xlsx2csv.py", line 235, in convert
self._convert(sheetid, outfile)
File "/shared/ceph/homes/khsu/data_warehouse/integration/penv/lib/python3.7/site-packages/xlsx2csv.py", line 353, in _convert
sheet.to_csv(writer)
File "/shared/ceph/homes/khsu/data_warehouse/integration/penv/lib/python3.7/site-packages/xlsx2csv.py", line 782, in to_csv
self.parser.ParseFile(self.filehandle)
File "/home/ebyrne/rpmbuild/BUILD/Python-3.7.4/Modules/pyexpat.c", line 282, in CharacterData
File "/shared/ceph/homes/khsu/data_warehouse/integration/penv/lib/python3.7/site-packages/xlsx2csv.py", line 874, in handleCharData
raise XlsxValueError("Error: potential invalid date format.")
xlsx2csv.XlsxValueError: Error: potential invalid date format.
This occurs with invalid formulas like #N/A and #VALUE!. Because xlsx2csv doesn't know how to handle these, this breaks the conversion. This PR aims to fix these issues by removing the faulty data instead of trying to force a conversion onto it when an invalid formula is encountered.
When excel files have invalid formulas/data, this can cause
xlsx2csv
to produce an error like the following below (traceback truncated to only include thexlsx2csv
errors):This occurs with invalid formulas like
#N/A
and#VALUE!
. Becausexlsx2csv
doesn't know how to handle these, this breaks the conversion. This PR aims to fix these issues by removing the faulty data instead of trying to force a conversion onto it when an invalid formula is encountered.