Open bushong1 opened 3 years ago
@bushong1 there might be some work needed on our side to fix this, but that has not yet been confirmed. Also, I'd consider replacing Datapusher with Aircan, but you'd need to create a new DAG for XLSX loading.
When I try to upload an XLSX file, the state remains "pending" forever, which is odd.
It seems to me that the option is to replace the `messytables` dependency with its successor, `frictionless`.
We're also seeing that some `.ods` files aren't processed well by `messytables`, essentially causing OOM errors that consume >4G of memory (among other reasons, it's doing zipfile extraction into memory, and potentially duplicating cells in rows many times to fill a large empty spreadsheet).
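To illustrate the memory point above, here is a minimal sketch of how an `.ods` file can be scanned without extracting the archive into memory or expanding repeated rows. It is not how `messytables` works and is not a drop-in fix; the helper name `count_ods_rows` is hypothetical. It assumes a standard OpenDocument layout, where the sheet data lives in `content.xml` and runs of identical rows are encoded with the `table:number-rows-repeated` attribute.

```python
import zipfile
import xml.etree.ElementTree as ET

# OpenDocument "table" namespace, used for rows and their repeat counts.
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
ROW_TAG = f"{{{TABLE_NS}}}table-row"
REPEAT_ATTR = f"{{{TABLE_NS}}}number-rows-repeated"


def count_ods_rows(source):
    """Return (row_elements, logical_rows) for an .ods file.

    `source` is a path or a binary file-like object. content.xml is
    decompressed as a stream and parsed incrementally, and repeated
    rows are counted rather than materialized.
    (Hypothetical helper, for illustration only.)
    """
    row_elements = 0
    logical_rows = 0
    with zipfile.ZipFile(source) as archive:
        # archive.open() yields a stream; nothing is extracted up front.
        with archive.open("content.xml") as stream:
            for _event, elem in ET.iterparse(stream, events=("end",)):
                if elem.tag == ROW_TAG:
                    row_elements += 1
                    # Read the repeat count before clearing the element.
                    logical_rows += int(elem.get(REPEAT_ATTR, "1"))
                    # Drop the row's cells so they can be freed.
                    elem.clear()
    return row_elements, logical_rows
```

A large gap between the two numbers (a handful of row elements but a million logical rows) is a sign that the sheet is mostly empty padding, which is the case where expanding every repeated row becomes expensive.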
Any news on this issue? Has someone found a solution (like Aircan)?
I've been using datapusher-plus in production. It has more active development and supports xlsx and ods.
So it looks like the dependency `messytables` uses `xlrd` for Excel file processing. The latest `xlrd` does not support XLSX files anymore due to, as I understand it, security concerns. `messytables` appears to be a dead project, not having had any activity in the last 2 years. This Stack Overflow post says that `xlrd` should be swapped out for `openpyxl`, but with `messytables` being unmaintained, that seems unlikely to happen. Is there any effort being taken to support XLSX files?
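For reference, the `xlrd` to `openpyxl` swap mentioned above could look roughly like the sketch below. It is not a patch for `messytables` or Datapusher; the function names `detect_spreadsheet_format` and `read_xlsx_rows` are hypothetical. The format check is a heuristic based on file signatures (legacy `.xls` files are OLE2 containers, while `.xlsx` and `.ods` are ZIP archives), and reading rows assumes `openpyxl` is installed.

```python
import zipfile

# Signature of OLE2 compound files, the container used by legacy .xls.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"
ODS_MIMETYPE = "application/vnd.oasis.opendocument.spreadsheet"


def detect_spreadsheet_format(path):
    """Guess 'xls', 'xlsx', 'ods', or 'unknown' from the file contents.

    Heuristic only: it assumes an XLSX archive contains xl/workbook.xml
    and an ODS archive contains a 'mimetype' member.
    """
    with open(path, "rb") as handle:
        head = handle.read(8)
    if head == OLE2_MAGIC:
        return "xls"
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            names = set(archive.namelist())
            if "xl/workbook.xml" in names:
                return "xlsx"
            if "mimetype" in names:
                mimetype = archive.read("mimetype").decode("ascii", "replace")
                if mimetype.strip() == ODS_MIMETYPE:
                    return "ods"
    return "unknown"


def read_xlsx_rows(path):
    """Yield rows of the first worksheet as tuples of cell values.

    Assumes the third-party openpyxl package is available; read_only
    mode streams rows instead of loading the whole sheet.
    """
    from openpyxl import load_workbook  # imported lazily, optional dependency

    workbook = load_workbook(path, read_only=True, data_only=True)
    try:
        sheet = workbook.worksheets[0]
        for row in sheet.iter_rows(values_only=True):
            yield row
    finally:
        workbook.close()
```

A caller could route files detected as `"xlsx"` to `read_xlsx_rows` and keep `xlrd` only for `"xls"`, since `xlrd` 2.x reads only the legacy format.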