Closed yanokwa closed 2 years ago
Hi. I am experiencing the same issue but with a form that does not have (I think) blank columns or rows, however, it has colors across columns.
The slow file (143,961 bytes): www.qlands.com/other_files/spanish_english_slow.xlsx Converting to XML: 65.300415 seconds Converting to JSON: 61.0054 seconds
The fast file without any colors (182,741 bytes): http://www.qlands.com/other_files/spanish_english_fast.xlsx Converting to XML: 5.888324 seconds Converting to JSON: 0.522944 seconds
I am using 1.10.0
I noticed I was getting OOM on my 1GB Central instance errors (forcing me to power cycle to get it working again) when uploading a form I created in Libre Office. @lognaturel thought that sounds related to this issue and suggested I try creating a fresh version in Google Docs to compare, and it indeed seems to work fine. This zip contains the Libre Office and Google Docs XLSX files for comparison.
@qlands were your files created with LibreOffice as well? So far all reports of misbehaving files seem to be related to LibreOffice.
Software and hardware versions
pyxform v1.9
Problem description
UCL_Biomass_Plot_Form.xlsx when converted with pyxform-http (running in Docker on my computer) takes 27 seconds.
The form has lots of blank columns and rows and seems to be corrupt in some way.
I re-saved the form as XLSX in Excel 16.59 for Mac and that seems to have cleaned things up. Conversion speed went from to 27 seconds to .43 seconds.
Other information
https://forum.getodk.org/t/the-xlsform-conversion-service-could-not-be-contacted/37315/ is the forum thread with the original issue.
This sounds a lot like https://github.com/XLSForm/pyxform/pull/596, but I have confirmed I'm running 1.9. It's hard to verify directly with xls2xform because my computer is really fast :(
https://github.com/getodk/pyxform-http/blob/master/app/main.py#L40 is the specific pyxform-http function call.
My guess is that the form in question has the wrong dimensions. This script points to that.
Maybe if we see a survey sheet with more than 50 columns (2x what we'd expect) or 5000 rows, we should reset the dimensions?
https://openpyxl.readthedocs.io/en/2.5.14/optimized.html#worksheet-dimensions