XLSForm / pyxform

A Python package to create XForms for ODK Collect.
BSD 2-Clause "Simplified" License
80 stars 136 forks source link

Form conversion is slow on form with lots of blank rows and columns #604

Closed yanokwa closed 2 years ago

yanokwa commented 2 years ago

Software and hardware versions

pyxform v1.9

Problem description

UCL_Biomass_Plot_Form.xlsx when converted with pyxform-http (running in Docker on my computer) takes 27 seconds.

The form has lots of blank columns and rows and seems to be corrupt in some way.

I re-saved the form as XLSX in Excel 16.59 for Mac and that seems to have cleaned things up. Conversion speed went from to 27 seconds to .43 seconds.

Other information

https://forum.getodk.org/t/the-xlsform-conversion-service-could-not-be-contacted/37315/ is the forum thread with the original issue.

This sounds a lot like https://github.com/XLSForm/pyxform/pull/596, but I have confirmed I'm running 1.9. It's hard to verify directly with xls2xform because my computer is really fast :(

https://github.com/getodk/pyxform-http/blob/master/app/main.py#L40 is the specific pyxform-http function call.

My guess is that the form in question has the wrong dimensions. This script points to that.

from openpyxl import load_workbook
wb = load_workbook(filename='UCL_Biomass_Plot_Form.xlsx', read_only=True)
ws = wb['survey']
print(ws.calculate_dimension()) # A1:AMJ1048576
ws.reset_dimensions()
print(ws.calculate_dimension(force=True)) # A1:AMJ91

Maybe if we see a survey sheet with more than 50 columns (2x what we'd expect) or 5000 rows, we should reset the dimensions?

https://openpyxl.readthedocs.io/en/2.5.14/optimized.html#worksheet-dimensions

qlands commented 2 years ago

Hi. I am experiencing the same issue but with a form that does not have (I think) blank columns or rows, however, it has colors across columns.

The slow file (143,961 bytes): www.qlands.com/other_files/spanish_english_slow.xlsx Converting to XML: 65.300415 seconds Converting to JSON: 61.0054 seconds

The fast file without any colors (182,741 bytes): http://www.qlands.com/other_files/spanish_english_fast.xlsx Converting to XML: 5.888324 seconds Converting to JSON: 0.522944 seconds

I am using 1.10.0

seadowg commented 2 years ago

I noticed I was getting OOM on my 1GB Central instance errors (forcing me to power cycle to get it working again) when uploading a form I created in Libre Office. @lognaturel thought that sounds related to this issue and suggested I try creating a fresh version in Google Docs to compare, and it indeed seems to work fine. This zip contains the Libre Office and Google Docs XLSX files for comparison.

lognaturel commented 2 years ago

@qlands were your files created with LibreOffice as well? So far all reports of misbehaving files seem to be related to LibreOffice.