XLSForm / pyxform

A Python package to create XForms for ODK Collect.
BSD 2-Clause "Simplified" License

For some forms, v1.8 uses an unacceptable amount of memory #595

Closed lognaturel closed 2 years ago

lognaturel commented 2 years ago

A 200 KB Excel file was measured using 1.5 GB of memory when converted.

The most likely culprit is openpyxl, introduced in #575 by @sheppard.

From https://openpyxl.readthedocs.io/en/stable/performance.html:

Memory use is fairly high in comparison with other libraries and applications and is approximately 50 times the original file size, e.g. 2.5 GB for a 50 MB Excel file.

What we're seeing is about 7,500 times the file size, roughly two orders of magnitude above that documented ratio.
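The comparison can be checked with a few lines of arithmetic. This is only a sketch of the numbers quoted above, assuming decimal units (1 KB = 1000 bytes); the variable names are illustrative.

```python
# Sizes in bytes, assuming decimal units (an assumption, not stated in the report).
KB = 1000
MB = 1000 * KB
GB = 1000 * MB

file_size = 200 * KB        # size of the reported Excel file
observed_memory = 1.5 * GB  # memory measured during conversion
documented_ratio = 50       # "approximately 50 times the original file size"

# Memory the openpyxl docs would lead us to expect for this file.
expected_memory = documented_ratio * file_size

# How far the observation is from the file size and from that expectation.
observed_ratio = observed_memory / file_size
excess_factor = observed_memory / expected_memory

print(f"expected: {expected_memory / MB:.0f} MB")
print(f"observed ratio: {observed_ratio:.0f}x the file size")
print(f"excess over documented ratio: {excess_factor:.0f}x")
```

The excess factor comes out at 150, which is where "two orders of magnitude" comes from.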

lognaturel commented 2 years ago

openpyxl's read-only mode appears to bring memory usage back to v1.7 levels. Hurray for an easy fix. 😮‍💨
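For reference, a minimal sketch of reading a sheet in openpyxl's read-only mode. It is not pyxform's actual code: the function name `iter_sheet_rows` and its parameters are hypothetical, and only `load_workbook(..., read_only=True)`, `iter_rows(values_only=True)` and `close()` are openpyxl calls.

```python
from typing import Any, Iterator, Tuple


def iter_sheet_rows(path: str, sheet_name: str) -> Iterator[Tuple[Any, ...]]:
    """Yield each row of one worksheet as a tuple of cell values.

    Hypothetical helper for illustration; not part of pyxform.
    """
    # Imported here so the sketch can be loaded even where openpyxl is absent.
    from openpyxl import load_workbook

    # read_only=True reads rows lazily instead of building every cell in memory.
    workbook = load_workbook(path, read_only=True, data_only=True)
    try:
        worksheet = workbook[sheet_name]
        # values_only=True returns plain values rather than cell objects.
        for row in worksheet.iter_rows(values_only=True):
            yield row
    finally:
        # Read-only workbooks keep the file open until closed explicitly.
        workbook.close()
```

Something like `list(iter_sheet_rows("form.xlsx", "survey"))` would then give the rows of the `survey` sheet; the file name here is a placeholder.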

lognaturel commented 2 years ago

Looks like it's a specific form running into something like what this post describes: https://stackoverflow.com/questions/47582274/iterate-through-columns-in-read-only-workbook-in-openpyxl

There are memory gains from using read-only mode, so we should still do it, but this is not as critical as it originally seemed.

lognaturel commented 2 years ago

For whatever reason, the specific form this was seen on had a huge number of extra columns. Deleting them lets the form convert quickly with minimal memory usage in v1.8.0. I would still like to do #596, but it's not critical. CC @aurdipas
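One way to guard against forms like this is to drop trailing empty cells from each row before processing it. The sketch below is an illustration only and not necessarily what #596 does; `is_empty` and `trim_trailing_empty` are hypothetical names, and rows are assumed to arrive as sequences of cell values.

```python
from typing import Any, Iterable, Iterator, Sequence, Tuple


def is_empty(value: Any) -> bool:
    """Treat None and whitespace-only strings as empty cells."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def trim_trailing_empty(rows: Iterable[Sequence[Any]]) -> Iterator[Tuple[Any, ...]]:
    """Yield each row with its trailing empty cells removed.

    Empty cells between non-empty ones are kept so column positions
    of the remaining data do not shift.
    """
    for row in rows:
        end = len(row)
        # Walk back from the right edge until a non-empty cell is found.
        while end > 0 and is_empty(row[end - 1]):
            end -= 1
        yield tuple(row[:end])


rows = [
    ("type", "name", "label", None, None, None),
    ("text", "q1", None, "hint", "", None),
]
for trimmed in trim_trailing_empty(rows):
    print(trimmed)
```

Because the helper is a generator, it processes one row at a time and does not hold the untrimmed sheet in memory.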

lognaturel commented 2 years ago

More users have been reporting this, so I'm grateful for your help getting it addressed, @lindsay-stevens 🚀