dgorissen / pycel

A library for compiling Excel spreadsheets to Python code & visualizing them as a graph
GNU General Public License v3.0

Large excel files are slow to open #152

Status: Open. bauerjon opened this issue 2 years ago.

bauerjon commented 2 years ago

Hello!

First, I want to clarify that we'd be open to paying contributors to help optimize this for us. Please let me know if that is of interest. If not, we'd appreciate direction or suggestions on how to fix it ourselves. Thank you! 🙏

What actually happened

When we open a large spreadsheet (20+ MB) with pycel, startup can take up to 8 minutes. Many of the large sheets are static lookup tables.

What was expected to happen

Since loading a static CSV into memory and doing a lookup is fast, I would expect a file this large, holding mostly static data, to open much faster: ideally in under 10 seconds, so that we can iterate quickly during development.
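For comparison, the baseline described above (a static CSV held in memory and queried by key) can be sketched with only the standard library. The table, its column names, and the load_lookup_table helper are hypothetical; the point is that a one-time parse into a dict makes each later lookup a constant-time hash access.

```python
import csv
import io


def load_lookup_table(csv_text, key_column):
    """Parse CSV text once into a dict mapping each row's key to the full row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_column]: row for row in reader}


# Hypothetical static lookup table, as it might look if exported from one sheet.
RATES_CSV = """region,rate
north,0.05
south,0.07
"""

rates = load_lookup_table(RATES_CSV, "region")
print(rates["south"]["rate"])  # prints 0.07 (csv values are strings)
```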

Problem description

When a file this large takes this long to open, iterating, making changes, and debugging as a developer become extremely painful.

Code Sample

https://github.com/bauerjon/slow-pycel-example

Environment

pycel==1.0b30, Python 3.9.5, macOS

bauerjon commented 2 years ago

I found that at least half of the startup time is spent in the load_workbook calls:

https://github.com/dgorissen/pycel/blob/f4fd7e5e9feb77e5affe9fd3b1881ef47861102c/src/pycel/excelwrapper.py#L243-L245
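To confirm where startup time goes (for example, that the load_workbook calls dominate), the standard-library profiler can be wrapped around the startup path. This is a minimal sketch: parse_workbook, build_model, and startup are hypothetical stand-ins, and in practice you would pass a zero-argument function that runs your real pycel startup code.

```python
import cProfile
import io
import pstats
import time


def parse_workbook():
    # Hypothetical stand-in for an expensive step such as loading the workbook.
    time.sleep(0.05)


def build_model():
    # Hypothetical stand-in for a cheaper follow-up step.
    time.sleep(0.01)


def startup():
    parse_workbook()
    build_model()


def profile_report(func, limit=10):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


report = profile_report(startup)
print(report)
```

The same report is available without code changes via `python -m cProfile -s cumulative your_script.py`; sorting by cumulative time surfaces the outermost slow calls first.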

bauerjon commented 2 years ago

We made a fork with a fix that seems to work for our use case here.
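The fork's changes aren't described in this thread. Independently of it, one generic way to shorten the edit-run loop during development (an assumption, not necessarily what the fork does) is to cache the result of the slow load step on disk, keyed by a hash of the workbook's bytes, so an unchanged file is never re-parsed. The file_digest and load_cached helpers are hypothetical, the loader's result must be picklable, and you should only unpickle cache files you created yourself.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's bytes, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def load_cached(path, loader, cache_dir):
    """Return loader(path), reusing a pickled result while the file is unchanged.

    The cache key is the file's content hash, so editing the workbook
    produces a new key and the slow loader runs again.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{file_digest(path)}.pkl"
    if cache_file.exists():
        with open(cache_file, "rb") as f:
            return pickle.load(f)
    result = loader(path)  # the slow step, e.g. parsing the workbook
    with open(cache_file, "wb") as f:
        pickle.dump(result, f)
    return result


# Usage with a hypothetical loader that records how often it really runs.
calls = []


def slow_loader(path):
    calls.append(path)
    return {"size": Path(path).stat().st_size}


with tempfile.TemporaryDirectory() as tmp:
    book = Path(tmp) / "book.xlsx"
    book.write_bytes(b"placeholder bytes")
    first = load_cached(book, slow_loader, Path(tmp) / "cache")
    second = load_cached(book, slow_loader, Path(tmp) / "cache")  # served from cache
```

Hashing a 20 MB file takes a small fraction of a second, far less than re-parsing it, and keying on content rather than modification time avoids serving a stale result after the workbook is edited.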