SuffolkLITLab / docassemble-InterviewStats

A docassemble extension.
MIT License

Load interview stats directly into a dataframe? #21

Open nonprofittechy opened 1 year ago

nonprofittechy commented 1 year ago

I'm wondering if we could save some overhead by loading from the database directly into a dataframe. As the number of rows grows, loading all records from the database at once is eventually going to fail.

https://hackersandslackers.com/connecting-pandas-to-a-sql-database-with-sqlalchemy/
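For readers following along, here is a minimal sketch of what loading query results straight into a dataframe could look like with `pandas.read_sql_query`. It is not this package's code: the in-memory SQLite database, the `stats` table, its columns, and the `load_stats` helper are all hypothetical stand-ins for illustration.

```python
import sqlite3

import pandas as pd

# Hypothetical schema and sample rows; the real stats table may look different.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (filename TEXT, key TEXT, zip TEXT)")
conn.executemany(
    "INSERT INTO stats VALUES (?, ?, ?)",
    [
        ("a.yml", "k1", "02108"),
        ("a.yml", "k2", "02139"),
        ("b.yml", "k3", "02108"),
    ],
)
conn.commit()


def load_stats(connection, filename):
    """Return the rows for one interview file as a DataFrame.

    pandas runs the query and builds the DataFrame itself, so there is no
    intermediate list of row objects to create and then convert.
    """
    query = "SELECT key, zip FROM stats WHERE filename = ? ORDER BY key"
    return pd.read_sql_query(query, connection, params=(filename,))


df = load_stats(conn, "a.yml")
print(df)
```

Filtering in the `WHERE` clause, as in this sketch, keeps rows that are not needed out of memory altogether.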

BryceStevenWilley commented 1 year ago

Are you hitting the point of slowdowns or memory pressure now? How many rows do you have?

https://github.com/SuffolkLITLab/docassemble-InterviewStats/pull/10 made a lot of performance improvements, and if I recall correctly, we could work with 100k rows pretty easily for general operations. The commit notes in that PR say that we got up to 400k rows before we had to move the Excel generation into a background process.

Dataframes are loaded entirely into memory too, so if you're thinking of the point where simply loading all the rows will fail, the dataframe would start failing pretty quickly as well. Happy to do more performance work, but I'd rather not make a lot of superfluous changes that don't really help, and we'd need specific things to improve (like memory pressure or speed) at specific data sizes.
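To make the memory point concrete, the sketch below measures a fully loaded dataframe with `DataFrame.memory_usage(deep=True)` and compares it with reading the same query in chunks via the `chunksize` parameter of `pandas.read_sql_query`. The `stats` table, its `zip` column, and the row counts are made up for illustration. Chunking only helps when the work can be done incrementally, as with the per-ZIP counts here.

```python
import sqlite3
from collections import Counter

import pandas as pd

# Hypothetical data: 1000 rows spread evenly over five ZIP codes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (zip TEXT)")
conn.executemany(
    "INSERT INTO stats VALUES (?)",
    [(f"{2100 + i % 5:05d}",) for i in range(1000)],
)
conn.commit()

# Loading everything at once: the whole result lives in memory.
full = pd.read_sql_query("SELECT zip FROM stats", conn)
full_bytes = int(full.memory_usage(deep=True).sum())

# Loading in chunks: only one chunk is held as a DataFrame at a time,
# and a running tally replaces the full table.
counts = Counter()
peak_chunk_bytes = 0
for chunk in pd.read_sql_query("SELECT zip FROM stats", conn, chunksize=100):
    chunk_bytes = int(chunk.memory_usage(deep=True).sum())
    peak_chunk_bytes = max(peak_chunk_bytes, chunk_bytes)
    counts.update(chunk["zip"].value_counts().to_dict())

print(f"full load: {full_bytes} bytes, largest chunk: {peak_chunk_bytes} bytes")
```

Whether `chunksize` actually lowers peak memory depends on the database driver. Some drivers fetch the entire result set on the client before pandas sees the first chunk, so this would need to be measured against the real database at a specific data size.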