nonprofittechy closed this issue 2 years ago.
General scaling questions. I don't think we'll ever exceed millions of rows, and I want to see whether we're likely to ever need background processing.
Took a quick look at this while I was thinking about it, and started making some quick changes in https://github.com/SuffolkLITLab/docassemble-InterviewStats/commit/3dd43202fe16517a1ef4ac545613eb4384438559. However, to make significant progress, we'll need to:
Maybe we can make a mirror of prod? I can get you the Postgres dump if you want. Right now, prod struggles to display the stats for the CDC moratorium. I was trying to pull that one up as an example [for our technical slides], but it didn't work.
Haha, that's why I was taking another look at this; I remember running into trouble with it last month. A copy of prod would be helpful, but it seems really risky privacy-wise. I was thinking more of an interview that calls store_variables_snapshot a million times with fake/random data, and letting it run locally for however long that takes (it should only take about an hour at most?). The effort would be in making the data look similar to what's stored in the CDC moratorium.
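Roughly what I have in mind, as a minimal sketch: the field names here are made up, and the exact store_variables_snapshot() signature (including whether a distinct key is needed to keep snapshots from overwriting each other within one session) should be checked against the docassemble docs.

```python
import random
import string

from docassemble.base.util import store_variables_snapshot


def fake_cdc_record():
    # Made-up fields; these would need to mirror what the CDC moratorium
    # interview actually stores for the test to be representative.
    return {
        "zip": f"{random.randint(1000, 99999):05d}",
        "sent_date": f"2021-{random.randint(1, 12):02d}-{random.randint(1, 28):02d}",
        "outcome": random.choice(["sent", "abandoned", "error"]),
        "notes": "".join(random.choices(string.ascii_letters + " ", k=80)),
    }


def load_fake_snapshots(n=1_000_000):
    # One snapshot per fake record. I'm assuming a distinct key keeps each
    # snapshot as its own row rather than overwriting the previous one.
    for i in range(n):
        store_variables_snapshot(data=fake_cdc_record(), key=str(i))
```

That could just sit in a code block in a throwaway interview and run once locally.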
That makes sense, and would have additional uses for future stress tests.
Some specific numbers: we're timing out between 3,000 and 5,000 rows. However, I can't actually tell what's taking so long: when I bring the number of rows down so that the map screen still takes ~20-30 seconds to load, the "show variables" button (which now just looks like </>) claims that it only took < 1 second to make the page. From the network tab, it doesn't look like Bokeh is what's making everything take so long, since everything hangs on the initial GET call to the DA server. I'm going to need to get back into the guts of the server and start adding logmessage calls everywhere, which will take more time than I thought it would.
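By "logmessage calls everywhere" I mean something like this, just as a sketch (the timed wrapper is illustrative, not something already in the code; logmessage is docassemble's server-side logger):

```python
import time
from contextlib import contextmanager

from docassemble.base.logger import logmessage


@contextmanager
def timed(label):
    # Log how long the wrapped block took, so slow steps show up in the
    # docassemble logs.
    start = time.time()
    try:
        yield
    finally:
        logmessage("TIMING %s: %.2fs" % (label, time.time() - start))


# Example (hypothetical call site): wrap the query that feeds the map screen.
# with timed("stats query"):
#     rows = get_stats_for_interview(filename)
```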
I added some Pandas stuff to the help area, which might be to blame for at least some of the speed issues.
I took out that help part, and we are extremely snappy on 47k rows now! Scaling up to see how much we can handle before timing out again.
Thanks for the tip @nonprofittechy, you saved me at least a day of deep diving.
Is this just general scaling issues, or are there any specific pain points that stand out?