CarnegieLearningWeb / UpGrade

Framework for adding A/B testing to education applications
https://www.upgradeplatform.org/
BSD 3-Clause "New" or "Revised" License

Optimize query for post rule #909

Open amurphy-cl opened 1 year ago

amurphy-cl commented 1 year ago

Describe the bug

Completed experiments are still generating network traffic because the post-experiment rule is queried for them. We noticed that this throttled UpGrade 4.x performance in prod, so the temporary solution was to delete the completed experiments. In the future, we'll want to archive rather than delete many, if not most, completed experiments (archived experiments are not queried). One possible optimization is to skip the query when the post rule is set to "default".
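A minimal sketch of the suggested optimization, written in Python for illustration only (the UpGrade backend itself is not shown in this thread). The `Experiment` class, the `"default"` and `"assign"` rule values, and the `lookup` callback are all hypothetical stand-ins, not UpGrade's actual API. The idea is simply to return early, without touching the database, when a completed experiment's post rule is "default".

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical rule value; the real identifier in UpGrade may differ.
DEFAULT_POST_RULE = "default"


@dataclass
class Experiment:
    """Hypothetical, simplified view of a completed experiment."""
    id: str
    post_rule: str


def resolve_post_rule_condition(
    experiment: Experiment,
    user_id: str,
    lookup: Callable[[str, str], Optional[str]],
) -> Optional[str]:
    """Return the condition to serve after enrollment is complete.

    `lookup` stands in for the expensive per-user query. It is only
    called when the post rule is something other than "default".
    """
    if experiment.post_rule == DEFAULT_POST_RULE:
        # Nothing experiment-specific to serve, so skip the query entirely.
        return None
    return lookup(experiment.id, user_id)
```

With this shape, completed experiments that fall back to the default condition cost one in-memory comparison per request instead of one query.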

danoswaltCL commented 1 year ago

I have had trouble getting a usable copy of the production db from Playpower to try to duplicate this, so I'm putting this issue on the board for "cycle 2" so we can create a reproducible example that can be shared.

danoswaltCL commented 1 year ago

@ppratikcr7 I was able to load the prod clone SQL file into an empty local db, start up the backend services, and use it normally. I'm not sure the user 'readonly' error really matters.

I tried briefly to recreate the bug under Locust load tests, but I think reproducing it will take more thought. I will need to get used to modifying these Locust Python scripts to get what we want.

I believe what might work is to find a stable baseline under load testing with all experiments enrolling, then set experiments to "enrollment-complete" and see whether the response times start to change. That is about as much as I know about this bug at this point. I wish I had the exact data from when it happened; I will try to catch it in the act if it happens again.
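The baseline-versus-completed comparison described above could be checked with something like the following sketch. It assumes response times (in milliseconds) have already been exported from two load-test runs; the function names, the 95th-percentile choice, and the 10% tolerance are illustrative assumptions, not values from this thread.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sample."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest-rank method: smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_regressed(
    baseline_ms: Sequence[float],
    candidate_ms: Sequence[float],
    pct: float = 95.0,
    tolerance: float = 0.10,
) -> bool:
    """True if the candidate run's percentile latency exceeds the
    baseline's by more than `tolerance` (a fraction, e.g. 0.10 = 10%)."""
    base = percentile(baseline_ms, pct)
    cand = percentile(candidate_ms, pct)
    return cand > base * (1 + tolerance)
```

Here `baseline_ms` would come from the run with all experiments enrolling and `candidate_ms` from the run after switching experiments to "enrollment-complete".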

ppratikcr7 commented 1 year ago

@danoswaltCL Hey, thanks for checking. I will try the import again. For me, the error showed up after about an hour, and there was no data in the empty db I had created.

Yeah, right. We need to try the load-testing scripts as described above. Let me know if you need help with the script changes.

danoswaltCL commented 10 months ago

QA: Dan

amurphy-cl commented 9 months ago

Reopening this so we can validate in prod, tentatively scheduled for 01/02/2024

ppratikcr7 commented 8 months ago

@danoswaltCL We removed the changes made for this ticket in the PR for the competing experiments pool logic. There shouldn't be much of a latency issue now, since we use a cached experiment list. Do we still want to QA this and keep the ticket open?