Open n8kim1 opened 6 months ago
Noting that #720 already alleviates this a lot by reducing the number of machines. If we can come up with a test plan to evaluate this before and after, we can deploy and see which works better.
Noting also that scaling is already delayed by 1-2 minutes because pub/sub metrics take time to propagate. So by the time scale-in happens, the scrim queue is actually well below the assignment ratio.
About #720: yes, good catch (I had thought about that but forgot to mention it). For a plan, what if I watch the queue during the next tournament as-is, so we can evaluate how often scrimmage servers get interrupted (i.e. how much of the bad behavior is still present)? I can also deploy this version and watch a tournament with it too.
About the delay: good to know, thanks. That should help a ton for scaling in, and I can make scaling out a bit faster. Unfortunately, the baked-in 1-2 minute delay probably isn't long enough to rely on by itself, though.
To alleviate #605
This would slightly increase the cost of year-round runs, especially when someone runs a single match at some random point mid-year. But the increase shouldn't be much anyway.
Happy to tweak params, or just not do this anyways. I mainly made this PR just to close my tabs xp
https://cloud.google.com/compute/docs/autoscaler#scale-in_controls
https://cloud.google.com/compute/docs/autoscaler/understanding-autoscaler-decisions#delays_in_scaling_in
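For intuition, here is a rough sketch of the kind of scale-in limiting the linked docs describe: the group is not allowed to shrink by more than a fixed number of instances below the peak recommendation seen in a trailing window. This is a simplified toy model, not the autoscaler's actual algorithm; the function name `limit_scale_in` and the parameters `window` and `max_scaled_in` are hypothetical stand-ins, and ticks are abstract time steps rather than seconds.

```python
from collections import deque


def limit_scale_in(recommended, window, max_scaled_in):
    """Toy model of a scale-in control (hypothetical, simplified).

    recommended:   raw size recommendations, one per tick
    window:        number of trailing ticks to look back over
    max_scaled_in: max instances that may be removed relative to
                   the peak recommendation inside the window
    Returns the size actually allowed at each tick.
    """
    recent = deque(maxlen=window)  # trailing window of recommendations
    sizes = []
    for rec in recommended:
        recent.append(rec)
        # Never drop more than max_scaled_in below the recent peak.
        floor = max(recent) - max_scaled_in
        # Scale-out is not limited: a higher recommendation wins.
        sizes.append(max(rec, floor))
    return sizes


# Queue drains at tick 2; the group is held at 7 until the peak of 10
# has aged out of the 3-tick window.
sizes = limit_scale_in([10, 10, 2, 2, 2, 2, 2], window=3, max_scaled_in=3)
print(sizes)
```

In this model a longer window keeps servers around longer after a burst, which is the trade-off against the slightly higher year-round cost mentioned above.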