geerlingguy / beast-challenge

A control system for MrBeast's 1-100 challenge
GNU General Public License v3.0

Leader App Vote API - Build and Test Scaling to millions of votes #2

Closed · geerlingguy closed this 1 year ago

geerlingguy commented 1 year ago

The voting API will be the core functionality of the leader-app—it needs to be able to accept up to thousands of votes per minute (potentially for many minutes!) and dump that information back out in real-time.

For this issue my goals are to:

geerlingguy commented 1 year ago

If we really wanted to go deep, we could implement monitoring: https://medium.com/flask-monitoringdashboard-turtorial/monitor-your-flask-web-application-automatically-with-flask-monitoring-dashboard-d8990676ce83

geerlingguy commented 1 year ago

My daughter said we should respond with 418 I'm a Teapot when voting is closed, so that's what I'm gonna do!
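
Roughly, a minimal sketch of what that check could look like in the Flask route (the voting_open flag, route, and payload shape are just placeholders based on the curl example below, not the final leader-app code):

# Hypothetical sketch: reject votes with 418 I'm a Teapot once voting closes.
from flask import Flask, jsonify, request

app = Flask(__name__)
voting_open = True  # flipped elsewhere when a round starts or ends

@app.route("/vote", methods=["POST"])
def vote():
    if not voting_open:
        return jsonify({"error": "voting is closed"}), 418  # I'm a Teapot
    data = request.get_json()
    # ...store the vote (e.g. insert {"room_id": ..., "value": ...} into the DB)...
    return jsonify({"status": "ok"}), 201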

geerlingguy commented 1 year ago

Testing with curl:

curl -X POST http://127.0.0.1:5000/vote \
   -H 'Content-Type: application/json' \
   -d '{"room_id":3,"value":0}'

geerlingguy commented 1 year ago

We're getting 23 ms for a vote right now:

 20:17:46 ~ 
$ time curl -X POST http://127.0.0.1:5000/vote \
   -H 'Content-Type: application/json' \
   -d '{"room_id":3,"value":0}'

real    0.023
user    0.006
sys 0.008

But I would like to load test this a bit better (that's a single request against one thread, in development mode). Without debug mode on, I'm getting around 20 ms, so not much difference there. Is uwsgi actually faster?

geerlingguy commented 1 year ago

Going to use wrk with a Lua script, and eventually have it send multiple randomized requests so it can generate realistic vote data: https://stackoverflow.com/a/68597094/100134

Adding a script in a load-testing folder so it is easy to reproduce (and eventually test on the NUC that will run this thing).

geerlingguy commented 1 year ago

Synthetic load test:

 20:39:56 beast-game/leader-app/load-testing 
$ wrk "http://127.0.0.1:5000/vote" -s wrk_vote.lua --latency -t 5 -c 20 -d 30s
Running 30s test @ http://127.0.0.1:5000/vote
  5 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    88.77ms  224.31ms   1.99s    90.68%
    Req/Sec   226.83    134.04   810.00     66.39%
  Latency Distribution
     50%    2.57ms
     75%   47.63ms
     90%  283.28ms
     99%    1.17s 
  16326 requests in 30.11s, 2.77MB read
Requests/sec:    542.29
Transfer/sec:     94.27KB

Doing good on my M2 MacBook Air, with the built-in server (not using WSGI). I did not test any other functionality at the same time; a more realistic test would involve randomized data instead of the same request, plus another script hitting a few other endpoints for tally data or room state (lighting colors and LEDs).
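
If I want to approximate that mixed read load later, a little poller like this could run alongside wrk. The /tally and /room/<id> endpoints here are placeholders for whatever the real read endpoints end up being:

# Hypothetical read-side load: poll tally/room-state endpoints while wrk hammers /vote.
# The endpoint paths are placeholders, not the real leader-app routes.
import random
import time

import requests

BASE = "http://127.0.0.1:5000"

while True:
    requests.get(f"{BASE}/tally")  # overall vote tally
    requests.get(f"{BASE}/room/{random.randint(1, 100)}")  # room state (lights, LEDs)
    time.sleep(0.1)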

geerlingguy commented 1 year ago

So, writes actually scale quite well. The reads are fast too, but when you need to iterate through a list of 50,000 votes about 100 times (once per room) it gets a little slower, lol.

I need to optimize my code for the tally side a bit, but basically, I have it working with wrk and this lua script: https://github.com/geerlingguy/beast-game/blob/master/leader-app/load-testing/wrk_vote.lua

I tested multiple rounds at 5 threads and 10 concurrent connections, for 30 seconds each, and every time was able to sustain over 500 votes per second, with latency averaging 2-10 ms per request. If I just do 1 thread and 1 connection, I can hit 1,358 req/s with under 1ms latency for almost every request:

 15:11:44 beast-game/leader-app/load-testing 
$ wrk "http://127.0.0.1:5000/" -s wrk_vote.lua --latency -t 1 -c 1 -d 10s
Running 10s test @ http://127.0.0.1:5000/
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   792.51us    1.46ms  21.97ms   98.37%
    Req/Sec     1.36k    83.73     1.47k    84.16%
  Latency Distribution
     50%  602.00us
     75%  644.00us
     90%  733.00us
     99%    8.45ms
  13720 requests in 10.10s, 2.33MB read
Requests/sec:   1358.43
Transfer/sec:    236.13KB

The one performance concern at this point is if we have a round where the goal is "hit the buttons as fast as possible" and then they let all 100 rooms do it for like an hour. At that point, the writes are still fast, but the tally page code (which is roughly O(n³)) starts bogging down to 200-400 ms per page load, with extra latency on the database side...
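
One likely fix on the tally side is a single pass over the vote list that buckets votes by room, instead of re-scanning all the votes once per room. A rough sketch, assuming each stored vote looks like the /vote payload:

# Single-pass tally: O(total votes) instead of re-scanning the list per room.
from collections import defaultdict

def tally(votes):
    totals = defaultdict(lambda: {"count": 0, "sum": 0})
    for vote in votes:
        room = totals[vote["room_id"]]
        room["count"] += 1
        room["sum"] += vote["value"]
    return totals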