Blef-team / blef_game_engine

The game engine API service for the game of Blef
GNU Affero General Public License v3.0

Increase engine service capacity #160

Closed: maciej-pomykala closed this issue 3 years ago

maciej-pomykala commented 4 years ago

The current game engine service (as deployed on our t2.micro instance) is capable of serving somewhere between 60 and 100 requests a second to the state endpoint, which in practice means between 60 and 100 users. This should be increased.

First of all, the state endpoint is the target of >90% of requests under our current practice of having clients query the game state every second, since a given player usually takes more than 10 seconds between consecutive moves. Therefore, we suspect (though this has not been empirically confirmed) that the state endpoint consumes much more CPU time than all other endpoints combined.
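As a rough check of the >90% figure, count only state and move requests and assume each client polls once per second while a player moves at most once every 10 seconds. Each player then sends at least 10 state requests per move request, so the state endpoint's share is at least:

$$
\frac{10}{10 + 1} = \frac{10}{11} \approx 0.91
$$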

Furthermore, the vast majority of these game state requests receive exactly the same response as the previous request from the same IP. In other words, in most cases the game state has not changed between two consecutive requests from one user.
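Since most consecutive state requests get an identical response, one option is to let the client say which version of the state it already has and return early when nothing has changed. The sketch below is in Python purely for illustration (the engine itself is written in R with Plumber). `Game`, `version` and `state_endpoint` are hypothetical names, and the sketch assumes a per-game counter that is bumped on every state change. It borrows HTTP's 304 Not Modified status for the early return.

```python
import json
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class Game:
    # Hypothetical game record; `version` must be bumped on every state change.
    state: dict = field(default_factory=dict)
    version: int = 0

    def apply_move(self, move: dict) -> None:
        self.state.setdefault("history", []).append(move)
        self.version += 1  # any mutation invalidates clients' copies


def state_endpoint(game: Game, known_version: Optional[int]) -> Tuple[int, str]:
    """Return (HTTP status, body); 304 with an empty body if the client is current."""
    if known_version is not None and known_version == game.version:
        return 304, ""  # cheap path: no serialisation at all
    body = json.dumps({"version": game.version, "state": game.state})
    return 200, body


game = Game()
status, body = state_endpoint(game, None)  # first request: full state
status, body = state_endpoint(game, 0)     # nothing changed: 304
game.apply_move({"player": "a", "action": 3})
status, body = state_endpoint(game, 0)     # stale client: full state again
```

The early return skips the serialisation step entirely, but input validation (checking the game and player) would still have to run before it.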

There are many ways to speed the engine service up. We should try some of them.

maciej-pomykala commented 4 years ago

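When the state endpoint is called, CPU time is spent on broadly four things: handling the networking element of the request, validating the input, reading the game state file, and preparing the game state for sending.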

We do not yet know which of these consume the most CPU time. We have run some basic experiments, submitting large batches of queries to actual game endpoints or to a simple experimental API, and obtained some preliminary findings. For example, an endpoint (on the EC2 instance) that returns the round number of a selected game without any input validation (i.e. one line of code) takes only around 2 ms of CPU time, instead of 10-15 ms.

This suggests that either input validation or preparing the game state for sending is consuming most of the CPU time.
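As a back-of-the-envelope check, assume the single vCPU of the t2.micro is the only bottleneck (ignoring CPU credit throttling and networking overhead). Throughput is then roughly one second of CPU time divided by the CPU time per request:

$$
\text{req/s} \approx \frac{1000\ \text{ms}}{t_{\text{CPU}}}:\qquad
\frac{1000}{15} \approx 67,\qquad
\frac{1000}{10} = 100,\qquad
\frac{1000}{2} = 500
$$

The first two figures line up with the observed 60-100 requests per second, and the 2 ms endpoint suggests a ceiling several times higher.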

Therefore, it seems the API could be sped up several-fold within the Plumber framework by adding a simple endpoint that returns a single value telling users whether their knowledge of the game state is out of date. Two ideas are:

maciej-pomykala commented 4 years ago

Ultimately, each of the four phases of the game state query process could be sped up by rewriting the API in a different language. Realistically, however, we will first go for the low-hanging fruit that we can already see and that does not require changing the language.

maciej-pomykala commented 4 years ago

In our discussions, one idea for reducing the CPU time required for preparing the R object to be sent as JSON is to store the game state as a JSON file.

There could be a version of the state for each player (and one for observers), prepared in advance and simply sent over upon request. This would still require input validation.
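A minimal sketch of the precomputed-views idea, in Python for illustration only (the engine is written in R). It assumes the only hidden information is each player's hand, stored under a hypothetical `hands` field. `build_views` would run once per state change and `get_state` once per request, so the per-request work shrinks to validation plus a dictionary lookup.

```python
import json
from typing import Dict, List, Optional, Tuple


def build_views(state: dict, players: List[str]) -> Tuple[Dict[str, str], str]:
    """Serialise one JSON view per player plus one for observers (once per state change)."""
    # Hypothetical rule: everything except the `hands` field is public.
    public = {k: v for k, v in state.items() if k != "hands"}
    player_views = {
        p: json.dumps({**public, "hand": state["hands"].get(p, [])}) for p in players
    }
    observer_view = json.dumps(public)
    return player_views, observer_view


def get_state(player_views: Dict[str, str], observer_view: str,
              player: Optional[str]) -> str:
    """Per-request work: validate the input, then return a ready-made string."""
    if player is None:
        return observer_view
    if player not in player_views:
        raise KeyError(f"unknown player: {player}")
    return player_views[player]


state = {"round": 2, "hands": {"a": [[9, 0]], "b": [[12, 3]]}}
player_views, observer_view = build_views(state, ["a", "b"])
```

The trade-off is extra serialisation work on every move (one view per player), which should be cheap compared with serialising on every poll.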

maciej-pomykala commented 4 years ago

It's worth looking at whether the game object can be saved and read faster than it is now.

In an experiment, a realistic game object was written and read a large number of times on a computer with an SSD and an Intel i7-5500U processor. For writing we used saveRDS, saveRDS with compress = FALSE, and qsave; for reading we used readRDS and qread. Regardless of the function used, writing the game state file takes around 2 ms with little variation, whereas reading takes 0.6 ms with readRDS (whether the file is compressed or not) and 0.9 ms with qread.

We can plausibly reach a point where reading the file becomes our bottleneck. It is not clear to me whether there is an easy way to speed this up. Maybe by storing recent games in RAM?
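One way to keep recent games in RAM is a small least-recently-used (LRU) cache in front of the file reads, with writes going to both disk and memory. This Python sketch (the engine itself is in R) uses `pickle` as a stand-in for `saveRDS`/`readRDS`. `GameCache` and its `<game_id>.pkl` file naming are hypothetical. It assumes a single server process: with several worker processes, each would hold its own cache and could serve a stale state after another process saves a move. Because the cached object is shared, a caller that modifies a game should save it straight away.

```python
import pickle
from collections import OrderedDict
from pathlib import Path
from typing import Any


class GameCache:
    """Keep the most recently used games in RAM; fall back to disk on a miss."""

    def __init__(self, directory: Path, capacity: int = 100):
        self.directory = directory
        self.capacity = capacity
        self._items = OrderedDict()  # game_id -> game, oldest first

    def _path(self, game_id: str) -> Path:
        return self.directory / f"{game_id}.pkl"  # hypothetical naming scheme

    def load(self, game_id: str) -> Any:
        if game_id in self._items:
            self._items.move_to_end(game_id)  # mark as most recently used
            return self._items[game_id]
        with open(self._path(game_id), "rb") as f:  # cache miss: read from disk
            game = pickle.load(f)
        self._put(game_id, game)
        return game

    def save(self, game_id: str, game: Any) -> None:
        # Write-through: disk stays the source of truth, the RAM copy stays fresh.
        with open(self._path(game_id), "wb") as f:
            pickle.dump(game, f)
        self._put(game_id, game)

    def _put(self, game_id: str, game: Any) -> None:
        self._items[game_id] = game
        self._items.move_to_end(game_id)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used game
```

Writes still cost a disk write each time; the cache only removes the read from the hot path of the state endpoint.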

maciej-pomykala commented 4 years ago

To reiterate, input validation and preparing the game state for sending currently require more CPU time than reading the game state file and handling the networking element of the game state request.

Code profiling is needed to establish how input validation relates to preparing the state for sending in terms of CPU consumption.
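Until a proper profiler is run (R ships `Rprof` for this), a crude alternative is to wrap each phase of the handler in a timer and accumulate totals. The Python sketch below (for illustration; the engine is in R) does this with a context manager. The handler, the in-memory `GAMES` store and the phase names are hypothetical. It measures wall-clock time with `time.perf_counter`; swapping in `time.process_time` would count CPU time only. The networking phase happens outside the handler and is not captured.

```python
import json
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per phase.
timings = defaultdict(float)


@contextmanager
def phase(name: str):
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] += time.perf_counter() - start


# Hypothetical in-memory store standing in for the game state files.
GAMES = {"g1": {"round": 3, "players": ["a", "b"]}}


def handle_state_request(params: dict) -> str:
    with phase("validation"):
        game_id = params.get("game_id")
        if game_id not in GAMES:
            raise ValueError("invalid game_id")
    with phase("read"):
        game = GAMES[game_id]
    with phase("serialise"):
        body = json.dumps(game)
    return body


for _ in range(10_000):
    handle_state_request({"game_id": "g1"})

# Print phases from most to least expensive.
for name, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
    print(f"{name:<10} {seconds * 1e3:8.2f} ms total")
```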

adrian-golian commented 4 years ago

Verbose AF :) Just do this: https://stackoverflow.com/a/7254472 and close #161.

adrian-golian commented 4 years ago

For comparing various alternative function implementations: library(microbenchmark)
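The same kind of comparison can be made in Python with the standard `timeit` module. As a hypothetical example echoing the readRDS vs qread experiment, this times two deserialisers on a made-up game object. The object's shape is an assumption, and absolute numbers will depend on the machine.

```python
import json
import pickle
import timeit

# Hypothetical game object, loosely shaped like a game state.
game = {
    "round": 4,
    "players": [{"nickname": f"p{i}", "n_cards": i % 5 + 1} for i in range(8)],
    "history": [{"player": f"p{i % 8}", "action": i % 88} for i in range(200)],
}
as_json = json.dumps(game)
as_pickle = pickle.dumps(game)

candidates = {
    "json.loads": lambda: json.loads(as_json),
    "pickle.loads": lambda: pickle.loads(as_pickle),
}

results = {}
for name, fn in candidates.items():
    # Best of 5 repeats reduces noise from other processes on the machine.
    best = min(timeit.repeat(fn, number=1_000, repeat=5))
    results[name] = best / 1_000 * 1e6  # microseconds per call
    print(f"{name:<13} {results[name]:8.1f} µs per call")
```

Taking the minimum of several repeats, rather than the mean, is a common choice because slower runs usually reflect interference rather than the code itself.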

maciej-pomykala commented 3 years ago

Fixed with #169