aadl / summergame

Code that runs the AADL Summer Game
http://play.aadl.org
GNU General Public License v2.0
5 stars 2 forks source link

Sort on large ledgers #85

Open eby opened 1 year ago

eby commented 1 year ago

I think it might be this code:

https://github.com/aadl/summergame/blob/2497a4bcbe46f2784f598ca32a4f11a08f09cae8/src/Controller/PlayerController.php#L561

But I’ve seen a few unbounded ledger listing where it is pid limited and just sorted by timestamp. The pid index is working properly and limiting the ledger but if a player is very active that becomes a filesort. Not sure if a different index is needed or if it would be better if we required a game term. I think the latter might be better and players just need to choose a year.

An example that ends up almost a million rows to sort:

explain extended SELECT * FROM sg_ledger WHERE pid = '902'  ORDER BY timestamp DESC;
+----+-------------+-----------+-------------+------+---------------+------+---------+-------+--------+----------+---------------------------------------+
| id | select_type | table     | partitions  | type | possible_keys | key  | key_len | ref   | rows   | filtered | Extra                                 |
+----+-------------+-----------+-------------+------+---------------+------+---------+-------+--------+----------+---------------------------------------+
|  1 | SIMPLE      | sg_ledger | p0,p1,p2,p3 | ref  | pid           | pid  | 4       | const | 971584 |   100.00 | Using index condition; Using filesort |
+----+-------------+-----------+-------------+------+---------------+------+---------+-------+--------+----------+---------------------------------------+
eby commented 1 year ago

Actually I take it back. I think this might be called from:

https://github.com/aadl/summergame/blob/2497a4bcbe46f2784f598ca32a4f11a08f09cae8/summergame.module#L741

When the player page is loaded it does a call to this function without term or game.

So we may need to:

eby commented 1 year ago

From glancing at the code it looks like maybe the sort is only used to get the earliest of the game term. If that is the case then maybe we can change the sort to use lid instead. Since that is a primary key that is numeric, it avoids the filesort that the date varchar creates.

Since lid is autoincrement then sorting by it should give the same results as sorting by date.

ejk commented 1 year ago

The timestamp field is a unix timestamp stored in an unsigned int(10) same as lid, so I'm not sure how changing the sort field might improve things.

If there's another index that we can throw in there that would help, that would be great. Otherwise I can look at that summergame_get_player_points function and see if I can break up any of the queries into smaller, faster ones.

eby commented 1 year ago

It wasn't speed as much as memory/CPU usage as with the primary/unique key, MySQL seems to do the sorting easier (without a filesort). We have lots of memory, so this isn't usually noticed, but I saw when we had some contention that this query was creating temporary memory tables a lot.

So, I was just passing this on as this is a few second query and then a loop over the data which delays rendering to the player. Some then hammer the refresh button causing disconnected queries.

mysql> explain extended SELECT * FROM sg_ledger WHERE pid = '301' ORDER BY timestamp DESC;
+----+-------------+-----------+-------------+------+---------------+------+---------+-------+--------+----------+---------------------------------------+
| id | select_type | table     | partitions  | type | possible_keys | key  | key_len | ref   | rows   | filtered | Extra                                 |
+----+-------------+-----------+-------------+------+---------------+------+---------+-------+--------+----------+---------------------------------------+
|  1 | SIMPLE      | sg_ledger | p0,p1,p2,p3 | ref  | pid           | pid  | 4       | const | 630260 |   100.00 | Using index condition; Using filesort |
+----+-------------+-----------+-------------+------+---------------+------+---------+-------+--------+----------+---------------------------------------+
mysql> explain extended SELECT * FROM sg_ledger WHERE pid = '301' ORDER BY lid DESC;
+----+-------------+-----------+-------------+------+---------------+------+---------+-------+--------+----------+-------------+
| id | select_type | table     | partitions  | type | possible_keys | key  | key_len | ref   | rows   | filtered | Extra       |
+----+-------------+-----------+-------------+------+---------------+------+---------+-------+--------+----------+-------------+
|  1 | SIMPLE      | sg_ledger | p0,p1,p2,p3 | ref  | pid           | pid  | 4       | const | 630260 |   100.00 | Using where |
+----+-------------+-----------+-------------+------+---------------+------+---------+-------+--------+----------+-------------+

The only way to speed it up overall is probably refactor. But we would need to do some testing.

From the various queries I tried (including individual game_terms), the most complicated one seems to be fastest, though it definitely uses resources:

SELECT SUM(points),game_term,type FROM sg_ledger WHERE pid = '301' AND metadata not like 'leaderboard:no' GROUP BY game_term,type

But if using that means it can simplify the code, then it might be the best option. If that works, then we could work on an index that removes some of the overhead.

mysql> explain extended SELECT SUM(points),game_term,type FROM sg_ledger WHERE pid = '301' AND metadata not like 'leaderboard:no' GROUP BY game_term,type;
+----+-------------+-----------+-------------+------+---------------+------+---------+-------+--------+----------+---------------------------------------------------------------------+
| id | select_type | table     | partitions  | type | possible_keys | key  | key_len | ref   | rows   | filtered | Extra                                                               |
+----+-------------+-----------+-------------+------+---------------+------+---------+-------+--------+----------+---------------------------------------------------------------------+
|  1 | SIMPLE      | sg_ledger | p0,p1,p2,p3 | ref  | pid,code_text | pid  | 4       | const | 631774 |    88.89 | Using index condition; Using where; Using temporary; Using filesort |
+----+-------------+-----------+-------------+------+---------------+------+---------+-------+--------+----------+---------------------------------------------------------------------+
eby commented 1 year ago

And just to be clear, this page isn’t causing an overall issue. I just noticed that the My Player page isn’t working well for our players that have extensive history.

I don’t remember if we allow points from far ago to be redeemed still but if not then it might be worth abstracting a totals table or something that holds totals vs recalculating on every My Player page load.