Closed — adamcharnock closed this 3 months ago
@PetrDlouhy – I'm not entirely sure why django-sql-utils
is no longer providing a performance boost, so please let me know if I am missing something obvious.
@adamcharnock django-sql-utils didn't reduce the number of DB queries, it made them faster. SubquerySum fetches the related records from the DB in a subquery rather than by joining tables, which can sometimes be faster.
This would need to be tested on a large dataset to show the speed benefit. If you want, I can dive deep and provide timing data from my database.
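For context, a minimal runnable sketch of the two query shapes being compared: the join-plus-GROUP BY aggregate that a plain Django annotation produces, versus the correlated subquery that SubquerySum emits. The two-table schema here is hypothetical and uses the stdlib sqlite3 module, not Hordak's real tables:

```python
import sqlite3

# Hypothetical simplified schema: accounts with legs attached.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (id INTEGER PRIMARY KEY);
CREATE TABLE leg (id INTEGER PRIMARY KEY, account_id INTEGER, amount INTEGER);
INSERT INTO account VALUES (1), (2);
INSERT INTO leg (account_id, amount) VALUES (1, 10), (1, 20), (2, 5);
""")

# Shape 1: join-based aggregation (one big join, grouped afterwards).
join_rows = conn.execute("""
    SELECT a.id, COALESCE(SUM(l.amount), 0)
    FROM account a LEFT JOIN leg l ON l.account_id = a.id
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()

# Shape 2: correlated subquery, computed per account row
# (the shape SubquerySum generates).
sub_rows = conn.execute("""
    SELECT a.id,
           COALESCE((SELECT SUM(amount) FROM leg WHERE account_id = a.id), 0)
    FROM account a
    ORDER BY a.id
""").fetchall()

# Same results either way; only the query plan differs.
assert join_rows == sub_rows == [(1, 30), (2, 5)]
```

Which shape wins depends on the planner and the data distribution, which is why timing on a realistic dataset matters.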
Ah ok, that makes sense. I've created the create_benchmark_transactions command, and with it I have created 300k transactions / 600k legs. With those records I get a load time of around 350ms on the Transactions list admin page.
What kind of numbers of transactions/legs are you dealing with on your side?
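As an illustration of that benchmark setup, here is a scaled-down sqlite3 sketch of bulk-generating transactions with two legs each. The schema is hypothetical; the real create_benchmark_transactions command presumably goes through the Django ORM with Hordak's actual models:

```python
import sqlite3

# Hypothetical minimal schema: each transaction gets two legs that sum to
# zero, mirroring the 2:1 legs-to-transactions ratio above (scaled down).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE txn (id INTEGER PRIMARY KEY);
CREATE TABLE leg (id INTEGER PRIMARY KEY, transaction_id INTEGER, amount INTEGER);
""")

N = 30_000  # transactions; two legs each
conn.executemany("INSERT INTO txn (id) VALUES (?)", ((i,) for i in range(N)))
conn.executemany(
    "INSERT INTO leg (transaction_id, amount) VALUES (?, ?)",
    ((i, amt) for i in range(N) for amt in (100, -100)),
)
conn.commit()

txns = conn.execute("SELECT COUNT(*) FROM txn").fetchone()[0]
legs = conn.execute("SELECT COUNT(*) FROM leg").fetchone()[0]
assert (txns, legs) == (30_000, 60_000)
```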
Update: Just seen your comment on another PR:
Our service has ~1M users (x2 Hordak accounts) and ~13M transactions (x2 legs). Some accounts have a much larger number of transactions than others; I would expect the maximum can reach 1M transactions per account. But I had to solve the performance issues a few years back, so I expect we had ~10x less data then.
We are using PostgreSQL, so a PostgreSQL-only solution is not a problem for us. I am not sure if DB functions would be fast enough, but I can do some tests.
So, given the above, it looks like I need to scale up by a couple of orders of magnitude. I'll come back to this and give it some more thought.
Ok, I've scaled up by a factor of 10x and what did take around 300ms now takes around 3 seconds. So at your 100x scale this is certainly an issue.
I'll see if I can figure out something.
Hi @PetrDlouhy, would you have a moment to run this query against your database?
```sql
SELECT
    T.*,
    jsonb_agg(jsonb_build_object('amount', L.amount, 'currency', L.currency)) AS balance
FROM
    hordak_transaction T
INNER JOIN LATERAL (
    SELECT SUM(amount) AS amount, amount_currency AS currency
    FROM hordak_leg L
    WHERE L.transaction_id = T.id AND L.amount > 0
    GROUP BY amount_currency
) L ON True
GROUP BY T.id, T.uuid, T.timestamp, T.date, T.description;
```
I'm getting a result in 144ms on my benchmark DB (3M transactions, 6M legs) for 500 transactions. If I run the query over all 3M transactions it returns in around 4-5 seconds.
If this works well then I'm thinking this would make a good database view, which the admin listing could then pull from. We could also put any other useful aggregate fields in there too.
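A sketch of what such a view could look like, simply wrapping the query above so the admin can SELECT from it like an ordinary table. The view name is hypothetical; the actual migration may differ:

```sql
-- Hypothetical view name; the real migration may choose a different one
-- and add further aggregate columns.
CREATE VIEW hordak_transaction_view AS
SELECT
    T.*,
    jsonb_agg(jsonb_build_object('amount', L.amount, 'currency', L.currency)) AS balance
FROM
    hordak_transaction T
INNER JOIN LATERAL (
    SELECT SUM(amount) AS amount, amount_currency AS currency
    FROM hordak_leg L
    WHERE L.transaction_id = T.id AND L.amount > 0
    GROUP BY amount_currency
) L ON True
GROUP BY T.id, T.uuid, T.timestamp, T.date, T.description;
```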
If I run it for 10,000 transactions it is super quick:
```
explain analyze SELECT
    T.*,
    jsonb_agg(jsonb_build_object('amount', L.amount, 'currency', L.currency)) AS balance
FROM
    hordak_transaction T
INNER JOIN LATERAL (
    SELECT SUM(amount) AS amount, amount_currency AS currency
    FROM hordak_leg L
    WHERE L.transaction_id = T.id AND L.amount > 0
    GROUP BY amount_currency
) L ON True
GROUP BY T.id, T.uuid, T.timestamp, T.date, T.description limit 10000;

                                                                            QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=4.81..47559.33 rows=10000 width=93) (actual time=0.070..77.373 rows=10000 loops=1)
   ->  GroupAggregate  (cost=4.81..61231339.06 rows=12876028 width=93) (actual time=0.069..76.287 rows=10000 loops=1)
         Group Key: t.id
         ->  Nested Loop  (cost=4.81..61166958.92 rows=12876028 width=97) (actual time=0.040..43.418 rows=10001 loops=1)
               ->  Index Scan using hordak_transaction_pkey on hordak_transaction t  (cost=0.09..282032.19 rows=12876028 width=61) (actual time=0.016..2.520 rows=10001 loops=1)
               ->  HashAggregate  (cost=4.72..4.72 rows=1 width=36) (actual time=0.004..0.004 rows=1 loops=10001)
                     Group Key: l.amount_currency
                     Batches: 1  Memory Usage: 24kB
                     ->  Index Scan using idx_hordak_leg_transaction_id on hordak_leg l  (cost=0.09..4.70 rows=16 width=10) (actual time=0.002..0.003 rows=1 loops=10001)
                           Index Cond: (transaction_id = t.id)
                           Filter: (amount > '0'::numeric)
                           Rows Removed by Filter: 2
 Planning Time: 0.232 ms
 Execution Time: 78.118 ms
(14 rows)
```
If I run it for the whole database, it is much slower:
```
explain analyze SELECT
    T.*,
    jsonb_agg(jsonb_build_object('amount', L.amount, 'currency', L.currency)) AS balance
FROM
    hordak_transaction T
INNER JOIN LATERAL (
    SELECT SUM(amount) AS amount, amount_currency AS currency
    FROM hordak_leg L
    WHERE L.transaction_id = T.id AND L.amount > 0
    GROUP BY amount_currency
) L ON True
GROUP BY T.id, T.uuid, T.timestamp, T.date, T.description;

                                                                            QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=4.81..61231339.06 rows=12876028 width=93) (actual time=203.820..132172.565 rows=12916116 loops=1)
   Group Key: t.id
   ->  Nested Loop  (cost=4.81..61166958.92 rows=12876028 width=97) (actual time=203.788..88950.936 rows=12916116 loops=1)
         ->  Index Scan using hordak_transaction_pkey on hordak_transaction t  (cost=0.09..282032.19 rows=12876028 width=61) (actual time=0.015..4760.019 rows=12916119 loops=1)
         ->  HashAggregate  (cost=4.72..4.72 rows=1 width=36) (actual time=0.006..0.006 rows=1 loops=12916119)
               Group Key: l.amount_currency
               Batches: 1  Memory Usage: 24kB
               ->  Index Scan using idx_hordak_leg_transaction_id on hordak_leg l  (cost=0.09..4.70 rows=16 width=10) (actual time=0.003..0.005 rows=1 loops=12916119)
                     Index Cond: (transaction_id = t.id)
                     Filter: (amount > '0'::numeric)
                     Rows Removed by Filter: 3
 Planning Time: 0.191 ms
 JIT:
   Functions: 17
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 2.298 ms, Inlining 36.786 ms, Optimization 105.760 ms, Emission 61.021 ms, Total 205.866 ms
 Execution Time: 132947.094 ms
(17 rows)
```
But I expect that is fine, at least for me. If I remember correctly, I don't need to access all balances at once. I usually only need the balance for one account; the maximum is the admin view with 100 accounts.
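That access pattern can be sketched by restricting the same LATERAL query to just the rows being displayed, which keeps the fast index-driven plan from the first EXPLAIN rather than scanning the whole table. The id list here is hypothetical:

```sql
SELECT
    T.*,
    jsonb_agg(jsonb_build_object('amount', L.amount, 'currency', L.currency)) AS balance
FROM
    hordak_transaction T
INNER JOIN LATERAL (
    SELECT SUM(amount) AS amount, amount_currency AS currency
    FROM hordak_leg L
    WHERE L.transaction_id = T.id AND L.amount > 0
    GROUP BY amount_currency
) L ON True
WHERE T.id IN (1, 2, 3)  -- hypothetical ids: only the transactions on screen
GROUP BY T.id, T.uuid, T.timestamp, T.date, T.description;
```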
Great! And it sounds like it is mostly just scaling linearly with the number of rows: 1000x (ish) the rows takes 1000x (ish) longer, which I feel OK about.
Ok, I've already got the migration in place locally to create the view. I'll get the admin pulling data from it then see how things look.
This is looking ready now. Notes:
(Will update comment as I think of more)
Removing use of django-sql-utils (sql_util), as the admin transaction listing UI now completes in O(1) query complexity (see test).