IntersectMBO / cardano-db-sync

A component that follows the Cardano chain and stores blocks and transactions in PostgreSQL
Apache License 2.0
290 stars 160 forks source link

Slow queries (lack of indicies) #1410

Open Fell-x27 opened 1 year ago

Fell-x27 commented 1 year ago

OS Your OS: Ubuntu 22.04

Versions The db-sync version: 13.1.0.2 PostgreSQL version: 14

Build/Install Method The method you use to build or install cardano-db-sync: binaries

Run method The method you used to run cardano-db-sync (eg Nix/Docker/systemd/none): systemd

Problem Report I have discovered some slow queries in my logs after updating to version 13.1.0.2. These internal queries executed by db-sync are consistently taking about 5-7 seconds to complete:

SELECT "collateral_tx_in"."id"
FROM "collateral_tx_in"
WHERE "collateral_tx_in"."tx_in_id" >= 67772287
ORDER BY "collateral_tx_in"."id" ASC
LIMIT 1;

SELECT "collateral_tx_out"."id"
FROM "collateral_tx_out"
WHERE "collateral_tx_out"."tx_id" >= 67772287
ORDER BY "collateral_tx_out"."id" ASC
LIMIT 1;

SELECT "redeemer"."id"
FROM "redeemer"
WHERE "redeemer"."tx_id" >= 67772287
ORDER BY "redeemer"."id" ASC
LIMIT 1;

Of course, it runs slow for every tx_id, not just for the specific value of 67772287 :)

This issue is occurring because there are no indices on the columns used in these queries.

To address this problem, you can create the following indices:

-------Takes ~231MB
create index idx_collateral_tx_in_tx_in_id
    on collateral_tx_in (tx_in_id);

-------Takes ~31MB
create index collateral_tx_out_tx_id
    on collateral_tx_out (tx_id);

-------Takes ~282MB
create index redeemer_tx_id
    on redeemer (tx_id);

By creating these indices, the query execution time can be reduced significantly from 5-7 seconds to approximately 0.2 milliseconds.

kderme commented 1 year ago

This is expected behaviour. db-sync doesn't create and cannot create all possible indexes. You can add your own based on your queries.

Fell-x27 commented 1 year ago

You can add your own based on your queries.

But the queries mentioned above are not specific to my own queries. They are internal queries executed by db-sync. The purpose of posting this issue is to highlight the need for those indices within db-sync itself, as they are essential for optimizing its performance.

kderme commented 1 year ago

Oh I see then it's a different story indeed. I believe these queries are executed during rollback. Do you see them often?

Fell-x27 commented 1 year ago

Oh I see then it's a different story indeed. I believe these queries are executed during rollback. Do you see them often?

Often. Very often. As I can see, they are executing right after block insertion queries.

But, of course, they have disappeared after indices creation.

Fell-x27 commented 1 year ago

There is one more takes more than 5 sec:

SELECT "reference_tx_in"."id"
        FROM "reference_tx_in"
        WHERE "reference_tx_in"."tx_in_id" >= 69610494
        ORDER BY "reference_tx_in"."id" ASC
         LIMIT 1

With the same problem and solution.

-------63 MB
create index reference_tx_in_tx_in_id
    on reference_tx_in (tx_in_id);
kderme commented 1 year ago

I'm pretty sure these queries happen only during rollback. Rollbacks are not that frequent and spending a few seconds for them should be fine.