algorand / indexer

searchable history and current state
MIT License
115 stars 90 forks source link

Indexer could use smaller hash indexes instead of btree indexes #890

Open urtho opened 2 years ago

urtho commented 2 years ago

Problem

All indexer indices are btree but not all have to be.

Solution

Replace btree indexes with hash indexes in places where only exact searches are used and the resulting hash index is significantly smaller.

One such example could be :

ledgerdb=# create index concurrently txn_by_tixid_hash on txn using hash (txid);
CREATE INDEX

ledgerdb=# \di+
                                         List of relations
 Schema |             Name              | Type  | Owner |       Table       |  Size   | Description
--------+-------------------------------+-------+-------+-------------------+---------+-------------
 public | txn_by_tixid                  | index | algo  | txn               | 43 GB   |
 public | txn_by_tixid_hash             | index | algo  | txn               | 16 GB   |

Urgency

Performance optimization.

oroechimaru commented 2 years ago

Hash or nonclustered may be good for mixed data sets, if clustering/ordering lookups on common keys are not needed: https://postgrespro.com/blog/pgsql/4161321

Do you have an example proc, table and index to look into?

I am not up to speed on postgres, but in general if page or row or columnstore compression can be leveraged, that would also reduce disk utilization depending on the index type.

If summarized results are needed clustered columnstore type indexing may offer more compression for summary performance but poor insert/update maintenance, although for large data warehouses they can also help compress wide complex data sets.