IntersectMBO / cardano-db-sync

A component that follows the Cardano chain and stores blocks and transactions in PostgreSQL
Apache License 2.0

1728 - stake address cache #1731

Closed Cmdv closed 2 weeks ago

Cmdv commented 3 weeks ago

Description

This fix reduces the number of SELECT queries against the stake address table in the database by serving those lookups from a local LRU cache with a capacity of 1,500,000 entries. This turned out to be a good size to keep RAM usage low while still giving a fairly high hit rate. While there, I also increased the MultiAsset LRU cache capacity to 250,000.

As a result, an epoch that took around 40 mins to sync now completes in 10-15 mins!

| Number of queries | Before | After |
| --- | --- | --- |
| stake address | 2,748,224 | 66,834 |
| multi asset | 2,180,423 | 1,131,696 |
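
For readers unfamiliar with the approach, here is a minimal sketch of an LRU cache sitting in front of a database lookup, with hit and miss counters. It is written in Python for illustration only and is not the db-sync implementation (which is Haskell); `LruCache`, `fake_query` and the sample keys are hypothetical names.

```python
from collections import OrderedDict


class LruCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # oldest use first, newest use last
        self.hits = 0
        self.misses = 0

    def insert(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)  # mark as most recently used
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop least recently used

    def lookup(self, key, query_db):
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)
            return self.entries[key]
        # Cache miss: fall back to the database and remember the answer.
        self.misses += 1
        value = query_db(key)
        self.insert(key, value)
        return value


# Hypothetical stand-in for a SELECT against the stake address table.
db_queries = []


def fake_query(addr):
    db_queries.append(addr)
    return len(db_queries)


cache = LruCache(capacity=2)
for addr in ["stake1a", "stake1b", "stake1a", "stake1c", "stake1b"]:
    cache.lookup(addr, fake_query)

print(f"hits: {cache.hits}, misses: {cache.misses}, queries: {len(db_queries)}")
```

With a capacity of 2, the second lookup of `stake1b` misses because it was evicted when `stake1c` arrived. That is the effect a cache that is too small would have on frequently used addresses.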


New Statistics

Cache Statistics:
  Stake Addresses: cache size: 1500000, hit rate: 98%, hits: 3135062, misses: 61260
  Pools: cache size: 3128, hit rate: 99%, hits: 2485150, misses: 3128
  Datums: cache capacity: 250000, cache size: 75820, hit rate: 40%, hits: 87835, misses: 129899
  Multi Assets: cache capacity: 250000, cache size: 250000, hit rate: 73%, hits: 3042122, misses: 1099061
  Previous Block: hit rate: 49%, hits: 21001, misses: 21003
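
The hit rates above are consistent with `hits / (hits + misses)` truncated to a whole percent. The truncation is an inference from the figures (Pools and Previous Block would round to 100% and 50% otherwise), not something confirmed from the code. A small Python check, with `hit_rate_percent` as a hypothetical helper name:

```python
def hit_rate_percent(hits, misses):
    """Hit rate as a whole percentage, truncated rather than rounded."""
    total = hits + misses
    if total == 0:
        return 0
    return hits * 100 // total


# (hits, misses) copied from the statistics above.
stats = {
    "Stake Addresses": (3135062, 61260),
    "Pools": (2485150, 3128),
    "Datums": (87835, 129899),
    "Multi Assets": (3042122, 1099061),
    "Previous Block": (21001, 21003),
}

for name, (hits, misses) in stats.items():
    print(f"{name}: hit rate: {hit_rate_percent(hits, misses)}%")
```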

The previous cache, which was a Map, had a cache size of 1,273,020, so there is potential for increasing the size of the new cache.

When db-sync is restarted, the cache is repopulated with the most recently inserted stake addresses from the db, up to the chosen capacity of the cache.

All newly inserted stake addresses are also inserted into the cache.
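
The restart and insert behaviour described above could look roughly like the following. This is a sketch in Python using an in-memory SQLite table as a stand-in for PostgreSQL; the simplified `stake_address` schema, the column names and both function names are assumptions made for illustration, not the actual db-sync code.

```python
import sqlite3
from collections import OrderedDict


def warm_cache(conn, capacity):
    """Load the most recently inserted rows, up to the cache capacity."""
    rows = conn.execute(
        "SELECT hash_raw, id FROM stake_address ORDER BY id DESC LIMIT ?",
        (capacity,),
    ).fetchall()
    cache = OrderedDict()
    # Rows arrive newest first; insert oldest first so the newest row
    # ends up as the most recently used entry.
    for hash_raw, row_id in reversed(rows):
        cache[hash_raw] = row_id
    return cache


def insert_stake_address(conn, cache, capacity, hash_raw):
    """Insert into the db and put the new row straight into the cache."""
    cur = conn.execute(
        "INSERT INTO stake_address (hash_raw) VALUES (?)", (hash_raw,)
    )
    cache[hash_raw] = cur.lastrowid
    cache.move_to_end(hash_raw)
    if len(cache) > capacity:
        cache.popitem(last=False)  # evict the least recently used entry
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stake_address (id INTEGER PRIMARY KEY, hash_raw TEXT UNIQUE)"
)
for i in range(5):
    conn.execute("INSERT INTO stake_address (hash_raw) VALUES (?)", (f"addr{i}",))

cache = warm_cache(conn, capacity=3)
warmed_keys = list(cache)  # state right after the simulated restart
new_id = insert_stake_address(conn, cache, 3, "addr5")
print(warmed_keys, list(cache), new_id)
```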

This fixes #1728

kderme commented 3 weeks ago

These are good results. Are they from running an epoch? I was wondering if 1500000 is enough for stake_addresses at the tip of the chain (I'll do a full sync so I may have this result). We want to avoid cases where stake_addresses that receive rewards and are used very frequently are evicted from the cache because it's too small. Possibly we could also adapt the size of the cache dynamically.

Some nitpicks, which may give us some insight: for stake addresses, misses: 61260 but queries: 66,834, and for multiassets, misses: 1099061 and queries: 1,131,696. Do we know which misses we're missing :)?

Cmdv commented 3 weeks ago

@kderme With the Multi Asset cache, all I did was up the cache capacity from 50000 to 250000. I haven't actually investigated what is happening.

Regarding the misses, I have a feeling it's not quite working as expected; I will double check that they are all being marked correctly. As you say, the misses and the number of queries the db reports should be the same. 5,574 seem not to be marked on the db-sync side 🤔
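
For reference, the gap between the queries the db reports and the misses db-sync counts can be computed directly from the figures in this thread. A quick Python check (the dictionary names are just for illustration):

```python
# Query counts reported by the db after the change.
db_queries = {"stake address": 66_834, "multi asset": 1_131_696}
# Cache misses reported by db-sync in the new statistics.
cache_misses = {"stake address": 61_260, "multi asset": 1_099_061}

# Queries that reached the db without being counted as a cache miss.
unaccounted = {
    name: db_queries[name] - cache_misses[name] for name in db_queries
}

for name, gap in unaccounted.items():
    print(f"{name}: {gap:,} queries not counted as misses")
```

This gives 5,574 for stake addresses and 32,635 for multi assets.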