We have some statistics that are derived from the entire block history, namely transaction count and payload size. Previously, these were computed on the fly for each request, which requires an expensive full table scan and, in the case of payload size, a large sum. We saw this causing performance problems on decaf and mainnet, particularly for the block explorer, which uses these statistics heavily.
This PR:
Adds a new table, `aggregate`, which stores these cumulative values at each block height. This table is kept up to date by a background task scanning the block stream. Now, looking up the values of these statistics just requires reading a single row in this table. This approach has numerous benefits in addition to a massive performance improvement:
- We can now easily look up the values of these statistics at any historical block height, or for any historical range of blocks.
- Instead of returning inaccurate counts when data is missing, we will simply return that we don't have the counts yet for the requested block height, since the `aggregate` table won't be populated for that row.
- Requests that explicitly specify a range or upper bound are easily cacheable.