This new deserializer function/overload records every node that is the target of a back-reference. This tells, e.g., tree_hash() which nodes' hashes are worth caching.
This feature enables performance improvements in chia_rs. Those improvements are reported below.
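Here is a minimal Rust sketch of a deserializer that records back-reference targets. It is a toy model, not the clvm_rs implementation. The Token stream, the Node arena and deserialize_record_backrefs() are all hypothetical, and a back-reference is modelled as a plain arena index rather than CLVM's real path-based encoding. The point is that any node reached through a back-reference is shared, so it ends up in the returned set.

```rust
use std::collections::HashSet;

/// Hypothetical arena node: an atom, or a pair of child indices into the arena.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Atom(Vec<u8>),
    Pair(usize, usize),
}

/// Hypothetical token stream standing in for serialized CLVM (prefix order).
/// NOTE: CLVM's real back-reference encoding (a path into the parse stack) is
/// not modelled; `Backref(i)` simply names an already-built arena node.
#[derive(Debug, Clone)]
enum Token {
    Atom(Vec<u8>),
    Cons, // the next two items are the left and right children
    Backref(usize),
}

/// Deserializes `tokens` into `arena`, returning the root index and the set of
/// nodes that were the target of at least one back-reference.
/// Returns None on truncated input, trailing tokens or a dangling back-reference.
fn deserialize_record_backrefs(
    tokens: &[Token],
    arena: &mut Vec<Node>,
) -> Option<(usize, HashSet<usize>)> {
    let mut targets = HashSet::new();
    let mut pos = 0;
    let root = parse(tokens, &mut pos, arena, &mut targets)?;
    if pos != tokens.len() {
        return None; // trailing tokens after the root node
    }
    Some((root, targets))
}

fn parse(
    tokens: &[Token],
    pos: &mut usize,
    arena: &mut Vec<Node>,
    targets: &mut HashSet<usize>,
) -> Option<usize> {
    let tok = tokens.get(*pos)?.clone();
    *pos += 1;
    match tok {
        Token::Atom(bytes) => {
            arena.push(Node::Atom(bytes));
            Some(arena.len() - 1)
        }
        Token::Cons => {
            let left = parse(tokens, pos, arena, targets)?;
            let right = parse(tokens, pos, arena, targets)?;
            arena.push(Node::Pair(left, right));
            Some(arena.len() - 1)
        }
        Token::Backref(i) => {
            if i >= arena.len() {
                return None; // points at a node that does not exist yet
            }
            targets.insert(i); // record: this node is shared
            Some(i)
        }
    }
}
```

For example, the stream Cons, Cons, Atom("puz"), Atom("sol"), Backref(2) builds (("puz" . "sol") . ("puz" . "sol")) with the inner pair stored once, and the returned set is {2}. A stream without back-references returns an empty set.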
Hashing speed-up: 5.0x - 5.85x
The speed-up of hashing the whole block generator of a full block (50% full, given our current farmer fill rate). These are real mainnet blocks, picked from the period of high mempool pressure, and all of them are full blocks.
Generator speed-up: 1.26x - 1.88x
The speed-up of running the generator of a compressed block with run_block_generator2(), i.e. with the Rust implementation of the generator ROM. The Rust ROM is what makes the faster puzzle hashing possible.
This speed-up won't be realized on-chain until after the hard fork.
Benchmarks, macOS M1
run_generator2
By taking advantage of this information exported from the deserializer, we can optimize both tree_hash() and the tree hashing in run_block_generator2().
Since the cache relies on the block being "compressed" (i.e. using CLVM back-references), it won't affect uncompressed blocks.
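The sketch below shows how a recorded set of shared nodes can drive caching in a tree hash. It uses a small hypothetical arena of atoms and pairs and, to stay dependency-free, std's DefaultHasher as a stand-in for SHA-256. CLVM's real tree hash is sha256(0x01 || atom) for atoms and sha256(0x02 || left_hash || right_hash) for pairs; only the 1/2 prefixes are mirrored here. tree_hash_cached() is a hypothetical name, not the chia_rs API. It memoizes only the nodes in the shared set and counts hash invocations so the saving is visible. With an empty set, as for an uncompressed block, it behaves exactly like a plain tree hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, HashSet};
use std::hash::Hasher;

/// Hypothetical arena node: an atom, or a pair of child indices.
#[derive(Debug)]
enum Node {
    Atom(Vec<u8>),
    Pair(usize, usize),
}

/// Stand-in for SHA-256 (NOT cryptographic): hashes a prefix byte followed by
/// the given parts, mirroring CLVM's 0x01 (atom) / 0x02 (pair) prefixes.
fn h(prefix: u8, parts: &[&[u8]]) -> [u8; 8] {
    let mut s = DefaultHasher::new();
    s.write_u8(prefix);
    for p in parts {
        s.write(p);
    }
    s.finish().to_be_bytes()
}

/// Tree-hashes `root`, caching only nodes listed in `shared` (e.g. the
/// back-reference targets recorded by the deserializer).
/// Returns the hash and the number of hash invocations performed.
fn tree_hash_cached(arena: &[Node], root: usize, shared: &HashSet<usize>) -> ([u8; 8], usize) {
    let mut cache: HashMap<usize, [u8; 8]> = HashMap::new();
    let mut calls = 0;
    let hash = hash_node(arena, root, shared, &mut cache, &mut calls);
    (hash, calls)
}

fn hash_node(
    arena: &[Node],
    idx: usize,
    shared: &HashSet<usize>,
    cache: &mut HashMap<usize, [u8; 8]>,
    calls: &mut usize,
) -> [u8; 8] {
    // A shared subtree that was already hashed is returned without re-walking it.
    if let Some(hv) = cache.get(&idx) {
        return *hv;
    }
    let hv = match &arena[idx] {
        Node::Atom(bytes) => {
            *calls += 1;
            h(1, &[&bytes[..]])
        }
        Node::Pair(l, r) => {
            let lh = hash_node(arena, *l, shared, cache, calls);
            let rh = hash_node(arena, *r, shared, cache, calls);
            *calls += 1;
            h(2, &[&lh[..], &rh[..]])
        }
    };
    // Only back-reference targets are worth keeping; other nodes are visited once.
    if shared.contains(&idx) {
        cache.insert(idx, hv);
    }
    hv
}
```

For the tree (("puz" . "sol") . ("puz" . "sol")) stored as atoms 0 and 1, inner pair 2 and root Pair(2, 2), passing the set {2} gives the same hash with 4 hash invocations instead of 7. Caching only the recorded nodes, rather than every node, keeps the cache small.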
tree_hash_from_bytes
The tree_hash_from_bytes() function, when updated to use a cache, has the following benchmarks. Again, only compressed CLVM would improve.
profile
This is where we spend time in the run_generator test, using test case block-1ee588dc run on Linux on an AMD Threadripper. The test runs both the regular run_block_generator(), which uses the CLVM implementation of the ROM, and run_block_generator2(), which uses the Rust implementation of the ROM.
uncompressed blocks
For reference, here are the benchmarks for uncompressed blocks. These are not expected to see a speed-up: