Closed arvidn closed 8 months ago
Changes Missing Coverage | Covered Lines | Changed/Added Lines | % | ||
---|---|---|---|---|---|
src/chia_dialect.rs | 13 | 14 | 92.86% | ||
src/op_utils.rs | 35 | 36 | 97.22% | ||
src/run_program.rs | 9 | 10 | 90.0% | ||
src/traverse_path.rs | 79 | 80 | 98.75% | ||
src/allocator.rs | 309 | 311 | 99.36% | ||
src/runtime_dialect.rs | 0 | 6 | 0.0% | ||
<!-- | Total: | 503 | 515 | 97.67% | --> |
Files with Coverage Reduction | New Missed Lines | % | ||
---|---|---|---|---|
src/allocator.rs | 1 | 98.91% | ||
src/run_program.rs | 1 | 97.2% | ||
src/runtime_dialect.rs | 3 | 0.0% | ||
<!-- | Total: | 5 | --> |
Totals | |
---|---|
Change from base Build 7833713031: | 0.2% |
Covered Lines: | 5744 |
Relevant Lines: | 6081 |
This is interesting, do the bench times increase in cases where most of the atoms are large due to the extra branching to determine the type of the NodePtr?
I'm also concerned about returning a reference to a temporary value with atom()
, this feels like a technically memory safe, but logically unsafe thing to do.
But otherwise I like this a lot
This is interesting, do the bench times increase in cases where most of the atoms are large due to the extra branching to determine the type of the NodePtr?
I think there's probably more noise in those timings than one might imagine. I just basically ran it once. But, yeah, there are a few cases where there's more work done. For example, when asking whether a node is a small_int()
, I actually construct a small int, if it doesn't use the internal representation but would fit. This is probably not necessary.
There's also some extra work when allocating a new number. There's a check to see if it could be stored as a small int, and if so, it is.
I'm also concerned about returning a reference to a temporary value with atom(), this feels like a technically memory safe, but logically unsafe thing to do.
Yeah, I'm not super happy about this. I think a ring buffer of 16 is more than enough, but maybe it's better to overshoot by more and make it 64 or so.
In the long run I would like to phase out this part of the API though, in favor of visit_node()
(and the other getters)
This is still work in progress, as you can see from the TODO comments and missing tests
If I understand correctly, this is actually a SmallInt
. Maybe we should call it that or SmallAtomInt
?
This is some good work. The numbers in particular look fantastic. I do worry about subtle consensus changes and wonder if adding more visit_node_as_XXX
functions would simplify and alleviate concerns that SmallAtom
is always correctly coded whenever it's possible.
@Rigidity pointed out that visit_node()
could probably return the NodeVisitor
, rather than taking a callback. Doing that sounds like an improvement to me. I'll investigate that.
I applied the suggestion from @Rigidity and made visit_node()
return the NodeVisitor
rather than taking a callback function to pass it to. I also renamed the function just node()
.
The change since last version: https://github.com/Chia-Network/clvm_rs/compare/34bbdf08f7d0c59fa3312b66aa1d3173839b6866..e90cd1279ae60da5f21953486c3a454f583ca217
These commits are best reviewed one at a time.
purpose
This patch introduces a more compact representation of small atoms. A
NodePtr
is 32 bits, 6 bits identifies the kind of node it points to (Atom or Pair), 26 bits is an index into the vector of atoms or pairs.This patch adds a 3rd kind of node,
SmallAtom
, where the value of the atom is stored in the low 26 bits directly. The benefits are:A very large number of atoms are small. In a typical CLVM program for instance, all op-codes and environment paths are small.
format
SmallAtom
only support atoms that are:The 26 bits are interpreted as an unsigned integer.
leading zeros
Since our canonical integer representation requires leading zero bytes for certain positive integers (where the most significant bit in the following byte is set), the leading zero is implied by
SmallAtom
nodes. The functionlen_for_value()
inallocator.rs
implements this behavior. It returns the length an atom would have had, if it has been allocated on the heap.Allocator::nil(), Allocator::one()
The Allocator always allocate two nodes on construction,
nil
andone
. With this patch, we no longer need to allocate these in the atom list nor on the heap. We can just constructNodePtr
representing those values directly, asSmallAtom
. In order to still enforce the same limit on the number of atoms that are allowed to be allocated by a program, we subtract 2 from the constant,MAX_NUM_ATOMS
.MAX_NUM_ATOMS
We have a (consensus critical) rule on how many atoms are allowed to be allocated by a program. Since
SmallAtom
s aren't allocated in theatom_vec
list, we need to keep a count of them separately, to ensure the limit is identical as prior to this optimization. We keep a countersmall_atoms
counting the number of small atoms we've allocated. Whenever we check theMAX_NUM_ATOMS
limit, we include this counter.atom() -> &[u8]
A major challenge with this optimization is in the Allocator's interface. It lets you ask for a pointer to the buffer of any atom. Small atoms don't have a buffer since they're not allocated on the heap. In order to stick to the current interface of the allocator, this function is preserved, using a pool of small buffers to return a pointer to.
We have a ring buffer,
temp_vec
, with a cursor to the next free slow,temp_idx
. When theatom()
is requested for a node that is aSmallAtom
, we grab the next available slot in the ring buffer, increment the cursor, print the integer into the buffer and returns a reference to it.This relies on the return values from
atom()
being short lived, and that we never, at one point, keep more than 64 of these references around.visit_node()
In order to optimize functions using integers, there's a new function on
Allocator
calledvisit_node()
, which calls a callback with the value of the node pass into it. This means we can avoid conversions from small integers toBigInt
for example.benchmarks
The results of the benchmarks in this repository follow. This is the output from
critcmp
:RPi5
Perhaps the more interesting benchmarks are the
run_generator
stress tests in thechia_rs
repository. I ran it on Mac OS (M1) and a RaspberryPi 5 machine.My main concern was to make the worst case more palatable. Among our stress tests, that's
deep-recursion-plus
andduplicate-coin-announce
.RPi5
Mac OS (M1)