Chia-Network / clvm_rs

Rust implementation of clvm
Apache License 2.0
67 stars 57 forks source link

support adding atoms 'zero-copy' #361

Open arvidn opened 8 months ago

arvidn commented 8 months ago

This adds a feature to the Allocator and its NodePtr where a NodePtr can point into a pre-existing, immutable, buffer. Currently, a NodePtr atom can only point into the (dynamically allocated) heap. The NodePtr representation has 6 bits to encode "type", which currently is either Bytes or Pair. This adds 3 additional types, indicating which external buffer the atom is allocated in.

The main rationale are to reduce memory footprint and avoid unnecessary memory copying when:

  1. parsing a program from an (immutable) buffer to run it
  2. passing in a (potentially) large blob as the argument to a program (e.g. a block reference when running a generator)

This patch doesn't in itself take us all the way there, but it adds the basic support for it in Allocator.

Lifetime annotation

With this change, Allocator needs a lifetime annotation, to ensure the buffers added to it (via add_external_buffer()) do in fact outlive the allocator itself. This lifetime annotation propagates to LazyNode, which contains a reference to an Allocator. The problem is that pyo3 cannot expose types with lifetime annotation to python.

This is solved by "laundering" the Allocator when creating a LazyNode, by turning it into an ImmutableAllocator, which simply does not contain any references to the external buffers, and will just fail if any such nodes are part of the LazyNode tree. (None of the intended use cases for external buffers return LazyNodes, so this shouldn't be a problem).

ObjectType

The main reason to hard code the indices in ExternalBytes0, ExternalBytes1 is to keep the enum a plain int, to support casting it in the shift-expression when constructing the underlying value of a NodePtr. I suspect that we can make this a little bit nicer by detaching the enum's representation from the NodePtr representation.

as_index()

As we add more object types to NodePtr, the purpose of as_index() is probably increasingly defeated. It was introduced as a cheap and efficient alternative to a hash-table with NodePtr as the key.

coveralls-official[bot] commented 8 months ago

Pull Request Test Coverage Report for Build 7507336535


Totals Coverage Status
Change from base Build 7505735168: -0.2%
Covered Lines: 5556
Relevant Lines: 5947

💛 - Coveralls
richardkiss commented 7 months ago

Have you considered creating a wrapping object PythonAllocator (or something) that has lifetime 'py? The idea is that the allocator now also has lifetime 'py and objects coming from python can have their refcounts bumped by jamming a clone so they are also 'py. You may not need the laundering allocator this way.

(That said, I've had lots of great ideas with lifetimes and python that end up not being feasible, so it's possible there are shortcomings here. Also wresting with lifetime error messages is sometimes just not worth it.)

arvidn commented 7 months ago

you mean having the allocator contain PyBuffer objects, essentially, to keep python buffers alive? I don't think there's a way to expose types with any lifetime annotations. But a wrapper that takes python buffers could possibly encapsulate the lifetime requirements.

One thing that would make me a little bit uneasy is that we can take both bytes and bytearray, and bytearray is mutable. So it's not just a matter of holding references.

arvidn commented 7 months ago

in its current form, this patch causes a performance regression. When testing on RPi5:

run_program/block-2000  time:   [366.63 ns 366.75 ns 366.87 ns]
                        change: [+1.0676% +1.2314% +1.4306%] (p = 0.00 < 0.05)
                        Performance has regressed.
run_program/compressed-2000
                        time:   [1.7944 s 1.7947 s 1.7951 s]
                        change: [+8.1965% +8.2266% +8.2580%] (p = 0.00 < 0.05)
                        Performance has regressed.
run_program/large-block time:   [501.45 ms 501.74 ms 502.17 ms]
                        change: [+5.4493% +5.5198% +5.6198%] (p = 0.00 < 0.05)
                        Performance has regressed.
arvidn commented 7 months ago

I think the main use case for this is block references being passed in to the generator. I will update this at some point with that use case in mind.

github-actions[bot] commented 5 months ago

'This PR has been flagged as stale due to no activity for over 60 days. It will not be automatically closed, but it has been given a stale-pr label and should be manually reviewed.'