ethereum / c-kzg-4844

A minimal implementation of the Polynomial Commitments API for EIP-4844 and EIP-7594, written in C.
Apache License 2.0
112 stars 105 forks source link

go-bindings: pass large arrays by ref instead of value #393

Closed holiman closed 6 months ago

holiman commented 7 months ago

This PR

NOTE: This means that this PR creates a breaking change, which requires a major version-upgrade of this library, strictly speaking. :warning:

The Blob type is an array, of 130K bytes. As opposed to a slice of similar size, when an array is passed by value, every byte is passed over the heap.

This is the old API:

func BlobToKZGCommitment(blob Blob) (KZGCommitment, error)
func ComputeKZGProof(blob Blob, zBytes Bytes32) (KZGProof, Bytes32, error)
func ComputeBlobKZGProof(blob Blob, commitmentBytes Bytes48) (KZGProof, error)
func VerifyKZGProof(commitmentBytes Bytes48, zBytes, yBytes Bytes32, proofBytes Bytes48) (bool, error)
func VerifyBlobKZGProof(blob Blob, commitmentBytes, proofBytes Bytes48) (bool, error)

Changed API in this PR:

func BlobToKZGCommitment(blob *Blob) (KZGCommitment, error)
func ComputeKZGProof(blob *Blob, zBytes Bytes32) (KZGProof, Bytes32, error)
func ComputeBlobKZGProof(blob *Blob, commitmentBytes Bytes48) (KZGProof, error)
func VerifyBlobKZGProof(blob *Blob, commitmentBytes, proofBytes Bytes48) (bool, error)

Untouched:

func LoadTrustedSetup(g1Bytes, g2Bytes []byte) error
func LoadTrustedSetupFile(trustedSetupFile string) error
func FreeTrustedSetup()
func VerifyBlobKZGProofBatch(blobs []Blob, commitmentsBytes, proofsBytes []Bytes48) (bool, error)
func VerifyKZGProof(commitmentBytes Bytes48, zBytes, yBytes Bytes32, proofBytes Bytes48) (bool, error)

Corresponding go-kzg PR: https://github.com/crate-crypto/go-kzg-4844/pull/63

name                                  old time/op    new time/op    delta
/BlobToKZGCommitment-8                   172ms ±45%     171ms ±50%      ~     (p=1.000 n=5+5)
/ComputeKZGProof-8                       174ms ±41%     139ms ±56%      ~     (p=0.421 n=5+5)
/ComputeBlobKZGProof-8                   205ms ±13%     192ms ±38%      ~     (p=0.548 n=5+5)
/VerifyKZGProof-8                       3.29ms ±58%    3.84ms ±50%      ~     (p=0.548 n=5+5)
/VerifyBlobKZGProof-8                   5.72ms ±46%    6.24ms ±39%      ~     (p=0.690 n=5+5)
/VerifyBlobKZGProofBatch(count=1)-8     8.47ms ± 0%    7.20ms ±49%      ~     (p=0.063 n=4+5)
/VerifyBlobKZGProofBatch(count=2)-8     13.9ms ± 1%    13.9ms ± 1%      ~     (p=0.905 n=5+4)
/VerifyBlobKZGProofBatch(count=4)-8     18.4ms ±34%    20.1ms ±51%      ~     (p=1.000 n=5+5)
/VerifyBlobKZGProofBatch(count=8)-8     29.6ms ±51%    31.6ms ±41%      ~     (p=0.690 n=5+5)
/VerifyBlobKZGProofBatch(count=16)-8    73.6ms ±49%    65.0ms ±26%      ~     (p=0.222 n=5+5)
/VerifyBlobKZGProofBatch(count=32)-8     126ms ±43%     133ms ±26%      ~     (p=1.000 n=5+5)
/VerifyBlobKZGProofBatch(count=64)-8     284ms ±36%     218ms ±52%      ~     (p=0.222 n=5+5)

name                                  old alloc/op   new alloc/op   delta
/BlobToKZGCommitment-8                   131kB ± 0%       0kB ± 0%   -99.96%  (p=0.008 n=5+5)
/ComputeKZGProof-8                       131kB ± 0%       0kB ± 0%   -99.94%  (p=0.008 n=5+5)
/ComputeBlobKZGProof-8                   131kB ± 0%       0kB ± 0%   -99.96%  (p=0.008 n=5+5)
/VerifyKZGProof-8                         161B ± 0%        1B ± 0%   -99.38%  (p=0.008 n=5+5)
/VerifyBlobKZGProof-8                    131kB ± 0%       0kB ± 0%  -100.00%  (p=0.008 n=5+5)
/VerifyBlobKZGProofBatch(count=1)-8      1.00B ± 0%     1.00B ± 0%      ~     (all equal)
/VerifyBlobKZGProofBatch(count=2)-8      1.00B ± 0%     1.00B ± 0%      ~     (all equal)
/VerifyBlobKZGProofBatch(count=4)-8      1.00B ± 0%     1.00B ± 0%      ~     (all equal)
/VerifyBlobKZGProofBatch(count=8)-8      1.00B ± 0%     1.00B ± 0%      ~     (all equal)
/VerifyBlobKZGProofBatch(count=16)-8     1.00B ± 0%     1.00B ± 0%      ~     (all equal)
/VerifyBlobKZGProofBatch(count=32)-8     1.40B ±43%     1.00B ± 0%      ~     (p=0.444 n=5+5)
/VerifyBlobKZGProofBatch(count=64)-8     2.00B ± 0%     2.00B ± 0%      ~     (all equal)

name                                  old allocs/op  new allocs/op  delta
/BlobToKZGCommitment-8                    2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.008 n=5+5)
/ComputeKZGProof-8                        4.00 ± 0%      2.00 ± 0%   -50.00%  (p=0.008 n=5+5)
/ComputeBlobKZGProof-8                    3.00 ± 0%      1.00 ± 0%   -66.67%  (p=0.008 n=5+5)
/VerifyKZGProof-8                         5.00 ± 0%      1.00 ± 0%   -80.00%  (p=0.008 n=5+5)
/VerifyBlobKZGProof-8                     4.00 ± 0%      1.00 ± 0%   -75.00%  (p=0.008 n=5+5)
/VerifyBlobKZGProofBatch(count=1)-8       1.00 ± 0%      1.00 ± 0%      ~     (all equal)
/VerifyBlobKZGProofBatch(count=2)-8       1.00 ± 0%      1.00 ± 0%      ~     (all equal)
/VerifyBlobKZGProofBatch(count=4)-8       1.00 ± 0%      1.00 ± 0%      ~     (all equal)
/VerifyBlobKZGProofBatch(count=8)-8       1.00 ± 0%      1.00 ± 0%      ~     (all equal)
/VerifyBlobKZGProofBatch(count=16)-8      1.00 ± 0%      1.00 ± 0%      ~     (all equal)
/VerifyBlobKZGProofBatch(count=32)-8      1.00 ± 0%      1.00 ± 0%      ~     (all equal)
/VerifyBlobKZGProofBatch(count=64)-8      1.00 ± 0%      1.00 ± 0%      ~     (all equal)
jtraglia commented 7 months ago

Hey @holiman, I've actually been wanting to do something like this for a while. But I didn't want to break backwards compatibility and compatibility with go-kzg-4844. I would also support passing the other parameters by reference too.

In your benchmarks this appears to be noticeably slower though. Am I reading this correctly?

holiman commented 7 months ago

In your benchmarks this appears to be noticeably slower though. Am I reading this correctly?

I see the same, and I'm guessing it's a fluke. Those are the benches which run in c-land on separate threads, right? I guess my laptop was just busy, so corrupted the benchmarks. This one is also funky:

/BlobToKZGCommitment-8                   131kB ± 0%       0kB ± 0%      ~ 
holiman commented 7 months ago

As for breaking backwards compatibility:

Option 1: add new methods which takes by reference. Everyone happy. Option 2: perform a version upgrade from 0.4.2 to 1.0.0. This is the proper way (see e.g. https://go.dev/doc/modules/major-version )

jtraglia commented 7 months ago

Those are the benches which run in c-land on separate threads, right?

I don't believe the Cgo call happens in a different thread, but I'm not entirely sure how Go handles things under the hood. I have run the benchmarks myself and the performance difference is negligible.

This one is also funky.

Here, Go only tracks allocations in Go-land. It cannot track the C allocations.

Option 2: perform a version upgrade from 0.4.2 to 1.0.0. This is the proper way.

Yes, this is the approach we'd take.

namiloh commented 7 months ago

Here, Go only tracks allocations in Go-land. It cannot track the C allocations.

Right, but that is beside the point. The funky was that benchstat completely ignored a tangible improvement :)

asn-d6 commented 7 months ago

Thanks for the PR!

Two questions: 1) WRT changes the public interface methods that currently pass Blob by value, to instead pass *Blob, that is , by reference: What's the benefit of this change? I assume it's better heap hygiene, but I want to make sure I understand. Did this show up on a memory profiler? 2) I don't know enough Go to answer this but what's the deal with VerifyBlobKZGProofBatch() after this patch? I'm asking since that function would by far be the one causing the biggest heap damage (if the array contents are passed by value) as it takes multiple blobs as input.

holiman commented 7 months ago
  1. The benefit is heap hygiene. To elaborate a bit: a [100]byte is an array, that is, a fixed-size chunk of bytes. If a callsite passes such a thing to a function, it is entirely copied (effectively, it is immutable).

Example:

package main

import "fmt"

func foobar(data [10]byte) {
    data[0] = 13
    fmt.Printf("data is: %x\n", data)
}

func main() {
    var data [10]byte
    fmt.Printf("data is: %x\n", data)
    foobar(data)
    fmt.Printf("data is: %x\n", data)
}

yields

data is: 00000000000000000000
data is: 0d000000000000000000
data is: 00000000000000000000

So passing a 131K array from a function, to the next, to the next, over n calls, effectively causes n * 131K data to be allocated on the stack (not heap, sorry: but it doesn't matter really. It's memory, of one sort or another).

Anyway: as long as we're in Go, the compiler might be clever. It might see that it the call destination doesn't actually mutate the input array, so it can omit creating a pristine copy of the arguments. It can inline certain functions, thus avoiding the decision alltogether. But if it's going to take an array as input argument (and remember, it must not be mutated!) ,and pass the reference to it to c, it obviously must copy it, because it has no insight into what c is going to do with the data.

Sidenote: The same thing does not apply for slices. Passing a []data, where data is e.g. data = make([]byte, 131000) passes the slice. And a slice is really just a struct, containing three things:

A slice is a descriptor of an array segment. It consists of a pointer to the array, the length of the segment, and its capacity (the maximum length of the segment).

More info: https://go.dev/blog/slices-intro

And that answers question number 2: VerifyBlobKZGProofBatch(blobs []Blob, commitmentsBytes, proofsBytes []Bytes48) takes three slices. The size of a slice is 8 + 8 + 8 (8 bytes for a pointer, 8 each for an the two integers), so only 3 x 24 bytes on the stack.

So the go-kzg lib might be less bitten by these things than the c-kzg lib.

And yes, I noticed from a profile, when doing go test -run - -bench . --memprofile mem.out that the setting up of our benchmark triggered huge amounts of allocations

    155            .     6.82GB                   if err := kzg4844.VerifyBlobProof(sidecar.Blobs[i], sidecar.Commitments[i], sidecar.Proofs[i]); err != nil { 
namiloh commented 6 months ago

Sure no rush to merge, but technically, equivalence between go-kzg and c-kzg is not required, and, also already a lost cause, as go takes more inputs in some cases ( e.g. num-threads), and returns more values in some.

namiloh commented 6 months ago

Not sure though, maybe this PR takes the "by ref" too far. Small arrays are typically fine to pass be value, making them alloc-free since they're stack-allocated. In geth, we nearly alway pass common.Hash ([32]byte) by value.

Cc @karalabe ptal at the api changes desceibed above

asn-d6 commented 6 months ago

Not sure though, maybe this PR takes the "by ref" too far. Small arrays are typically fine to pass be value, making them alloc-free since they're stack-allocated. In geth, we nearly alway pass common.Hash ([32]byte) by value.

Cc @karalabe ptal at the api changes desceibed above

Hello again. Any thoughts here?

I agree that passing small arrays by value should not be an issue. In this regard, IMO we should optimize for the simplest and most pleasant API both for the library itself and its users (geth).

I guess the change implied here would be to only pass Blobs by reference, and pass everything else by value. WFM.

fjl commented 6 months ago

I agree with @asn-d6

holiman commented 6 months ago

Yeah we all agree it should be pointers for blobs (and any other large structs) but values for smallish things. I'll change it back

jtraglia commented 6 months ago

Sounds good to me too. Thanks @holiman.

holiman commented 6 months ago

Should be good to go now, IMO