Should we make FIELD_ELEMENTS_PER_BLOB runtime-configurable?

asn-d6 commented 12 months ago

Hello all,

we are looking for some feedback on the recent PRs on making FIELD_ELEMENTS_PER_BLOB runtime configurable instead of a compile-time constant. You can see the core PR at https://github.com/ethereum/c-kzg-4844/pull/352, and here is a branch with 4/7 binding sub-PRs merged.

c-kzg was architected with FIELD_ELEMENTS_PER_BLOB being a compile-time constant from the beginning, but this summer lighthouse informed us that they have trouble managing build scripts to build c-kzg both in mainnet and minimal configurations, which we addressed with a slightly-hacky-but-sound approach (https://github.com/ethereum/c-kzg-4844/pull/317).

A month ago, after more complaints by the lighthouse team, we started experimenting with PRs to make FIELD_ELEMENTS_PER_BLOB completely runtime configurable, which are the PRs you see above.

We are currently internally discussing whether we should merge these PRs or just keep the status quo. We know it's a nice-to-have feature, but we would like your opinion on how much this feature would improve your lives, as there is no other reason to merge it other than client dev UX.

Positives:

Makes it more convenient to build and test c-kzg for mainnet/minimal presets
Rust: makes kzg code in lighthouse simpler
Rust: we can easily publish a single rust crate (it's [possible](https://github.com/ethereum/c-kzg-4844/pull/352#issuecomment-1713376826 now too but hacky)
Minor: Reduce the amount of heavy 128kb blobs on the stack (our sizes are not a problem for modern computers)

Negatives:

Requires sizeable refactoring of the main library and all bindings quite late in the post-audit lifetime
Slightly more complicated memory management because of dynamic heap usage instead of static stack usage
Additional length checks required on many bindings because blobs now are variable-sized
Slightly more complicated API for verify_blob_kzg_proof_batch() (takes flattened array of blob bytes, instead of array of blobs)

As downstream users of the library, you are likely not experiencing much of the negatives, but we are interested in learning whether this change will improve things for you both in terms of build scripts and also in terms of actual code.

Thanks!

/cc @GottfriedHerold @ppopth @matthewkeil @StefanBratanov @flcl42 @pawanjay176 @jangko @potuz @g11tech @henridf @divagant-martian

StefanBratanov commented 12 months ago

For Teku, it will simplify the java bindings (no need of two libs) and allows us to run the consensus reference tests without doing too many hacky changes. So overall I am happy with this change. Is there a possibility of doing another audit after this change?

g11tech commented 11 months ago

it would definitely be amazing for supporting minimal setups :+1:

jangko commented 11 months ago

Nim is flexible enough to handle multiple configurations thanks to it's powerful metaprogramming. But this changes clearly will make Nimbus code cleaner.

pawanjay176 commented 11 months ago

Super helpful in lighthouse as well since we don't have to resort to importing the same library twice

ppopth commented 11 months ago

Minor: Reduce the amount of heavy 128kb blobs on the stack (our sizes are not a problem for modern computers)

Slightly more complicated memory management because of dynamic heap usage instead of static stack usage

I don't think these two sentences make sense. Whether the blobs are in the stack or the heap doesn't depend on the library definition, but depends on the library users and bindings. For example, if I have a definition of

typedef struct {
    uint8_t bytes[BYTES_PER_BLOB];
} Blob;

It doesn't mean immediately that the blobs will always be on the stack. If the user wants it to be on the stack, they can do so.

Blob blob;

If the user wants it to be on the heap, they can also do so.

Blob *blob = malloc(sizeof(Blob));

The same reason can be applied to the new blob definition. (uint8_t *blob)

jtraglia commented 11 months ago

@ppopth you're right. With our current definition of Blob, you can allocate a blob on the stack or heap. I think this point stems from the Rust bindings, where the Blob definition being used (see below) is difficult to directly allocate on the heap. They needed to do something like this to accommodate.

#[repr(C)]
pub struct Blob {
    bytes: [u8; BYTES_PER_BLOB],
}

In C, you will be able to allocate a blob on the stack with the new definition like this:

uint8_t blob[s->bytes_per_blob];

But I'm not a huge fan of this and I think it might have some portability issues.

GottfriedHerold commented 11 months ago

In C, you will be able to allocate a blob on the stack with the new definition like this:
uint8_t blob[s->bytes_per_blob];
But I'm not a huge fan of this and I think it might have some portability issues.

Don't do this. Not only does this indeed have portability issues: Automatic storage duration (i.e. in practise stack-allocated) VLAs are optional even in C23 (Note VLAs as types are fine, e.g. as argument types. This still can take malloc-allocated or fixed-size array from callers). Furthermore, the behaviour/error handling when running out of stack-memory is generally not robust: You cannot portably guard against this beforehand, I don't think you can catch it and if you are unlucky, it's a security issue (if mem_needed > stack_availiable + guard_page_size, assume bad things may happen unless you can prove otherwise).

jtraglia commented 11 months ago

This is no longer necessary. As of #377, mainnet & minimal use the same trusted setup.

ethereum / c-kzg-4844

Should we make FIELD_ELEMENTS_PER_BLOB runtime-configurable? #369