Generalized Merkle Tree Index: use 0-based indexing

mratsim commented 5 years ago

The current specs for generalized merkle tree uses 1-based indexing (https://github.com/ethereum/eth2.0-specs/blob/2787fea5feb8d5977ebee7c578c5d835cff6dc21/specs/light_client/merkle_proofs.md#generalized-merkle-tree-index)

The differences between 0 and 1 based indexing are the following:

	root at 0	root at 1
Left child	i*2 + 1	i*2
Right child	i*2 + 2	i*2 + 1
Parent	(i-1)/2	i/2

Convention

Using 0-based indexing would be using the common convention of most programming languages (save Fortran, Matlab and Julia).

Errors

It is well known that mixing 0-based and 1-based indexing tends to create off-by-one errors, we already had several such spec bugs in the past. And using 1-based indexing is introducing new risks.

Memory saving

I expect clients to allocate binary trees with power-of-2 sizes for efficiency reasons. If we have N nodes in the merkle tree where N is a power of 2 we will waste N-1 node space.

Concrete example: 1 byte can store 256 values in the range [0, 255] Assuming we have values in range [1, 256] instead like the current proposition we would need 2 bytes of storage and will waste the [257-511] range which represent 255 values.

Alignment issues

Assuming we store merkle proofs in arrays of uint32 at a low-level, they would need to be aligned to 32-bit boundaries on many architecture that would benefit the most from light clients (ARM, MIPS, ...).

Furthermore Merkle proofs would probably be excellent candidates to be accelerated by SIMD/vector hardware which requires also requires alignment.

This off-by-one will cause unaligned loads/store that implementers will have to work around, probably via padding further increasing memory waste or via temporaries.

Potential counter-argument: extra computation

Some might be worried that 0-based is slower than 1-based indexing where we can just use 2*i and 2*i+1 to find children instead of 2*i+1 and 2*i+2 and i/2 to find parents instead of (i-1)/2.

It doesn't make a difference because at a low-level, CPUs support Scale*Index+Offset addressing mode if scale is a power of 2. Execution time is the same. (This is called SIB addressing in x86 and scaled register in ARM).
The performance is dominated by fetching data from memory. During the previous memory fetch, the CPU can precompute the next location to fetch. as there is no data dependencies.

vbuterin commented 5 years ago

The memory saving is really negligible, and I think it actually improves alignment; " If we have N nodes in the merkle tree where N is a power of 2 we will waste N-1 node space." is incorrect because most of the Merkle trees we use have 2^N leaves, and so 2^(N+1) - 1 total nodes, so adding an empty value in the zero position just gets us to 2^(N+1)

Also, and much more importantly, generalized indices are not meant to correspond to memory positions and are you are not meant to think of it as an array; for most complex SSZ objects, that would lead to a lot of empty space in your serialization because of regions that are unused.

The purpose of a generalized index is to serialize a binary path, and you can think of the algorithm as follows: the highest 1-bit is a flag saying "the bits before this were just zeroes but the bits after this are where the actual path starts", followed by the bits of the path from top to bottom. Viewed this way, offsetting everything down by 1 would break this abstraction.

mratsim commented 5 years ago

Also, and much more importantly, generalized indices are not meant to correspond to memory positions and are you are not meant to think of it as an array; for most complex SSZ objects, that would lead to a lot of empty space in your serialization because of regions that are unused.

The issue with that is that we will need a translation layer between generalized indices and the actual implementation which is 0-based raising the risk of off-by-one errors.

Assuming we implement as sparse Merkle trees, the space difference disappears for both 0-based and 1-based but the risk of errors stays.

Regarding the bit abstraction part, this is indeed the same coding as used in Fenwick Trees/Binary Indexed Trees

A Fenwick tree is most easily understood by considering a one-based array. Each element whose index i is a power of 2 contains the sum of the first i elements. Elements whose indices are the sum of two (distinct) powers of 2 contain the sum of the elements since the preceding power of 2. In general, each element contains the sum of the values since its parent in the tree, and that parent is found by clearing the least-significant bit in the index.

To find the sum up to any given index, consider the binary expansion of the index, and add elements which correspond to each 1 bit in the binary form.

For example, say one wishes to find the sum of the first eleven values. Eleven is 10112 in binary. This contains three 1 bits, so three elements must be added: 10002, 10102, and 10112. These contain the sums of values 1–8, 9–10, and 11, respectively.

To modify the eleventh value, the elements which must be modified are 10112, 11002, 100002, and all higher powers of 2 up to the size of the array. These contain the sums of values 11, 9–12, and 1–16, respectively. The maximum number of elements which may need to be updated is limited by the number of bits in the size of the array.

However in practice going through the branches would just use recursive "if" test, the binary coding would only be used for visual inspection and we can just not the binary representation to switch around the bits.

Comparatively, off-by-one errors are much harder to notice, debug and track down and I'd rather pay the small cost of breaking this abstraction to avoid off-by-ones.

vbuterin commented 5 years ago

Generalized indices are intended for all kinds of SSZ objects, the great majority of which will be highly unbalanced (eg. think of the BeaconState), meaning that the mapping of tree leaves to generalized indices is going to have a lot of "holes" in it. So using the generalized index as a memory location is going to be a terrible idea, as you would have a really inefficient memory setup (for the ShardState it's looking like generalized indices might exceed 2**300)

So for example, in the data structure below (eg. the SSZ structure {x: bytes32, y: {z: bytes32, w: bytes32}}):

    R
  A   B
     C D

The indices would be: R=1, A=2, B=3, C=6, D=7. Nothing would correspond to 4 or 5.

The reason I don't use the sequential addressing as in the image of the chart in that wiki article is that it's a desideratum for a description of a given Merkle path to not depend on anything outside that path. You especially don't want a description of the path to depend on the size of a variable length list somewhere else in the data structure.

JustinDrake commented 5 years ago

This was discussed on today's call (cc @protolambda and @djrtwo)—sticking with 1-based indexing for generalised indices lead to cleaner implementations.

ethereum / consensus-specs