celestiaorg / celestia-app

Celestia consensus node
https://celestiaorg.github.io/celestia-app/
Apache License 2.0
345 stars 290 forks source link

Proposal: Create a light data availability header #826

Open evan-forbes opened 2 years ago

evan-forbes commented 2 years ago

Light clients only use rows for sampling, and as decided in #577, light clients already know row roots over rows that consist only of tail padding shares.

Therefore, it might be possible to create a light DAH, where we only include row roots for rows that have data, along with the merkle root of the column roots (in order to calculate the data root).

type LightDataAvailabilityHeader struct {
    RowRoots [][]byte
    // the root of the the column roots
    ColumnRootRoot []byte
    SquareSize     uint64
    // volitile
    hash []byte
}

// Hash computes and caches the merkle root of the row and column roots.
func (ldah *LightDataAvailabilityHeader) Hash() []byte {
    if ldah == nil {
        return merkle.HashFromByteSlices(nil)
    }
    if len(ldah.hash) != 0 {
        return ldah.hash
    }

    slices := make([][]byte, ldah.SquareSize)
    copy(slices[:len(ldah.RowRoots)], ldah.RowRoots)
    // The single data root is computed using a simple binary merkle tree.
    // Effectively being root(rowRoots || columnRoots):
    slices = append(slices, ldah.ColumnRootRoot)
    ldah.hash = merkle.HashFromByteSlices(slices)
    return ldah.hash
}

func (ldah *LightDataAvailabilityHeader) Fill() {
    if uint64(len(ldah.RowRoots)) == ldah.SquareSize {
        return
    }
    rrs := make([][]byte, ldah.SquareSize)
    copy(rrs[:len(ldah.RowRoots)], ldah.RowRoots)
    for i, rowRoot := range rrs {
        if len(rowRoot) == 0 {
            rrs[i] = appconsts.TailPaddingRoot
        }
    }
    ldah.RowRoots = rrs
}
Wondertan commented 2 years ago

Light clients only use rows for sampling

Hmm. Current implementation of sampling samples uniformly from rows and cols. The full node also tries block reconstruction from row and col. Do we miss something on the node side, or when it was decided to sample from rows only?

evan-forbes commented 2 years ago

Current implementation of sampling samples uniformly from rows and cols.

oh shoot, no then that was just a misunderstanding from my part then!

evan-forbes commented 2 years ago

perhaps we could provided this as an option in the distant future when the DAH becomes quite large. Although that might not be safe to have some light clients sample from cols and some not.

evan-forbes commented 2 years ago

The filling portion where we remove row roots with only tail padding shares could still be used I think, but am unsure if that optimization is worth the complexity