filecoin-project / go-hamt-ipld

An implementation of a HAMT using ipld
MIT License
29 stars 25 forks source link

Represent bitwidth explicitly in root node #79

Open anorth opened 3 years ago

anorth commented 3 years ago

As of https://github.com/filecoin-project/go-amt-ipld/pull/38 the AMT stores the bitwidth in the root node. Knowledge of the bitwidth is necessary to correctly interpret the blocks, so storing it makes the structure more self-describing.

I propose that we do the same thing for the HAMT. This will require a Root node structure, which the AMT already had but needs to be introduced here.

Thoughts @Stebalien @ZenGround0?

Stebalien commented 3 years ago

We'd also need to specify the hash function (e.g., with a multicodec) and maybe a hash digest length? That shouldn't be terrible, but it'll take up a few extra bytes (likely 4?).

rvagg commented 3 years ago

Probably don't need the digest length since it's implied for most (all?) of the multihashes in the multicodec table. bitwidth and hash function multihash code are what we're storing in the IPLD hashmap spec. There's some discussion about whether this root should be a separate block but IMO it doesn't need to be since the root of a HAMT is a special thing anyway and can't be made use of as anything but the root, so you could just pack config into it.

We currently form a root a little like this (a single block with a nested structure):

type HAMTRoot struct {
  hashAlg Int
  bucketSize Int
  node HAMTNode
} representation tuple

type HAMTNode struct {
  map Bytes
  data [ Element ]
} representation tuple

# ...

so the "node" is nested within the root. An alternative form would be to nest the configuration inside "node", such that the root is the same format as every other node but it has this one additional field:

type HAMTRoot struct {
  map Bytes
  data [ Element ]
  config HAMTConfig
} representation tuple

type HAMTNode struct {
  map Bytes
  data [ Element ]
} representation tuple

type HAMTConfig struct {
  hashAlg Int
  bucketSize Int
} representation tuple

# ...

Or shaving off the additional List CBOR byte and sticking it all in the root:

type HAMTRoot struct {
  map Bytes
  data [ Element ]
  hashAlg Int
  bucketSize Int
} representation tuple

type HAMTNode struct {
  map Bytes
  data [ Element ]
} representation tuple

# ...