celestiaorg / smt

A Go library that implements a Sparse Merkle tree for a key-value map.
https://godoc.org/github.com/celestiaorg/smt
MIT License
138 stars 53 forks source link

Automatic removal of orphan nodes #36

Closed roysc closed 3 years ago

roysc commented 3 years ago

Completes work from https://github.com/celestiaorg/smt/pull/24 :

musalbas commented 3 years ago

Thanks for continuing this work and this PR!

Changing the leaf value (https://github.com/celestiaorg/smt/pull/36/commits/341c9985ac9bf4b32561c6f9ba94f71c545789af) seems to a breaking change. Is this necessary to implementing orphan removal, and why? There already shouldn't be any ambiguity about which key a leaf is for, as it's already prefixed in the leaf's value.

roysc commented 3 years ago

@musalbas sorry, my commit message wasn't very clear. The problem only arises when deleting a leaf with a duplicate value - there's no way to know if a value is still referenced by another leaf. Another solution would be some kind of reference count.

(this test case fails when using just hash(value) - https://github.com/vulcanize/smt/blob/341c9985ac9bf4b32561c6f9ba94f71c545789af/smt_test.go#L633)

adlerjohn commented 3 years ago

@roysc The implementation storing the hash of the value for the leaf value is an implementation detail. It's in no way required. You could instead store values in the separate map hash(key) -> value, as in https://github.com/celestiaorg/smt/issues/14#issuecomment-762895582 (or better yet in the separate map key -> value). Breaking the commitment specification to workaround a failing test is not the proper solution.

I suggest you start by refactoring as per #14, then pruning will follow readily. Alternatively, implement the reference counting scheme.(Actually don't bother implementing the reference counting scheme because it'll be exceedingly wasteful compared to the direct mapping of key -> value which is needed anyways for constant-time reads).

Feel free to re-open this PR or a new one if you can get pruning working without modifying the current test suite (unless there's a bug in them of course).

roysc commented 3 years ago

@adlerjohn my apologies for not respecting the commitment spec. Using key -> value is a simple change; but if this mapping is stored in the same DB, would there be any concern about malicious users intentionally storing a plain key that collides with an existing tree hash? Or are you suggesting using a separate DB instance for this?

And could you clarify what you mean by modifying the current test suite? Are you referring only to the changes in the commitment spec, or changes in the LazyLedger/Celestia core? I haven't modified any existing tests here.

adlerjohn commented 3 years ago

Or are you suggesting using a separate DB instance for this?

Yes, the suggestion is a separate KV store just for values. There's no inherent reason to put both values and nodes in the same KV store.

And could you clarify what you mean by modifying the current test suite? Are you referring only to the changes in the commitment spec

Yes, to clarify I mean that the commitment as specified here must be the same before and after any PR dealing with implementation details.

roysc commented 3 years ago

I'm not able to reopen this PR, new one: https://github.com/celestiaorg/smt/pull/37