Address space leak in RBTree.delete

ericswanson-dfinity commented 3 years ago

It seems that stable RBTree structures could accumulate lots of wasted space over time if RBTree.delete is called:

The concern is that after we insert and delete and consider lots of churn, there is a space leak: the deleted keys are never totally removed, they are only “nulled out”.

We're going to be using RBTree.delete when syncing assets to asset canisters, which will hold long-lived data in a stable RBTree. I don't have reason to think this is an urgent problem for this use case, so I am mostly entering this for tracking.

Any app written in Motoko that uses RBTree.delete could, of course, experience the same thing.

This came out of this discussion: https://github.com/dfinity/sdk/pull/1408#discussion_r575628118

matthewhammer commented 3 years ago

These other options come to mind as well:

When you can hash the keys and their order is not important:
- Use a HashMap and just avoid the RBTree, or
- When you need an immutable dictionary, use a Trie and just avoid the RBTree.
When you need a map collection with ordered keys, perhaps use an imperative RB tree; the imperative version has a simpler (but still involved) delete implementation compared to the persistent version, which is notoriously more complex.

chenyan-dfinity commented 3 years ago

JFTR I was using RBTree instead of HashMap at that time, because we didn't have a Text.hash function. The current Text hash is quite rudimentary, we probably need to implement a more robust hash function.

nomeata commented 3 years ago

I think all the hashing needs to be revisited, stuff like

  public func hash(i : Nat) : Hash {
    let j = Prim.natToWord32(i);
    hashWord8(
      [j & (255 << 0),
       j & (255 << 8),
       j & (255 << 16),
       j & (255 << 24)
      ]);
  };

which ignores all the high bit of the Nat is a bit worrysome.

dfinity / motoko-base

Address space leak in RBTree.delete #220