citahub / cita_trie

Rust implementation of the Modified Patricia Tree (aka Trie).
Apache License 2.0
71 stars 28 forks source link

not all stale keys are deleted from db #17

Closed laizy closed 5 years ago

laizy commented 5 years ago

stale keys can be introduced by deletion and insertion, #10 only handled the deletion part.

        let mut memdb = MemoryDB::new();
        let mut trie = PatriciaTrie::new(&mut memdb, RLPNodeCodec::default());
        for i in 0..10 {
            trie.insert(format!("test{}", i).as_bytes(), b"testvalue");
            trie.commit();
        }
        for i in 0..10 {
            trie.remove(format!("test{}", i).as_bytes());
            trie.commit();
        }
        println!("clear all:trie.root={:?}", trie.root);
        println!("clear all:{:?}", trie.db);

output:

clear all:trie.root=Empty
clear all:MemoryDB { storage: RwLock { data: {[193, 152, 200, 144, 250, 10, 84, 154, 250, 172, 48, 71, 148, 203, 39, 40, 56, 205, 6, 252, 92, 235, 72, 182, 241, 177, 161, 89, 29, 125, 114, 0]: [231, 133, 23, 70, 87, 55, 67, 160, 36, 48, 168, 10, 138, 13... ]} } }
cryptowen commented 5 years ago

Should this be considered as a bug?

  1. There is no easy way to find out all the keys introduced by insertion, since in different insert order, there will be different keys. And the deletion may introduce new keys if the delete order is different with reversed insert order.
  2. The trie is designed to preserve all the state and change history in a small cost. There are no clear rules about what should be done with the keys not in the trie. And both the go version and python version just change or normalize the nodes along the path to the root to get a new trie, they do not handle the keys introduced in the process in db.

For example:

In [28]: from trie import HexaryTrie

In [29]: t = HexaryTrie(db={})

In [30]: t.set(b'a', b'1')

In [31]: t.db
Out[31]: {b'Y\xb6\x97\xa4\xdb\x0cGU@Hb\x15\xc1\xe8\xf6IE\xa4\xa0\xad\x8d)\x1d\xe6\xcd\x90!\xc4\xfb\xf69\xdd': b'\xc4\x82 a1'}

In [32]: t.set(b'b', b'1')

In [33]: t.db
Out[33]:
{b'V\xb2R\xe5\xed\xd4\xe1\xb8\xdc\xc2\xe8\x94{z\n\x0c\xd3C\xa3\xcc\xec9\xcc\r\xa0\xe5"]x\x83\x8f\x07': b'\xd7\x16\xd5\x80\xc2 1\xc2 1\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80',
 b'Y\xb6\x97\xa4\xdb\x0cGU@Hb\x15\xc1\xe8\xf6IE\xa4\xa0\xad\x8d)\x1d\xe6\xcd\x90!\xc4\xfb\xf69\xdd': b'\xc4\x82 a1'}

In [34]: t.delete(b'b')

In [35]: t.db
Out[35]:
{b'V\xb2R\xe5\xed\xd4\xe1\xb8\xdc\xc2\xe8\x94{z\n\x0c\xd3C\xa3\xcc\xec9\xcc\r\xa0\xe5"]x\x83\x8f\x07': b'\xd7\x16\xd5\x80\xc2 1\xc2 1\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80',
 b'Y\xb6\x97\xa4\xdb\x0cGU@Hb\x15\xc1\xe8\xf6IE\xa4\xa0\xad\x8d)\x1d\xe6\xcd\x90!\xc4\xfb\xf69\xdd': b'\xc4\x82 a1'}

In [36]: t.delete(b'a')

In [37]: t.db
Out[37]:
{b'V\xb2R\xe5\xed\xd4\xe1\xb8\xdc\xc2\xe8\x94{z\n\x0c\xd3C\xa3\xcc\xec9\xcc\r\xa0\xe5"]x\x83\x8f\x07': b'\xd7\x16\xd5\x80\xc2 1\xc2 1\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80',
 b'Y\xb6\x97\xa4\xdb\x0cGU@Hb\x15\xc1\xe8\xf6IE\xa4\xa0\xad\x8d)\x1d\xe6\xcd\x90!\xc4\xfb\xf69\xdd': b'\xc4\x82 a1'}

In [38]: t.root_hash
Out[38]: b'V\xe8\x1f\x17\x1b\xccU\xa6\xff\x83E\xe6\x92\xc0\xf8n[H\xe0\x1b\x99l\xad\xc0\x01b/\xb5\xe3c\xb4!'

In [39]: HexaryTrie(db={}).root_hash
Out[39]: b'V\xe8\x1f\x17\x1b\xccU\xa6\xff\x83E\xe6\x92\xc0\xf8n[H\xe0\x1b\x99l\xad\xc0\x01b/\xb5\xe3c\xb4!'
yejiayu commented 5 years ago

So, either do not join the logic deleted from the db or delete it correctly.

cryptowen commented 5 years ago

I prefer no deletion from db in any time, preserving all the history.

yejiayu commented 5 years ago

We can correctly collect the key that was deleted in a commit cycle, but whether or not it is actually removed from the DB can be decided by the party that implements the Database trait.

laizy commented 5 years ago

There is no easy way to find out all the keys introduced by insertion, since in different insert order, there will be different keys. And the deletion may introduce new keys if the delete order is different with reversed insert order.

It seems possible. In insert_at function, we can record all the hash node loaded from db. if insertion succeed, those HashNodes will be updated, so we can deleted it safely in trie.commit.