@sgkim126 @majecty I'm implementing a cache to reduce DB operations. CodeChain already uses the lru_cache crate (https://contain-rs.github.io/lru-cache/lru_cache/struct.LruCache.html#method.contains_key), so I'm trying to add the cache using the same crate.
Like this:
// triedbmut.rs
pub(crate) struct TrieDB<'db> {
    db: &'db dyn HashDB,
    root: &'db H256,
    cache: LruCache<H256, Vec<u8>>,
}

// ...

let node = RlpNode::Leaf(path, insert_value);
let node_rlp = RlpNode::encoded(node);
let hash = self.db.insert(&node_rlp);
self.cache.insert(hash, node_rlp);
The key-value pair (hash → RLP-encoded node) is stored in the cache whenever a node is written to or read from the DB. Before a DB operation (get()), we can first look the value up in the cache. Therefore, the overhead from DB operations will be reduced by adding the cache.
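For concreteness, here is a minimal sketch of what such a cache-aware read path could look like. The helper name get_node_rlp is hypothetical, and the return type of HashDB::get is assumed to be convertible to Vec<u8>:

// Hypothetical helper sketch, not the actual CodeChain code.
fn get_node_rlp(&mut self, hash: &H256) -> Option<Vec<u8>> {
    // lru_cache's get_mut needs mutable access because it updates the recency order.
    if let Some(rlp) = self.cache.get_mut(hash) {
        return Some(rlp.clone());
    }
    // Cache miss: fall back to the underlying HashDB and remember the result.
    let rlp = self.db.get(hash)?.to_vec();
    self.cache.insert(*hash, rlp.clone());
    Some(rlp)
}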
However, there is a problem. In fn get_aux() in triedb.rs, the compiler reports the error 'self' is a '&' reference, so the data it refers to cannot be borrowed as mutable, because self is immutable there. It can be solved by using RefCell, but then we can no longer use the type from multiple threads. Could you advise me, or tell me your opinion on this matter?
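For reference, a rough sketch of the RefCell variant (this reuses the HashDB and H256 types already in the crate; cached_node is a hypothetical helper, not existing code):

use std::cell::RefCell;
use lru_cache::LruCache;

pub(crate) struct TrieDB<'db> {
    db: &'db dyn HashDB,
    root: &'db H256,
    // Interior mutability: methods taking &self can still update the LRU order,
    // at the cost of making the struct non-Sync.
    cache: RefCell<LruCache<H256, Vec<u8>>>,
}

impl<'db> TrieDB<'db> {
    // Hypothetical helper that get_aux() could call with only &self.
    fn cached_node(&self, hash: &H256) -> Option<Vec<u8>> {
        self.cache.borrow_mut().get_mut(hash).cloned()
    }
}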
I don't know whether TrieDB and TrieDBMut were used in a multi-threaded environment or not. IMO, they are just a view of the internal KeyValueDB, so using them in a single thread is natural. So, using RefCell for the cache is the right choice.
The current TrieDB and TrieDBMut are Send and Sync. However, you can make them non-Send and non-Sync, since CodeChain doesn't pass them to other threads. I think RefCell might be the solution.
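To make the trade-off concrete: RefCell<T> stays Send when T is Send, but it is never Sync, so wrapping the cache in a RefCell removes Sync (and only Sync) from the containing struct. A toy sketch with a stand-in type:

use std::cell::RefCell;

// Stand-in for a trie view that carries a RefCell-wrapped cache.
struct WithCache {
    cache: RefCell<Vec<u8>>,
}

fn assert_send<T: Send>() {}

fn main() {
    // RefCell<Vec<u8>> is Send, so WithCache is still Send.
    assert_send::<WithCache>();
    // RefCell is never Sync, so a corresponding assert_sync::<WithCache>()
    // with `fn assert_sync<T: Sync>() {}` would fail to compile.
}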
@sgkim126 @majecty I implemented the cache in our Merkle trie. As I mentioned, I used the lru_cache crate. In short, when reading values from the trie, I found that performance improved by up to 30 percent. However, when writing values to the trie, there was no change in performance.
I tested the performance using the benchmark that we were already using; it looks like this:
const DB_SIZE: usize = 10000;
const BATCH: usize = 10000;

#[bench]
fn bench_read_multiple(b: &mut Bencher) {
    let db = TestDB::new(DB_SIZE);
    b.iter(|| {
        let trie = db.trie();
        for _ in 0..BATCH {
            let item = random::<usize>() % DB_SIZE;
            let _ = trie.get(&item.to_be_bytes()).unwrap().unwrap();
        }
    });
}
The results are as follows:
Result of benchmark tests for bench_read_multiple:
Control           : 174,522,399 ns/iter
Cache Size = 500  : 149,356,897 ns/iter => down 14.419 %
Cache Size = 1000 : 135,082,059 ns/iter => down 22.599 %
Cache Size = 2000 : 132,718,845 ns/iter => down 23.953 %
Cache Size = 3000 : 123,239,539 ns/iter => down 29.384 %
Cache Size = 4000 : 120,926,292 ns/iter => down 30.710 %
Cache Size = 5000 : 117,794,169 ns/iter => down 32.504 %
Cache Size = 6000 : 116,277,524 ns/iter => down 33.373 %
Cache Size = 7000 : 112,805,251 ns/iter => down 35.363 %
In my opinion, we need new benchmarks that measure memory usage, as well as tests that reflect actual usage patterns. As the cache size increases, we need more memory, which in turn can cause performance degradation. Could you tell me your opinion on the current situation?
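For a rough sense of scale, a back-of-envelope estimate (the ~100-byte average RLP node size below is an assumption for illustration, not a measured value):

// Back-of-envelope estimate of cache memory: entries * (key + value),
// ignoring LruCache bookkeeping overhead.
fn main() {
    let key_bytes = 32;        // H256 key
    let avg_node_bytes = 100;  // assumed average RLP-encoded node size
    for &cache_size in [500usize, 1000, 4000, 7000].iter() {
        let approx = cache_size * (key_bytes + avg_node_bytes);
        println!("cache_size = {:5} -> ~{} KiB", cache_size, approx / 1024);
    }
}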
It looks good to me.
It is interesting that the performance improved even though the test uses random keys. It seems that there are some common branch nodes that get cached.
IMO, it is good to implement it. @sgkim126 What is your opinion?
@somniumism Could I see the source? Did you cache the key-value pair or the address-node pair?
@sgkim126 I pushed the commit, but I think it still has something to fix: (1) There is no change in performance when writing values to the trie, so I have to exclude the cache from triedbmut.rs. (2) I wanted to see the result quickly, so there may be unnecessary code; of course, that is just my guess, so I need your advice. Additionally, I cached the address-node pair. :)
This is the pull request: https://github.com/CodeChain-io/rust-merkle-trie/pull/7 Please take a look and advise me.
It is possible to insert a cache between the DB operations and the node-handling operations. Multiple DB operations are needed while the insert, remove, and lookup operations are executing, so the overhead from DB operations will be reduced by adding a cache.