Closed jxieeducation closed 10 years ago
Hey @jxieeducation ,
first of all thanks for your contribution, this is really great! Here are my thoughts:
1) I had a look at the BTrees documentation and it seems to rely heavily on ZODB and its persistence
module, so I'm not entirely sure if it's the right thing to use here. Basically, I think we want a BTree that can be incrementally written and read from disk (to avoid loading it into memory when the database starts up)
2) BTrees allow lookup (as well as insertion and deletion) operations in O(log n) time (http://en.wikipedia.org/wiki/B-tree), as opposed to O(1) for lookup in hash tables. The main advantage of a BTree is that it can be written and read incrementally, thus without loading its entire content into memory first, and that range queries can be performed efficiently.
3) Like with the normal file-based index, we need to insert key/value pairs into the BTree to be able to do rich comparisons between values, search operations and range queries. I think the update
function of OOBTree
allows for storing key/value pairs in the tree.
Summarizing, I think it would be interesting to see if there's a more lightweight BTree library available for Python or if we can even implement a simple BTree structure ourselves. Feel free however to further investigate the OOBTrees
package, if you do I would be interested to see how exactly we would insert key/value pairs and persist the tree on disk. So it would be great if you could write a simple demo that defines a fully functioning BTree-based index class and contains some unit tests that we can run to check its correctness.
Good work!
Thanks, I totally agree that this is not the library that we are looking for. Will be coming up with a simple BTree then.
This is an extremely simple proof of concept for https://github.com/adewes/blitzdb/issues/9
This allows the backend to adopt a BTree model using the BTrees pip library (sudo pip install BTrees, did not add to the requirements.txt yet). While attempting to add the BTree, I found a few problems with using a BTree.
While writing the simple add / remove functions, I assumed that caching is not needed for BTrees, because it offers little adv. as the tree goes down. I also believe a BTree will also be clunky when elements get large, because it runs in O(n), while the hashmap runs in O(1).
Please let me know what you think?