DanielStutzbach / blist

A list-like type with better asymptotic performance and similar performance on small lists
Other
310 stars 36 forks source link

Limited iteration #39

Closed lericson closed 12 years ago

lericson commented 12 years ago

Something B+ trees are used for a lot is to answer the question "which items satisfy p >= start and p < stop" - this I reckon should be possible to answer with the blist implementation but is not possible via the APIs exposed it seems?

The usecase is using blist to make a database indexing engine.

lericson commented 12 years ago

The obvious API would be o[start:stop]

DanielStutzbach commented 12 years ago

blist supports all of the operations a regular list supports, including o[start:stop].

lericson commented 12 years ago

start and stop in that case aren't keys, they're indexes. You can't go L['foo':'foobar'] for example.

DanielStutzbach commented 12 years ago

blist is heavily inspired by the B+ tree, but it's not a B+ tree. There are no keys, and there's no guarantee that the items in the blist are sorted.

You could use the sortedlist class that comes with blist to accomplish what you want, though, by making use of the .bisect_left and/or .bisect_right methods. Based on your use case, you might want to consider using a proper B+ tree implementation; it might be a better fit.

lericson commented 12 years ago

Thank you for your help! blist.sortedlist does indeed seem do the trick.

It may be a good idea to find a B+ tree implementation first, either in C or Python. Would you happen to know one?

lericson commented 12 years ago

One more question, how is performance of the following:

def query(L, v, limit=None):
  i = L.bisect_right(v)
  i_u = None if limit is None else i + limit
  return L[i:i_u]

ISTM it will be O(log n)?

DanielStutzbach commented 12 years ago

Glad I could help. The performance would be O(log**2 n) in the worst case: bisect_right will do O(log n) lookups and each lookup can take O(log n) in the worst case. If you don't insert or delete things from the list (or do so extremely rarely), then the lookups will take only O(1) and the overall performance will be O(log n).

I don't know of a good B+ tree implementation offhand.