attic-labs / noms

The versioned, forkable, syncable database
Apache License 2.0

Question: What is a Prolly Tree? #3849

Closed. ccorcos closed this issue 4 years ago

ccorcos commented 4 years ago

Based on the language in the documentation, it sounds like a Prolly Tree is just a prefix search tree (aka trie). Is that correct or is there some other nuance to it?

Thanks,

Chet

aboodman commented 4 years ago

Updating for Google, since this is currently the top search result for "prolly tree". Documentation here:

https://github.com/attic-labs/noms/blob/master/doc/intro.md#prolly-trees-probabilistic-b-trees

See also the very nice overview from the Dolt people:

https://www.dolthub.com/blog/2020-04-01-how-dolt-stores-table-data/

=== original response follows ===

Prolly trees are very different from tries. Here is some documentation:

https://docs.google.com/presentation/d/18zRxxI7plB0mJkhLPfo8KJ_f-tyIIPj9L4dZroyqEF0/edit#slide=id.g3615e39343_0_0

While tries have a history-independent structure, they do not have the performance characteristics of b-trees. Consider a trie that stores all English words starting with "bo". This data structure is not going to have ideal search and mutation characteristics because the fan-out is so low. Ideal data structures for use in databases have fan-outs on the order of thousands, while tries will typically have far lower fan-out in practice, and it is highly dependent on the data you put in them.

b-trees guarantee a min/max fanout, independent of the data you put in. Prolly trees guarantee an average fanout, independent of the data you put in.
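
To make the "average fanout, independent of the data" idea concrete, here is a minimal sketch in Go (the language Noms is written in) of content-defined chunking, the mechanism prolly trees use to pick node boundaries. The names (`targetFanout`, `isBoundary`, `chunk`) and the per-entry hash test are illustrative assumptions, not the actual Noms chunker, which rolls a hash over a window of serialized bytes; the point is only that the boundary decision depends on the data itself, so the same entries always chunk the same way and chunks end, on average, every `targetFanout` entries.

```go
// A minimal, hypothetical sketch of how a prolly tree decides where leaf
// chunks end. Not the Noms implementation; it only illustrates that
// boundaries come from a hash of the content, so the *average* fanout
// (targetFanout) is fixed no matter what keys you insert.
package main

import (
	"fmt"
	"hash/fnv"
)

const targetFanout = 4 // tiny for demonstration; real trees use hundreds+

// isBoundary hashes an entry and ends the current chunk with
// probability roughly 1/targetFanout.
func isBoundary(entry string) bool {
	h := fnv.New32a()
	h.Write([]byte(entry))
	return h.Sum32()%targetFanout == 0
}

// chunk splits sorted entries into leaf nodes. Because the split decision
// depends only on the entries themselves, the same data always produces
// the same chunks (history independence), regardless of insertion order.
func chunk(keys []string) [][]string {
	var chunks [][]string
	var current []string
	for _, k := range keys {
		current = append(current, k)
		if isBoundary(k) {
			chunks = append(chunks, current)
			current = nil
		}
	}
	if len(current) > 0 {
		chunks = append(chunks, current)
	}
	return chunks
}

func main() {
	keys := []string{"boat", "body", "bog", "boil", "bold", "bone", "book", "boot"}
	for i, c := range chunk(keys) {
		fmt.Println("chunk", i, c)
	}
}
```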

ccorcos commented 4 years ago

Thanks for the link!

I was thinking I would hash the keys before using the trie to balance it out -- apparently that's called a Patricia Tree. It wouldn't be useful as an index, just for syncing...

I'm not sure why you would want to sync an index though. I figured you would just sync transactions and build indexes in either place.

aboodman commented 4 years ago

(editing to flesh this out a bit):

If you do that, you can’t support a sorted scan over the key space without a separate index, which is a pretty big bummer (indexes are not cheap to maintain).

For example: imagine you're modeling a key/value store. Good kv stores support not only get/set operations but also sorted scans. To do that with the hashed-trie approach, you'll have to maintain a separate index, which costs space, but more importantly time, because every read and write turns into double the number of disk operations. So this is a case where at least one index is really required, not optional. An appropriate data structure, like a b-tree, lsm tree, or prolly tree, gives you that index for free.
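
To make the trade-off concrete, here is a hypothetical sketch (not Noms code) of the kv-store example above, in Go. `KV`, `hashedTrieKV`, `sortedIndex`, and `hashKey` are all names invented for illustration; the point is that with hashed keys, ordering lives only in a second structure, so every `Set` is two writes and every `Scan` has to bounce through the index.

```go
// Package kvsketch is a hypothetical illustration: with hashed keys,
// Get/Set still work, but a sorted Scan needs a second index, so every
// write becomes two writes and every scan two layers of lookups.
package kvsketch

// KV is the kind of interface a good kv store offers: point operations
// plus ordered range scans.
type KV interface {
	Get(key string) (value string, ok bool)
	Set(key, value string)
	// Scan visits entries with start <= key < end, in key order.
	Scan(start, end string, visit func(key, value string))
}

// sortedIndex and hashKey are assumed helpers for the sketch.
type sortedIndex interface {
	Insert(key, hashedKey string)
	Range(start, end string, visit func(key, hashedKey string))
}

func hashKey(key string) string { return key } // placeholder hash

// hashedTrieKV models the "hash the keys, then use a trie" design.
type hashedTrieKV struct {
	primary map[string]string // hashedKey -> value; iteration order is useless
	index   sortedIndex       // separate sorted structure: key -> hashedKey
}

func newHashedTrieKV(idx sortedIndex) *hashedTrieKV {
	return &hashedTrieKV{primary: map[string]string{}, index: idx}
}

func (s *hashedTrieKV) Get(key string) (string, bool) {
	v, ok := s.primary[hashKey(key)]
	return v, ok
}

func (s *hashedTrieKV) Set(key, value string) {
	s.primary[hashKey(key)] = value
	s.index.Insert(key, hashKey(key)) // the extra write described above
}

func (s *hashedTrieKV) Scan(start, end string, visit func(key, value string)) {
	// Ordering lives only in the index, so every scan goes through it,
	// then back to primary for each value.
	s.index.Range(start, end, func(key, hashedKey string) {
		visit(key, s.primary[hashedKey])
	})
}
```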

The same issue exists if you're implementing a relational model (over the pk space), or in lots of other applications.

This isn't to say the trie approach won't work - it obviously does and is used by successful software. It's just that prolly trees offer a better set of tradeoffs in our view for what Noms is trying to do.

aboodman commented 4 years ago

Also check out CHAMP, which is considered the state of the art in the direction you were asking about.

ccorcos commented 4 years ago

I see, that is a nice behavior. It's basically an O(2n) -> O(n) optimization.

Now I'm curious: if you have 5 indexes and one of them is the primary index, are you going to sync just the primary index and rebuild the other 4, or are you going to sync all 5 indexes?

aboodman commented 4 years ago

It depends. It's 2n -> n for a single read or write, but it can be much more dramatic when the operations happen in sorted order. If you're scanning an index in sorted order in a prolly tree (or b-tree, or lsm-tree), the number of IO operations is generally going to be n/k, where k is the branching factor. But for a hashed trie plus a supplementary index, it will be n + n/k (assuming the index has n/k behavior).
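
A rough worked example, with numbers assumed purely for illustration: with n = 1,000,000 entries and a branching factor of k = 1,000, a sorted scan over a prolly tree (or b-tree) touches on the order of n/k = 1,000 chunks, while the hashed-trie-plus-index design touches roughly n/k = 1,000 index chunks plus up to n = 1,000,000 point lookups into the primary structure, because adjacent keys hash to unrelated locations.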

The same applies if you're inserting in sorted order. Which is pretty common. Imagine cases like time-series data, appending to queues, etc.

Some of these perf characteristics are compared on the last slide of the presentation I linked.

And unlike with in-memory algorithms, for databases the constant factors really matter because IO operations are so expensive. An IO operation can take dozens of ms depending on the details, so databases go to great lengths to minimize them. Typically a write to even a petabyte-scale table will involve just a handful of IO operations. If you're talking about going from a couple of IO operations to one per item you're seeking, it's basically a non-starter for many applications.

To answer your question: typically you'd probably not sync secondary indices, but it depends on the application. In any case, the fact that secondary indices are built out of the same data structure as primary data turns out to be quite useful.

ccorcos commented 4 years ago

I see. Thanks for the explanation!