Provide detailed benchmarks

cristiano-belloni commented 4 months ago

As a developer choosing which storage layer to use, the immediate question is: what's the performance? How does it scale in space and time? It would be good to see a benchmark of the various adapters, covering speed, RAM used, and disk space used with a lot of data, to understand what kinds of projects Triplit is a good fit for.

MentalGear commented 4 months ago

I like benchmarks, though the implementation should be the same except for the underlying storage engines, so I would imagine (naively) that just looking at general storage engine / browser benchmarks would give you the expected answer.

cristiano-belloni commented 4 months ago

Not necessarily; depending on the underlying data structures / indices used, you can scale differently. How do disk usage, memory usage, and sync speed grow as the number of operations / records increases?

matlin commented 3 months ago

Under the hood, we use a fork of tuple-database that provides the ability to use different storage layers (like IndexedDB, LMDB, SQLite, etc). There are some basic storage benchmarks in that repo that provide some speed comparisons.

@cristiano-belloni do you have a specific workload in mind?

cristiano-belloni commented 3 months ago

@matlin I'm trying to understand:

1. How the underlying db scales when you have a small number of keys, but a very large number of operations. If you have n keys and m operations that were played in time to get to the current values of the keys, is your space complexity a function of n or m (or both)? What about your time complexity on queries?
2. If you have a persistent sync server in the middle synchronising many different tables from many users (a must if, for example, users have different devices that are not necessarily online at the same time), does the server need to hold all the "open" tables in RAM, or can it just access the db on the fly with a negligible memory footprint? (As it works with prolly trees, I think the answer is "the data structures are held in memory", but I wanted to check.)

matlin commented 3 months ago

If you're speaking about bounded complexity: Triplit stores a "triple" for every attribute/key so space complexity is O(n * k) where n is number of entities and k is number of attributes per entity. However, Triplit also retains the history of edits per attribute so more accurately it would be O(n * k + e) where e is the number of individual edits (again each represented by a triple).
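To make the O(n * k + e) count concrete, here is a minimal sketch of an append-only triple log. It is a toy model for illustration only: `TripleLog`, its fields, and its methods are hypothetical names and not Triplit's actual data structures or API.

```typescript
// Hypothetical toy model of a triple store that keeps edit history.
// Not Triplit's real implementation.
type Triple = {
  entityId: string;
  attribute: string;
  value: unknown;
  timestamp: number; // logical clock value of the write
};

class TripleLog {
  private triples: Triple[] = [];
  private clock = 0;

  // Every write appends a new triple; earlier triples for the same
  // (entity, attribute) pair are kept as history.
  set(entityId: string, attribute: string, value: unknown): void {
    this.clock += 1;
    this.triples.push({ entityId, attribute, value, timestamp: this.clock });
  }

  // The current value is the most recently appended matching triple.
  get(entityId: string, attribute: string): unknown {
    let latest: Triple | undefined;
    for (const t of this.triples) {
      if (t.entityId === entityId && t.attribute === attribute) latest = t;
    }
    return latest?.value;
  }

  // Number of stored triples: initial writes plus later edits.
  size(): number {
    return this.triples.length;
  }
}

// n = 2 entities, k = 3 attributes each: n * k = 6 triples.
const demo = new TripleLog();
for (const id of ["e1", "e2"]) {
  for (const attr of ["title", "done", "owner"]) demo.set(id, attr, null);
}
console.log(demo.size()); // 6

// e = 4 later edits to one attribute: n * k + e = 10 triples.
for (let i = 1; i <= 4; i++) demo.set("e1", "title", `v${i}`);
console.log(demo.size()); // 10
console.log(demo.get("e1", "title")); // "v4"
```

In this model, storage grows with the number of edits even when the number of keys stays fixed, which is the "small n, large m" case asked about above.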
As far as how this affects the sync server, the sync server and database server are one and the same in Triplit. So to facilitate sync, it is not necessary to hold the entire dataset in memory, because it is instead stored in a persistent storage provider like SQLite, LevelDB, LMDB, etc. When a client wants to subscribe to a query, the database sends the triples required to represent the current state of the query (all buffered into memory currently) but then doesn't need to hold on to the query result. Instead, the server is able to compute which triples to send to the client to update its view of the query by looking at the state vector, which triples have been added since that sync state, and which of those triples will affect the query the client is currently subscribed to.
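The delta computation described here could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `SyncTriple` shape, the `StateVector` map from client id to the highest tick seen, and the prefix-based `matchesCollection` check are hypothetical, and real query matching is more involved (for example, an edit can move an entity into or out of a result set).

```typescript
// Hypothetical shapes; not Triplit's real types.
type SyncTriple = {
  entityId: string; // assumed format: "<collection>#<id>"
  attribute: string;
  value: unknown;
  clientId: string; // which client authored the write
  tick: number;     // per-client logical clock
};

// For each authoring client, the highest tick the subscriber has already seen.
type StateVector = Map<string, number>;

// Simplified stand-in for "does this triple affect the subscribed query".
function matchesCollection(collection: string): (t: SyncTriple) => boolean {
  return (t) => t.entityId.startsWith(collection + "#");
}

// Select triples that are newer than the subscriber's sync state and
// relevant to its query. The server only scans stored triples here;
// it does not keep the previous query result around.
function triplesToSend(
  stored: SyncTriple[],
  seen: StateVector,
  affectsQuery: (t: SyncTriple) => boolean
): SyncTriple[] {
  return stored.filter(
    (t) => t.tick > (seen.get(t.clientId) ?? 0) && affectsQuery(t)
  );
}

const stored: SyncTriple[] = [
  { entityId: "todos#1", attribute: "title", value: "a", clientId: "A", tick: 1 },
  { entityId: "todos#1", attribute: "done", value: false, clientId: "A", tick: 2 },
  { entityId: "todos#1", attribute: "done", value: true, clientId: "A", tick: 3 },
  { entityId: "users#1", attribute: "name", value: "x", clientId: "B", tick: 1 },
  { entityId: "todos#2", attribute: "title", value: "b", clientId: "B", tick: 2 },
];

// The subscriber has seen client A's writes up to tick 2 and nothing from B.
const seen: StateVector = new Map([["A", 2]]);
const delta = triplesToSend(stored, seen, matchesCollection("todos"));
console.log(delta.map((t) => `${t.clientId}:${t.tick}`)); // ["A:3", "B:2"]
```

In a real server the scan would run against the persistent store (ideally through an index on the clock) rather than an in-memory array; the array only keeps the sketch self-contained.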

cristiano-belloni commented 3 months ago

Thanks, that's what I wanted to know!

aspen-cloud / triplit

Provide detailed benchmarks #162