brimdata / super

A novel data lake based on super-structured data
https://zed.brimdata.io/
BSD 3-Clause "New" or "Revised" License

commit-order scan guarantees #4351

Open mccanne opened 1 year ago

mccanne commented 1 year ago

It might be desirable to guarantee that scans return results in commit order when the pool key value is uniform across a large amount of data. We previously supported this at small scale, but our design breaks at large scale, so we should contemplate how we might implement this feature.

philrz commented 1 year ago

Here are some notes-to-self on this topic, as @mccanne and @nwt explained it to me in more detail yesterday. A specific example would be log records in a pool with a timestamp as the pool key, as is typical for logs. If a bunch of those records shared the same timestamp and were spread across multiple data objects in the pool, a query response might return them in a different order than the one in which they were ingested. For a user studying a sequence of events based on these log messages, that reordering could be quite misleading and undesirable. Since the default object size is 500 MB and such duplicate keys should be rare, this exposure is thankfully minimal, but non-zero.
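To make the scenario concrete, here is a minimal toy sketch in Python. It is not how the lake is implemented: `Record`, `ingest_seq`, `scan_by_key`, and `scan_commit_order` are hypothetical names invented for illustration, and each "data object" is modeled as a plain list. It only shows that sorting on the pool key alone leaves ties in whatever order the objects happened to be visited, while an extra tie-breaker recovers ingest order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    ts: int          # pool key value (a timestamp here)
    ingest_seq: int  # hypothetical: order in which the record was committed
    msg: str


def scan_by_key(objects):
    """Order by pool key only. Python's sort is stable, so records with
    equal keys stay in the order the objects were visited."""
    flat = [rec for obj in objects for rec in obj]
    return sorted(flat, key=lambda r: r.ts)


def scan_commit_order(objects):
    """Order by pool key, breaking ties with the hypothetical ingest sequence."""
    flat = [rec for obj in objects for rec in obj]
    return sorted(flat, key=lambda r: (r.ts, r.ingest_seq))


# Three events ingested in this order, all sharing ts=100,
# each landing in a different data object.
obj_a = [Record(100, 1, "login")]
obj_b = [Record(100, 2, "read file")]
obj_c = [Record(100, 3, "logout")]

# Assumption: the scan may visit objects in an order unrelated to commit order.
visited = [obj_c, obj_a, obj_b]

key_only = [r.msg for r in scan_by_key(visited)]
commit_order = [r.msg for r in scan_commit_order(visited)]

print(key_only)      # ['logout', 'login', 'read file']
print(commit_order)  # ['login', 'read file', 'logout']
```

In the key-only result, "logout" appears before "login", which is exactly the kind of misleading sequence described above. The sketch says nothing about how to carry such a tie-breaker cheaply at large scale, which is the open question in this issue.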