cockroachdb / pebble

RocksDB/LevelDB inspired key-value database in Go
BSD 3-Clause "New" or "Revised" License
4.77k stars 443 forks source link

snapshot-backed batches #1416

Open boz opened 2 years ago

boz commented 2 years ago

Is there a reason why the current API doesn't allow for creating a batch that uses a snapshot for a point-in-time view of the database?

It looks simple enough to add - too simple actually, so I figure'd it'd be good to ask before making a PR. https://github.com/cockroachdb/pebble/blob/e2a8aef27e1b170d78f466fb97fda036a693abab/batch.go#L418

The use-case is for supporting a number of concurrent processing engines in a blockchain system. For instance, there is a "mempool" of uncommitted messages that should be processed using the most recent committed state, but does not commit changes itself; and there is another process which executes messages and commits state according to a consensus protocol.

The mempool processing could (very vaguely) look something like this:

  1. create snapshot
  2. create snapshot-backed batch
  3. for each new message: 3.1. push savepoint 3.2. process message 3.3. pop savepoint if transaction succeeds, rollback if transaction fails 3.4. if new state has been committed via consensus, release snapshot and start at 1.

Jira issue: PEBBLE-115

nicktrav commented 2 years ago

@boz - apologies for the delay here. This one slipped through.

You might be in luck here. We recently did some work to firm up the semantics of mutations w.r.t. batch iteration (see #1640). Now, if a mutation is made to a batch after the iterator on the batch was created, that mutation should not be visible to the iterator. One can "refresh" the state of the iterator by calling (*Batch).SetOptions.

I believe the following test demonstrates what you're asking for:

package pebble

import (
    "testing"

    "github.com/cockroachdb/pebble/vfs"
    "github.com/stretchr/testify/require"
)

func Test(t *testing.T) {
    opts := &Options{FS: vfs.NewMem()}
    db, err := Open("", opts)
    require.NoError(t, err)
    defer db.Close()

    // Insert into the DB.
    err = db.Set([]byte("a"), nil, nil)
    require.NoError(t, err)
    err = db.Flush()
    require.NoError(t, err)

    // Create a new batch on top of the DB.
    b := db.NewIndexedBatch()

    // Take out an iterator on the batch.
    iter := b.NewIter(nil)
    defer iter.Close()

    // The iterator should only see 'a'.
    assertKeys := func(wantKeys ...string) {
        var gotKeys []string
        for iter.First(); iter.Valid(); iter.Next() {
            gotKeys = append(gotKeys, string(iter.Key()))
        }
        require.Equal(t, wantKeys, gotKeys)
    }
    assertKeys("a")

    // Add a new key to the DB. This key is not visible, as it was added after the
    // iterator was opened.
    err = db.Set([]byte("b"), nil, nil)
    require.NoError(t, err)
    err = db.Flush()
    require.NoError(t, err)
    assertKeys("a")

    // Add a new key to the batch. This key is also not visible to the iterator
    // ... yet.
    err = b.Set([]byte("c"), nil, nil)
    require.NoError(t, err)
    assertKeys("a")

    // Refreshing the iterator via SetOptions makes 'c' visible to the iterator.
    // The key 'b' is not visible, as the batch has not yet been committed to the
    // DB.
    iter.SetOptions(&IterOptions{})
    assertKeys("a", "c")
    require.NoError(t, iter.Close())

    // Committing the batch makes 'b' visible.
    err = b.Commit(nil)
    require.NoError(t, err)
    iter = db.NewIter(nil)
    assertKeys("a", "b", "c")
}

Note that a snapshot isn't required. Taking out the iterator is enough to "pin" the state of the DB while the iterator is open.

Hope this helps!

nicktrav commented 2 years ago

cc: @jbowens - let me know if I got any part of that wrong.

I had to double check my understanding of the "committed writes are still invisible to the iterator, despite SetOptions making new batch mutations visible". Looks like we have an explicit test case here, so I believe that's the desired behavior.

jbowens commented 2 years ago

That looks good, but considering the blockchain mempool use case, a snapshot might be required. @boz are you looking to use a snapshot because the mempool is potentially long-lived? I could imagine holding open an iterator between blocks is impossible, since it pins memtables and sstables the entire lifetime of the iterator.

I can't think of any obstacles to adding a (*pebble.Snapshot).NewIndexedBatch() method, and I think it would be relatively straightforward.

jsimnz commented 1 year ago

Wanted to ping this thread as I also was exploring snapshot backed batches, so put together a quick implementation. As @jbowens noted I haven't noticed/found any obstacles.

Added some basic tests to ensure my understanding of the semantics, but not sure if there is some internal invariants of edge cases Im not aware of.

Glad to see this issue was updated recently on the "Community" board, happy to open a PR to actually solicit review/feedback. Posting the branch here for anyone that comes across it.

Branch: https://github.com/jsimnz/pebble/tree/jsimnz-snapshot-batch Diff: https://github.com/cockroachdb/pebble/compare/master...jsimnz:pebble:jsimnz-snapshot-batch

mitar commented 8 months ago

I was also surprised by both the lack of this feature and the fact that it is so easy to be added, as demonstrated by @jsimnz. What is needed for this to be moved further?

jbowens commented 8 months ago

CockroachDB has no need for snapshot-backed indexed batches, so this is not a feature we will implement ourselves. Pull requests are welcome.

mitar commented 8 months ago

@jbowens: Does the work made by @jsimnz look OK? Should they made a PR with that work then?

jsimnz commented 8 months ago

Happy to convert that work into an official PR if its something the team would be willing the merge, assuming a review looks good.