Open boz opened 2 years ago
@boz - apologies for the delay here. This one slipped through.
You might be in luck here. We recently did some work to firm up the semantics of mutations w.r.t. batch iteration (see #1640). Now, if a mutation is made to a batch after the iterator on the batch was created, that mutation should not be visible to the iterator. One can "refresh" the state of the iterator by calling (*Batch).SetOptions
.
I believe the following test demonstrates what you're asking for:
package pebble
import (
"testing"
"github.com/cockroachdb/pebble/vfs"
"github.com/stretchr/testify/require"
)
func Test(t *testing.T) {
opts := &Options{FS: vfs.NewMem()}
db, err := Open("", opts)
require.NoError(t, err)
defer db.Close()
// Insert into the DB.
err = db.Set([]byte("a"), nil, nil)
require.NoError(t, err)
err = db.Flush()
require.NoError(t, err)
// Create a new batch on top of the DB.
b := db.NewIndexedBatch()
// Take out an iterator on the batch.
iter := b.NewIter(nil)
defer iter.Close()
// The iterator should only see 'a'.
assertKeys := func(wantKeys ...string) {
var gotKeys []string
for iter.First(); iter.Valid(); iter.Next() {
gotKeys = append(gotKeys, string(iter.Key()))
}
require.Equal(t, wantKeys, gotKeys)
}
assertKeys("a")
// Add a new key to the DB. This key is not visible, as it was added after the
// iterator was opened.
err = db.Set([]byte("b"), nil, nil)
require.NoError(t, err)
err = db.Flush()
require.NoError(t, err)
assertKeys("a")
// Add a new key to the batch. This key is also not visible to the iterator
// ... yet.
err = b.Set([]byte("c"), nil, nil)
require.NoError(t, err)
assertKeys("a")
// Refreshing the iterator via SetOptions makes 'c' visible to the iterator.
// The key 'b' is not visible, as the batch has not yet been committed to the
// DB.
iter.SetOptions(&IterOptions{})
assertKeys("a", "c")
require.NoError(t, iter.Close())
// Committing the batch makes 'b' visible.
err = b.Commit(nil)
require.NoError(t, err)
iter = db.NewIter(nil)
assertKeys("a", "b", "c")
}
Note that a snapshot isn't required. Taking out the iterator is enough to "pin" the state of the DB while the iterator is open.
Hope this helps!
cc: @jbowens - let me know if I got any part of that wrong.
I had to double check my understanding of the "committed writes are still invisible to the iterator, despite SetOptions
making new batch mutations visible". Looks like we have an explicit test case here, so I believe that's the desired behavior.
That looks good, but considering the blockchain mempool use case, a snapshot might be required. @boz are you looking to use a snapshot because the mempool is potentially long-lived? I could imagine holding open an iterator between blocks is impossible, since it pins memtables and sstables the entire lifetime of the iterator.
I can't think of any obstacles to adding a (*pebble.Snapshot).NewIndexedBatch()
method, and I think it would be relatively straightforward.
Wanted to ping this thread as I also was exploring snapshot backed batches, so put together a quick implementation. As @jbowens noted I haven't noticed/found any obstacles.
Added some basic tests to ensure my understanding of the semantics, but not sure if there is some internal invariants of edge cases Im not aware of.
Glad to see this issue was updated recently on the "Community" board, happy to open a PR to actually solicit review/feedback. Posting the branch here for anyone that comes across it.
Branch: https://github.com/jsimnz/pebble/tree/jsimnz-snapshot-batch Diff: https://github.com/cockroachdb/pebble/compare/master...jsimnz:pebble:jsimnz-snapshot-batch
I was also surprised by both the lack of this feature and the fact that it is so easy to be added, as demonstrated by @jsimnz. What is needed for this to be moved further?
CockroachDB has no need for snapshot-backed indexed batches, so this is not a feature we will implement ourselves. Pull requests are welcome.
@jbowens: Does the work made by @jsimnz look OK? Should they made a PR with that work then?
Happy to convert that work into an official PR if its something the team would be willing the merge, assuming a review looks good.
Is there a reason why the current API doesn't allow for creating a batch that uses a snapshot for a point-in-time view of the database?
It looks simple enough to add - too simple actually, so I figure'd it'd be good to ask before making a PR. https://github.com/cockroachdb/pebble/blob/e2a8aef27e1b170d78f466fb97fda036a693abab/batch.go#L418
The use-case is for supporting a number of concurrent processing engines in a blockchain system. For instance, there is a "mempool" of uncommitted messages that should be processed using the most recent committed state, but does not commit changes itself; and there is another process which executes messages and commits state according to a consensus protocol.
The mempool processing could (very vaguely) look something like this:
Jira issue: PEBBLE-115