blevesearch / blevex

Bleve Extensions
47 stars 23 forks source link

Add Compact() to leveldb and rocksdb stores #30

Closed tmm1 closed 7 years ago

tmm1 commented 7 years ago

Similar to methods added for goleveldb and boltdb, but without the DictionaryTerm cleanup.

cc https://github.com/blevesearch/bleve/issues/374 https://github.com/blevesearch/bleve/issues/363

tmm1 commented 7 years ago

@mschoch Instead of looping over the entire database to find and delete 0-count dictionary terms, would something like this work (to delete entries when 0-count writes occur):

diff --git a/leveldb/batch.go b/leveldb/batch.go
index b9b2bd4..7a80fb1 100644
--- a/leveldb/batch.go
+++ b/leveldb/batch.go
@@ -10,6 +10,8 @@
 package leveldb

 import (
+   "bytes"
+
    "github.com/blevesearch/bleve/index/store"
    "github.com/jmhodges/levigo"
 )
@@ -21,7 +23,11 @@ type Batch struct {
 }

 func (b *Batch) Set(key, val []byte) {
-   b.batch.Put(key, val)
+   if bytes.HasPrefix(key, []byte("d")) && bytes.Equal(val, []byte{0}) {
+       b.batch.Delete(key)
+   } else {
+       b.batch.Put(key, val)
+   }
 }

 func (b *Batch) Delete(key []byte) {
tmm1 commented 7 years ago

Pretty gross hack, so probably makes sense a level-up in the writer:

diff --git a/leveldb/writer.go b/leveldb/writer.go
index 70467a5..c38f77d 100644
--- a/leveldb/writer.go
+++ b/leveldb/writer.go
@@ -10,6 +10,7 @@
 package leveldb

 import (
+   "bytes"
    "fmt"

    "github.com/blevesearch/bleve/index/store"
@@ -62,7 +63,12 @@ func (w *Writer) ExecuteBatch(b store.KVBatch) error {
        if !fullMergeOk {
            return fmt.Errorf("unable to merge")
        }
-       batch.Set(k, mergedVal)
+
+       if bytes.Equal(mergedVal, []byte{0}) {
+           batch.Delete(k)
+       } else {
+           batch.Set(k, mergedVal)
+       }
    }

    err := w.store.db.Write(w.options, batch.batch)

Obviously wouldn't work for rocksdb, but could be adapted to the goleveldb/boltdb stores to replace the current Compact() hacks. /cc @mmindenhall

mschoch commented 7 years ago

I'm fine with adding Compact(), but I don't think it's appropriate for the Store impls to have hacks to work around this. The proposed solution would only work because we "know" we don't wan to keep merge results with value "0". But, the truth is the store doesn't know that, it's assuming that based on the current usage pattern of the index tier. The only cleaner thing I can think of is have some sort of callback to the index tier which can correctly identify merge results that should be deleted.

tmm1 commented 7 years ago

The proposed solution would only work because we "know" we don't wan to keep merge results with value "0". But, the truth is the store doesn't know that, it's assuming that based on the current usage pattern of the index tier.

Agreed, it's definitely a huge hack.

The only cleaner thing I can think of is have some sort of callback to the index tier which can correctly identify merge results that should be deleted.

Seems like a good compromise.