Closed tmm1 closed 7 years ago
@mschoch Instead of looping over the entire database to find and delete 0-count dictionary terms, would something like this work (to delete entries when 0-count writes occur):
diff --git a/leveldb/batch.go b/leveldb/batch.go
index b9b2bd4..7a80fb1 100644
--- a/leveldb/batch.go
+++ b/leveldb/batch.go
@@ -10,6 +10,8 @@
package leveldb
import (
+ "bytes"
+
"github.com/blevesearch/bleve/index/store"
"github.com/jmhodges/levigo"
)
@@ -21,7 +23,11 @@ type Batch struct {
}
func (b *Batch) Set(key, val []byte) {
- b.batch.Put(key, val)
+ if bytes.HasPrefix(key, []byte("d")) && bytes.Equal(val, []byte{0}) {
+ b.batch.Delete(key)
+ } else {
+ b.batch.Put(key, val)
+ }
}
func (b *Batch) Delete(key []byte) {
Pretty gross hack, so probably makes sense a level-up in the writer:
diff --git a/leveldb/writer.go b/leveldb/writer.go
index 70467a5..c38f77d 100644
--- a/leveldb/writer.go
+++ b/leveldb/writer.go
@@ -10,6 +10,7 @@
package leveldb
import (
+ "bytes"
"fmt"
"github.com/blevesearch/bleve/index/store"
@@ -62,7 +63,12 @@ func (w *Writer) ExecuteBatch(b store.KVBatch) error {
if !fullMergeOk {
return fmt.Errorf("unable to merge")
}
- batch.Set(k, mergedVal)
+
+ if bytes.Equal(mergedVal, []byte{0}) {
+ batch.Delete(k)
+ } else {
+ batch.Set(k, mergedVal)
+ }
}
err := w.store.db.Write(w.options, batch.batch)
Obviously wouldn't work for rocksdb, but could be adapted to the goleveldb/boltdb stores to replace the current Compact() hacks. /cc @mmindenhall
I'm fine with adding Compact(), but I don't think it's appropriate for the Store impls to have hacks to work around this. The proposed solution would only work because we "know" we don't wan to keep merge results with value "0". But, the truth is the store doesn't know that, it's assuming that based on the current usage pattern of the index tier. The only cleaner thing I can think of is have some sort of callback to the index tier which can correctly identify merge results that should be deleted.
The proposed solution would only work because we "know" we don't wan to keep merge results with value "0". But, the truth is the store doesn't know that, it's assuming that based on the current usage pattern of the index tier.
Agreed, it's definitely a huge hack.
The only cleaner thing I can think of is have some sort of callback to the index tier which can correctly identify merge results that should be deleted.
Seems like a good compromise.
Similar to methods added for goleveldb and boltdb, but without the DictionaryTerm cleanup.
cc https://github.com/blevesearch/bleve/issues/374 https://github.com/blevesearch/bleve/issues/363