blevesearch / bleve

A modern text/numeric/geo-spatial/vector indexing library for Go
Apache License 2.0

add badger db support #591

Open joeblew99 opened 7 years ago

joeblew99 commented 7 years ago

https://github.com/dgraph-io/badger

I am proposing that this DB be supported by bleve. Badger outperforms RocksDB in their benchmarks, which is outstanding.

Here is a blog post with the benchmarks: https://open.dgraph.io/post/badger/

My understanding is that they will be adding snapshotting also.

mschoch commented 7 years ago

If someone from the community wants to add support for it, that would be great.

Internally we're focusing on support for moss in the bleve 1.x line. And in bleve 2.x we'll most likely be moving to a custom format and not using a generic k/v store in the default index implementation.

joeblew99 commented 7 years ago

OK, well, good to know the roadmap. I don't have time to do it. Shall we leave it open?

mschoch commented 7 years ago

Sure, let's leave it open.

akhenakh commented 7 years ago

I'm planning on giving it a shot.

Anyone else working on it?

bt commented 7 years ago

@akhenakh That would be awesome, please! :)

akhenakh commented 7 years ago

WIP https://github.com/akhenakh/bleve/tree/badger

bt commented 7 years ago

Hey @akhenakh, great work so far!

I'm having a look at it and it's looking great. Was hoping I could help out.

I'm just taking a look at Badger and given that they don't have any plans to support snapshots (dgraph-io/badger#39), what's the plan here?

I believe they talked a bit about dumping the database concurrently, but the database won't be frozen in time.

akhenakh commented 7 years ago

Yeah, snapshot support is not going to happen soon. I'm using the KV interface outside of Bleve, so it won't be an issue for me in the specific places where I use it.

But I'm not sure how it would impact Bleve; any clue, @bt?

mschoch commented 7 years ago

Pretty much all of the search side of Bleve requires a snapshot to work correctly. The reason is that multiple iterators are created, and they have to be guaranteed to see the same underlying data. If they don't, you'll get inconsistent search results.

A simple example would be a phrase search for "fat cat". If version 1 of a document has "fat man" and the new version of the document has "old cat", without snapshots it's possible that a search for the phrase "fat cat" is successful, even though neither the old nor the new version of the document ever contained that phrase.

You might think that atomic batches prevent this, but without snapshots it's possible that the first iterator (looking for "fat") sees the old version of the document, while the second iterator (looking for "cat") sees the new version of the document.

akhenakh commented 7 years ago

Thanks for the clarification @mschoch. Let's talk to the dgraph guys to see what they have in mind.

bt commented 7 years ago

Hey @mschoch, just a clarification to make sure I'm getting this right:

If I execute a search and, while the search is running, the store is updated with a new document AND that document matches the search phrase, the search shouldn't see the new document, because it would only be searching the snapshot made at the time the search was first executed. Am I correct?

In this case, even the workaround suggested in the linked issue above would not work: if the iterator moves as new data is indexed, it would pick up the new data while the search is running.

If that's true, then I guess the only way would be to implement snapshotting into Badger, if that's part of their plans.

mschoch commented 7 years ago
  1. I don't claim to fully understand their proposed workaround. Certainly, streaming out a separate copy of the entire index for every single search seems prohibitive. Could you get creative and do this on some interval, where that interval is a tolerable delay for changes to show up in search results? Maybe that's OK for some applications, but probably not all.

  2. The issue is worse than just a single iterator seeing some new records. The problem is that search in bleve uses multiple iterators, and without snapshots, even if we try to create them all at the same time, they're really starting at slightly different points in time. This means those iterators might see different key/value pairs associated with a single logical document. In the example I gave above, it can lead to scenarios where the search matches a phrase which didn't exist in either version of the document; it just appeared to match because two iterators were seeing different points in time. That's worse than just seeing the old or the new version.

manishrjain commented 7 years ago

(Author of Badger here)

While we don't have any plans to support snapshots natively in Badger, this can be implemented very simply on top of it.

You can create a simple layer above Badger which appends the current timestamp (or index) to keys as a suffix. Then, if you need a snapshot at a given time, you can append that timestamp to the search key and iterate from there. The first key you encounter would be the value at or just before the snapshot. This layer can have a very simple API, and it would also give others snapshot functionality above Badger, which would be nice.
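
A rough sketch of the shape such a layer could take (all names here are hypothetical -- this is not a Badger or Bleve API, just one way the bridge's surface might look):

package snapshotkv

// Store is a hypothetical versioning layer sitting above Badger.
type Store interface {
    // Set writes key=value, tagging the stored key with the current
    // timestamp (or a monotonic index) as a suffix.
    Set(key, value []byte) error

    // GetAt returns the value the key had at (or just before) ts, by
    // seeking from key+ts and taking the first entry at or below it.
    GetAt(key []byte, ts uint64) ([]byte, error)

    // SnapshotTS returns a timestamp that callers can reuse across
    // multiple GetAt calls to read a consistent point in time.
    SnapshotTS() uint64
}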

bt commented 7 years ago

Hey @manishrjain, this sounds like a great idea!

So this implementation should be outside of Badger and act as some sort of intermediary "bridge" to support snapshotting between Badger and Bleve?

manishrjain commented 7 years ago

Yeah, this should be outside of Badger. It can be a simple layer which just adds the timestamp or index to the keys being written, and then on a lookup it can do a single-key iteration. If you need some help, or some APIs in Badger, we'd be happy to provide them -- we just don't want this versioning logic to live within Badger, to keep things simple and performant.

For example, for iteration we can probably provide an API which just gives you the first key on or after the search key -- this would avoid you having to create an iterator (and pay its overhead).

bt commented 7 years ago

Hey @manishrjain, before I start hacking away at a solution for this, I just wanted to confirm my algorithm (sorry, I'm not too familiar with the underlying technology behind KV stores).

Your proposed solution, as I understand it: I could build a layer on top of Badger which would write both key=value and <time>key=value entries into the KV store. Then, at retrieval time, I can search for <search_time>key, which would return the value of the last write for that key.

So, given the following scenario (where t is a time, e.g. a Unix timestamp):

At t=1, key=value is set into the KV store. Therefore there will be two entries in the store:

key=value
1key=value

At t=5, key=value2 is set into the KV store. At this point, there will be three entries in the store:

key=value2
1key=value
5key=value2

If a snapshot is taken at t=9, we can then retrieve the value closest to, but not after, t=9 -- in this case t=5, with the value value2.

This is how I'm understanding the possible implementation of snapshotting into Badger. Please let me know if this is what you're thinking or if I'm just completely out of my mind :smile:

manishrjain commented 7 years ago

Close, but more complex than needed.

You just want to append a padded timestamp to the key as a suffix (not a prefix). So, for the scenario above, the store would contain:

key<t1> = value
key<t5> = value2

Then, when you need a snapshot at t=4, you can do a reverse iteration to get the first key which is lower than key<t4>. This should return key<t1>. If you want the latest key, then start the reverse iteration from key~, and you'll get key<t5>; and so on.

Note that binary.BigEndian gives you an encoding where the []byte representation sorts in the same order as the ints themselves. That's very useful here. So you can convert the timestamp to a positive integer and use big-endian encoding to create the suffix.
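
To illustrate that property, a small runnable sketch (an editor's illustration, not from the thread): big-endian encodings of uint64 values compare byte-wise in the same order as the integers, which is exactly what the suffix scheme relies on.

package main

import (
    "bytes"
    "encoding/binary"
    "fmt"
)

// enc returns the 8-byte big-endian encoding of ts.
func enc(ts uint64) []byte {
    b := make([]byte, 8)
    binary.BigEndian.PutUint64(b, ts)
    return b
}

func main() {
    // Byte-wise comparison agrees with integer order: t=5 sorts before t=9.
    fmt.Println(bytes.Compare(enc(5), enc(9))) // prints -1
    // So key+enc(t) keeps all versions of a key adjacent, oldest first.
    fmt.Printf("%x\n%x\n", append([]byte("key"), enc(5)...), append([]byte("key"), enc(9)...))
}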

In fact, we can add an API called GetBefore(key), which can retrieve the key-value just before the specified key.

This works pretty well. The only annoying part is the need to do reverse iteration. Forward iteration is nicer and easier to understand, but it would require the timestamp to be encoded in a way where the latest timestamp generates the smallest byte slice. I'm not sure if there's already a way to do that nicely. Irrespective, the above would give you snapshots.

bt commented 7 years ago

Good to see I'm on the right track!

So if I were to implement the above, wouldn't there be some performance degradation, given that we'll be storing the value multiple times? And how would you propose purging old snapshots?

manishrjain commented 7 years ago

Storing the value multiple times won't really degrade performance per se, but it will cause your value log and LSM tree to be bigger.

I think what you could do is iterate over the LSM tree periodically and delete the keys below a certain time threshold. Badger has a key-only iteration that you can use for this purpose. If the LSM tree is in memory, as we recommend, it is very cheap to iterate over (in the benchmarks, it's blazing fast!).
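
A hedged sketch of that purge pass, written against the later txn-era Badger API (exact signatures vary across Badger versions, so treat this as an outline rather than drop-in code). It assumes the 8-byte big-endian timestamp suffix from above; in practice you'd also keep the newest version of each logical key even if it falls below the cutoff.

package purge

import (
    "encoding/binary"

    badger "github.com/dgraph-io/badger"
)

// purgeBefore deletes versioned keys whose timestamp suffix is older
// than cutoff. PrefetchValues=false gives the key-only LSM iteration
// mentioned above.
func purgeBefore(db *badger.DB, cutoff uint64) error {
    return db.Update(func(txn *badger.Txn) error {
        opts := badger.DefaultIteratorOptions
        opts.PrefetchValues = false // keys only; values stay in the value log
        var stale [][]byte
        it := txn.NewIterator(opts)
        for it.Rewind(); it.Valid(); it.Next() {
            k := it.Item().KeyCopy(nil)
            if len(k) >= 8 && binary.BigEndian.Uint64(k[len(k)-8:]) < cutoff {
                stale = append(stale, k)
            }
        }
        it.Close()
        for _, k := range stale {
            // Large purges can hit ErrTxnTooBig; commit and retry in batches.
            if err := txn.Delete(k); err != nil {
                return err
            }
        }
        return nil
    })
}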

akhenakh commented 7 years ago

To get a timestamp that sorts newest-first under forward iteration, I'm using a reverse timestamp:

math.MaxInt64 - t.UnixNano()

Would this work in this case?

manishrjain commented 7 years ago

Yeah, that should work. Put the result in a uint64 and do big-endian encoding to generate the suffix.
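
A runnable sketch of that combination (again an illustration, not thread code): the newest write produces the smallest suffix, so forward iteration over key+suffix sees the latest version first.

package main

import (
    "encoding/binary"
    "fmt"
    "math"
    "time"
)

// reverseSuffix encodes math.MaxInt64 - UnixNano as big-endian, so
// later times produce byte slices that sort earlier.
func reverseSuffix(t time.Time) []byte {
    ts := uint64(math.MaxInt64 - t.UnixNano())
    b := make([]byte, 8)
    binary.BigEndian.PutUint64(b, ts)
    return b
}

func main() {
    earlier := reverseSuffix(time.Now())
    later := reverseSuffix(time.Now().Add(time.Second))
    // later sorts before earlier, so a forward seek finds the newest first.
    fmt.Printf("later:   %x\nearlier: %x\n", later, earlier)
}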

akhenakh commented 7 years ago

@bt are you working on this on top of my branch, or do you plan to do a different implementation?

I'm asking because I don't want us to do the work twice :)

mschoch commented 7 years ago

Bleve also makes heavy use of iterators, so you'll also need an iterator which only sees the correct values for the snapshot, silently skipping over values with too old or too new a timestamp. You wouldn't want to pull the values associated with those if you can avoid it, so you might want to use a key-only iterator again, and then do get operations for the keys you've determined are the ones you care about.
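
A sketch of the filtering such an iterator wrapper would do (the helper name is hypothetical, assuming the big-endian timestamp suffix discussed above):

package main

import (
    "encoding/binary"
    "fmt"
)

// visibleAt reports whether a stored key (logical key + 8-byte
// big-endian timestamp suffix) belongs to the snapshot, i.e. was
// written at or before snapTS. A wrapper iterator would skip entries
// where ok is false, and also skip older versions of a logical key
// once a newer in-snapshot version has been seen.
func visibleAt(stored []byte, snapTS uint64) (logical []byte, ok bool) {
    if len(stored) < 8 {
        return nil, false
    }
    n := len(stored) - 8
    if binary.BigEndian.Uint64(stored[n:]) > snapTS {
        return nil, false // written after the snapshot; invisible
    }
    return stored[:n], true
}

func main() {
    k := append([]byte("key"), 0, 0, 0, 0, 0, 0, 0, 12) // version at t=12
    _, ok := visibleAt(k, 9)
    fmt.Println(ok) // false: t=12 is newer than snapshot t=9
}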

bt commented 7 years ago

Hey @akhenakh, I haven't started coding this yet as I'm still trying to get my head around how it would all work together, so you're free to start us off and I'll make some PRs as I go through the code and understand how you've structured it :) Thanks!

bt commented 7 years ago

Hey @akhenakh, did you make much progress? I didn't manage to find your fork. Anything I can help with? 👍

akhenakh commented 7 years ago

Hey @bt, sorry I missed your message. Here is my WIP: https://github.com/akhenakh/bleve/commits/badger. I haven't worked on it since we talked.

I had a chat with @manishrjain on the Badger Slack; he provided some more details on a possible implementation.

joeblew99 commented 6 years ago

Looks like transaction support is now committed to the develop branch of Badger :)

manishrjain commented 6 years ago

Doing testing right now. We should be able to release it by next week.

joeblew99 commented 6 years ago

Wow, that was quick. It will be very interesting to benchmark it.

manishrjain commented 6 years ago

Badger txn support is out in master. We also did some benchmarks, which show Badger transactional writes are really fast. https://blog.dgraph.io/post/badger-txn/

joeblew99 commented 6 years ago

Thanks @manishrjain. A related side question about transactions, as I have a graph use case.

https://docs.dgraph.io/design-concepts/#note-on-transactions There is no linearizability. You have durability, of course. Linearizability is a tough one, for sure; everyone is facing this same issue. I built a CQRS system with many microservices and event sourcing, and you just cannot make everything consistent. It's like a time/space domain problem.

Funnily enough, I have seen CRDTs being used for this, but I can't really recall much right now. I'm wondering if there has been talk about this in Dgraph, and what approaches are being considered.

akhenakh commented 6 years ago

Hey guys, I got all tests passing with the recent Badger txn code: https://github.com/akhenakh/bleve/tree/badger/index/store/badger

It needs review from both parties (Bleve & Badger); I'm not familiar with Bleve's engine internals.

I don't know if the Bleve team would merge it, since you are working on a v2 store? I'm also not sure which license is appropriate. I'll open a PR based on your comments, thanks.

manishrjain commented 6 years ago

Happy to review the PR when/if created.

akhenakh commented 6 years ago

bump cc @mschoch

bt commented 6 years ago

@mschoch any chance you can provide some direction for the PR? I think a lot of us are excited for the possibility of using Badger with Bleve!

osiloke commented 6 years ago

@akhenakh, I have a use case that fails with your badger fork: creating a new index works, but when you reopen that same index and try to close it, closing is blocked by Badger's value log, which cannot close. I'm still investigating what's causing the lockup; just reaching out to anyone who has an idea.

manishrjain commented 6 years ago

Maybe you're not discarding some transactions. When doing reads, Badger acquires read locks over the value log (to avoid it being garbage collected from underneath). If a txn does some reads but then doesn't discard itself, those locks won't be released, causing the value log to block.
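
For reference, a sketch of the read pattern this implies, using the txn-era Badger API (version details vary; db.View discards the transaction automatically on return):

package readpattern

import (
    badger "github.com/dgraph-io/badger"
)

// readKey reads a single key inside a read-only transaction that is
// guaranteed to be discarded, releasing its value-log read locks.
func readKey(db *badger.DB, key []byte) ([]byte, error) {
    var val []byte
    err := db.View(func(txn *badger.Txn) error { // discarded on return
        item, err := txn.Get(key)
        if err != nil {
            return err
        }
        val, err = item.ValueCopy(nil)
        return err
    })
    return val, err
}

// manualRead shows the equivalent manual form: if you create the txn
// yourself, always pair it with Discard.
func manualRead(db *badger.DB, key []byte) error {
    txn := db.NewTransaction(false) // false = read-only
    defer txn.Discard()             // without this, the value log can block
    _, err := txn.Get(key)
    return err
}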

zippoxer commented 6 years ago

Any progress?

bt commented 6 years ago

Seems like a later version of BadgerDB (1.5.3) has broken the ReaderIsolation test.

The tests now fail with a panic:

panic: Only one iterator can be active at one time. [recovered]
    panic: Only one iterator can be active at one time.

This was introduced in https://github.com/dgraph-io/badger/commit/b1ad1e93e483bbfef123793ceedc9a7e34b09f79

I don't suppose we'd be changing the test function CommonTestReaderIsolation(), but neither do I think it's appropriate to hack out the iterator check in Badger; any ideas? @manishrjain @akhenakh

manishrjain commented 6 years ago

Only one iterator should be active at a time -- we haven't built transactions to be thread-safe.

If this is an issue, we could look into making txns thread-safe; I'm not sure what would be involved in that.

bt commented 6 years ago

Thanks for the reply @manishrjain.

The current issue is that the implementation passes around the same Txn instance, so when the test tries to create a second iterator, it causes a panic.

I'll try to do a refactor tomorrow to see if I can get it to work, but I've already tried a lot of things today and none seem to work (at least without changing the test function, of course).

bt commented 6 years ago

@manishrjain is there a way I can get a snapshot of the BadgerDB at a certain ts (without using ManagedDB)?

I noticed you mention that using ManagedDB is like playing with fire; I'm trying to abstract as much of that away so that it's handled by BadgerDB, but it seems I might need to use ManagedDB to implement the Bleve store...

I know that creating a new transaction also gives a snapshot as of that time, but given the way Bleve's KVStore works, it means I would have to manage a single long-lived Txn, and that seems pretty bad considering Badger emphasises concurrent Txn use...

manishrjain commented 6 years ago

If it's just a test that is failing, it's better to modify the test, I think. You could use ManagedDB, but again, there's some understanding that'd be required.

Multiple read-only transactions get assigned the same timestamp. You could make use of that fact in the default DB implementation. They'd only get a new timestamp if there are update transactions in between.

There's no way you can specify the timestamp to use unless you're using ManagedDB.
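
For anyone exploring that route, a hedged sketch of what ManagedDB usage looks like (API names as of Badger 1.x; this mode hands all timestamp management to the caller, which is the "playing with fire" part):

package managed

import (
    badger "github.com/dgraph-io/badger"
)

// openAndReadAt opens a managed DB and starts a read-only transaction
// at an explicit snapshot timestamp; every iterator created from this
// txn sees the same point in time. The caller is responsible for
// choosing and tracking commit timestamps.
func openAndReadAt(dir string, readTs uint64) error {
    opts := badger.DefaultOptions
    opts.Dir = dir
    opts.ValueDir = dir
    db, err := badger.OpenManaged(opts)
    if err != nil {
        return err
    }
    defer db.Close()

    txn := db.NewTransactionAt(readTs, false) // read at the chosen snapshot ts
    defer txn.Discard()
    // ... create iterators / do gets against txn here ...
    return nil
}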

ghost commented 6 years ago

Hey @bt, where is the code? I can take a crack at it. There are a ton of examples of using ManagedDB transactions and handling the timestamp issue out there on GitHub:

https://github.com/search?o=desc&p=3&q=badger+ManagedDB&s=indexed&type=Code

ghost commented 5 years ago

@manishrjain thanks !!

ghost commented 5 years ago

@bt The changes you wanted for Badger have happened

bt commented 5 years ago

Hey @manishrjain, thanks for making the change :) @gedw99 the issue now is that the new index type, scorch, doesn't write to a KV store. So integrating BadgerDB into Bleve will only be possible with upsidedown.

I've pretty much given up on integration due to two reasons:

  1. Using upsidedown for my use case results in a huge index (in size). We're talking wildly inflated indexes, where 1 MB of data indexed results in 100 MB of BadgerDB files.
  2. BadgerDB seems to take so much RAM that it's not practical for my use case (but I've isolated the problem to actually be due to using upsidedown).

So for those reasons I've stopped using BadgerDB with Bleve, and will probably look at it again once scorch works over a KV store. I'm now implementing a WAL in scorch instead, but that itself seems to be super challenging...

mschoch commented 5 years ago

once scorch works over a KV store

There are no plans for scorch to ever use a KV store. The KV serialization requirements are part of the reason that the upsidedown format is impractical.

bt commented 5 years ago

@mschoch oooh okay -- so in that case, are there any plans to improve upsidedown to at least be closer to scorch performance?

From my benchmarking, scorch is close to a billion times better in all aspects, except that it doesn't write to a KV store, so I can't get Badger benefits like the WAL or snapshotting, etc.

manishrjain commented 5 years ago

BadgerDB seems to take so much RAM that it's not practical for my use-case (but I've isolated the problem to actually be due to using upsidedown).

Is it Badger taking up RAM, or the cost of serialization which is causing the RAM usage? If it is Badger, I'd like to identify the cause -- maybe share a heap profile or something.