facebook / rocksdb

A library that provides an embeddable, persistent key-value store for fast storage.
http://rocksdb.org
GNU General Public License v2.0
28.74k stars 6.34k forks source link

Support MergeOperator in Java #2282

Open sagar0 opened 7 years ago

sagar0 commented 7 years ago

MergeOperator functionality is not currently available in the Java API. StringAppendOperator is the only implementation that is exposed as part of Java API today. There has been considerable interest from the community recently to expose full functionality of MergeOperator so that they can take advantage of it in Java as well (Ex: https://github.com/facebook/rocksdb/issues/1988 , https://github.com/facebook/rocksdb/pull/2289)

cc: @adamretter

adamretter commented 7 years ago

@sagar0 So I have a work in progress for this. However, trying to figure out an efficient API is very hard as their is substantial cost for transferring the operand_list across the JNI boundary. I wonder if it is possible to publish an experimental API and get some feedback first?

sagar0 commented 7 years ago

I do understand that this is a hard one and I +1 the idea of publishing an experimental API and seeking community feedback, if any.

MPdaedalus commented 7 years ago

is there any progress on this issue? Maybe something like map.put(key,oldValue,newValue):boolean could be implemented for atomic updates, I know it would not be as efficient but if there is another way to do atomic updates for RockDb in java without using counters i'm all ears.

StefanRRichter commented 6 years ago

@adamretter Also interested if there is any ETA for this? I also noticed that our workaround for some merge performance problems mentioned in #1988 (setting options.setMergeOperatorName("stringappendtest") instead of using "stringappend") is no longer working for RocksDB >= 5.8.X. I could not find an obvious change that introduced this regression, but it keeps me from updating our dependency to a newer version.

jcalcote commented 6 years ago

+1 on this issue. @adamretter please provide some feedback on the current status. I would love to test and/or help out if possible.

GalRogozinski commented 6 years ago

@adamretter do you have any news?

adamretter commented 6 years ago

@GalRogozinski My plan is to implement this in the later part of November and December.

guillaumepitel commented 6 years ago

That would be a very lovely feature to have. For now, we are avoiding Gets for updates by using key suffix + a background sweep on the base which does the merge + put + delete. That's nasty, but still has significantly better performance than pure get/put. The PR here https://github.com/facebook/rocksdb/pull/3432 seems to have proposed some interesting ideas, and it feels like apart from requiring cleanup & straightening the code, it was doing the job.

guillaumepitel commented 5 years ago

We've started working with @gfouquier on a Java-side MergeOperator based on PR #3432 by @publicocean0 however there seems to be quite a few redunduncy between the init.[cc|h] and the portal.[cc|h] and a lot of thing are not working anymore probably due to evolutions in the master.

I'm not sure why @GalRogozinski unassigned @adamretter from this issue, since he seems by far the most experienced in this matter.

@adamretter, can you confirm you're working on it ? We'd love to participate if you have a banch we could test.

GalRogozinski commented 5 years ago

It was a github bug. I just mentioned @adamretter and it unassigned him. I am not even supposed to have permissions here.

I just mentioned him again because maybe it will assign him back :trollface:

mrambacher commented 5 years ago

Is the ask to have MergeOperators written in Java or to have custom MergeOperators written in C++ available to Java? I am working on something that I hope opens up more of the extensions written in C++ available to Java (the latter of the two).

guillaumepitel commented 5 years ago

Is the ask to have MergeOperators written in Java or to have custom MergeOperators written in C++ available to Java? I am working on something that I hope opens up more of the extensions written in C++ available to Java (the latter of the two).

I may speak only for me, but I understand that this issue is about the first case. And by the way, we have a working version here, based on #3432 from @publicocean0 - it's not yet ready for a proper PR though, and we're first testing it heavily.

guillaumepitel commented 5 years ago

@sagar0 So I have a work in progress for this. However, trying to figure out an efficient API is very hard as their is substantial cost for transferring the operand_list across the JNI boundary. I wonder if it is possible to publish an experimental API and get some feedback first?

@adamretter you seem to worry about JNI performance, so I think I could share some stuff I'm working on.

  1. attach/detach thread is extremely costly, at least when the C++ code is in a C++ non-JVM thread an calls back a Java method. We can avoid systematic attach/detach by tracking the threads destruction, taking care to attach the thread as a daemon thread to avoid blocking at JVM destruction (plus some exta tips). Performance gain is huge : my current version of MergeOperator take 10s with long-term attach/detach, and 110s with systematic attach/detach

  2. Accessing a Java byte[] from C++ can be done without copy using GetPrimitiveArrayCritical. It's not for use everywhere, but for writebatch.put/merge (kv_op) it can be done.

  3. Passing a byte[] from C++ to Java can be done using NewDirectByteBuffer (so a java.nio.ByteBuffer) which wraps a native memory address in a ByteBuffer access pattern. Offering a double API (byte[] or ByteBuffer) seems like a good idea for at least a handful of functionnalities : Comparators, MergeOperators, Filters, Iterators. For Java->C++ calls that would mean adding extra methods, like for instance RocksIterator.keyAsByteBuffer . For C++ -> Java CallBacks that would mean doubling hte Abstract Classes, like Comparator and NioComparator or ByteBufferComparator which would have a methode compare(Bytebuffer a, ByteBuffer b) instead of compare(byte[] a, byte[] b)

sagar0 commented 5 years ago

@mrambacher:

Is the ask to have MergeOperators written in Java or to have custom MergeOperators written in C++ available to Java?

The ultimate aim of this issue is to allow RocksJava API users to allow accessing custom merge operators without modifying RocksDB library code. The merge operator code should lie in the application's code repository. We should allow the merge operators to be written in Java, but before that a good first step would be to allow custom-written C++ merge operators to be exposed to be used in the java API. Maybe something like (just thinking out loud):

RegisterMergeOperator("your-custom-C++-merge-op", java-merge-op-obj-name);
options.setMergeOperator(java-merge-op-obj-name);
mrambacher commented 5 years ago

@sagar0: I generated a PR that shows the basic thinking of what I am proposing. We are also trying to add functionality to RocksDB without modifying the Rocks code itself. I have not added MergeOperators to the list for this PR, but am working on it now.

To add a MergeOperator written in C++ from a custom shared library to the rocks code, you would do something like: options.AddExtensionLibrary("Custom Library"); options.SetMergeOperator("YourMergeOperator", "YourMergeOperatorProperties")

I plan to push something out in the next few weeks that does exactly this for Cassandra.

To implement a Java-based MergeOperator, I would assume someone would write a C++ one that delegates back to the Java layer. Then the "SetMergeOperator" would use the name of the C++ one and pass in properties for the Java class.

If there is interest on this thread, I will add a comment here when I generate the PR for Cassandra using this model.

gfouquier commented 5 years ago

On Mon, Dec 17, 2018 at 7:36 PM mrambacher notifications@github.com wrote:

To implement a Java-based MergeOperator, I would assume someone would write a C++ one that delegates back to the Java layer. Then the "SetMergeOperator" would use the name of the C++ one and pass in properties for the Java class.

we are currently working on this part, based on #3432 https://github.com/facebook/rocksdb/pull/3432but more integrated into rocksdb. Our current approach is to modify the java MergeOperator class which inherit from RocksCallbackObject and allows to instantiate JniCallback onthe c++ side.

gfouquier commented 5 years ago

I submitted a pull request to support merge operators in java here : https://github.com/facebook/rocksdb/pull/4805

On Mon, Dec 17, 2018 at 9:01 PM Geoffroy Fouquier g.fouquier@gmail.com wrote:

On Mon, Dec 17, 2018 at 7:36 PM mrambacher notifications@github.com wrote:

To implement a Java-based MergeOperator, I would assume someone would write a C++ one that delegates back to the Java layer. Then the "SetMergeOperator" would use the name of the C++ one and pass in properties for the Java class.

we are currently working on this part, based on #3432 https://github.com/facebook/rocksdb/pull/3432but more integrated into rocksdb. Our current approach is to modify the java MergeOperator class which inherit from RocksCallbackObject and allows to instantiate JniCallback onthe c++ side.

guillaumepitel commented 4 years ago

Hello, is there a reason why the pull request by @gfouquier hasn't been merged ? It's a pretty big deal for us to have this functionality upstream. We use it in production in a modified Rocksdb and we think this would enable lots of people to use merge functionnality of rocksdb in java

brary commented 4 years ago

Hi @adamretter , I would also want this PR merged. Is there any plan for it?

Also, I see there is options.setMergeOperator() available in rocks java, isn't its role to use merge operator in java?

mrambacher commented 4 years ago

There are a couple different things going on in this thread. Are you asking for access to more of the C++ MergeOperators in Java or asking for the ability to write MergeOperators in Java? I have been working on making the former easier to do but not the ability to write Java MergeOperators.

brary commented 4 years ago

Oh, so you mean the ability to write MergeOperators in Java is already available by options.setMergeOperator() and this PR is meant to access more of the C++ MergeOperators in Java?

guillaumepitel commented 4 years ago

No I think this is the opposite, the ability to use C++ merge operators is already available and this PR is to ba able to write java merge operators. We use them heavily and rely currently on a patched version of RocksDB. We also we really like for our PR to be merged. AFAIK, there has been no news about any plan since my colleague @gfouquier has submitted the last version.

brary commented 4 years ago

Thanks @guillaumepitel for information. Can you clarify what StringAppendOperator.java does then? Looks like that is exposed in java, which can be set using options.setMergeOperator().

I just started exploring merge operator and looks like it would be helpful for our use-case where we have a read-modify-write (key:[v1,v2,v3]) and we want to convert it into an append using merge operator, so all values v1, v2 and v3 can be merged during key reads.

guillaumepitel commented 4 years ago

@brary this is precisely how C MergeOperators are currently supported from the Java : StringAppendOperator is just a wrapper around a native JNI call that encapsulate a Java Strings concat operation written in C/JNI. That's probably what @mrambacher is trying to improve. Look at the code of StringAppendOperator, it calls a native method to create a new instance.

mrambacher commented 4 years ago

@guillaumepitel @brary Right. I am trying to make any and all MergeOperators (and other "plugin classes") accessible to Java and C++ via simple APIs. See #6533 for more information.

adamretter commented 4 years ago

We found that performance was quite bad with out initial implementation, we are exploring new routes to working around the performance issues and hope to send a PR in the next month or two.

east825 commented 3 years ago

Hi there! Any work in this area lately? We're also really interested in this, but at the same time, at the moment, we don't really need a full-blown API exposing all the features of merge operators in Java, but rather want to overcome a single limitation of StringAppendOperator. Namely, to have an option to concatenate values without a delimiter character, just one next to another. Does it sound like a legitimate feature to have in RocksJava? Is anyone else interested in that?

mrambacher commented 3 years ago

I am working on making MergeOperators into Customizable objects, meaning I can configure and load them more generically and through options file. I have considered changing the StringAppend operator to use a string (instead of char) separator when I complete this work. Having an empty string would allow for the behavior you are looking for.

From there it would become a matter of getting that interface/method exposed to Java...

east825 commented 3 years ago

@mrambacher This level of configuration definitely sounds really cool, but, on the other hand, if it's still WIP, it seems not that difficult to add an extra StringAppend constructor accepting a String instance instead of a char already now, updating C++ internals not that drastically (please correct me if I'm fooling myself here). If no one objects to extending the API this way, I think I could try to come up with such a PR.

eric-maynard commented 2 years ago

@east825 We have a similar use case; were you ever able to move forward with this?