Closed krummas closed 8 years ago
I'm sorry, but this needs quite a bit of work to be acceptable. To start with, why didn't you try to just replace the int array with IntBuffer? Your current offheap implementation easily allows memory corruption / segfaults.
There are other significant issues as well as minor things like copying existing code style (there is in at least one case an if statement that uses braces replaced with one omitting them).
I disagree, but if it makes you feel better, I will refactor it to use a direct byte buffer
What other "significant" issues are you talking about? Not very helpful.
Why the hostility? Have I done anything wrong except try to help out your project?
It's not intended as hostility. I'm falling asleep midway through and my girlfriend is waiting to finish watching a movie. Just terseness. Your original response indicated a need for details.
but just for reference, here are some things you did wrong:
The reason it isn't helpful is because being helpful to that detail is time consuming and exhausting. So to summarize, the biggest thing you did wrong was submit code so awful that there is no way it is going to be accepted without either harming the library, or inordinate levels of effort on my part. Most likely to the extent of rewriting it entirely or wishing I had. However, since cassandra wants it, and enough people with the power to ask me to do things will care about that, that's going to happen... and at this point, really, already has.
I hope you understand that in your future endevours.
First, it is not the middle of the night, I live in Sweden
Second, I absolutely did not intend to be rude, maybe it was a language thing, I just accepted your comment and fixed the pull request the way you suggested.
The question about the other significant issues was just so that I could fix all the things at once and not have to wait.
And it was my admiration for that sentiment that, despite my personal, local, situation, motivated me to highlight as many as I could before my brain completely failed.
I only mentioned the time add an explanation for the terse nature of some of the comments.
On Thu, Mar 10, 2016, 03:31 Marcus Eriksson notifications@github.com wrote:
The question about the other significant issues was just so that I could fix all the things at once and not have to wait.
— Reply to this email directly or view it on GitHub https://github.com/addthis/stream-lib/pull/103#issuecomment-194732511.
OK, sorry for any misunderstanding and I hope the new patch is more suited for inclusion
@krummas
First, thank you for the PR, any contribution to this project is appreciated. We are very happy that stream-lib is a part of Cassandra and the idea of storing registers off-heap is a good one.
Second, Please allow me to personally apologize for the rudeness of the comments in this thread. That type of response is completely unacceptable and not something we condone. I am embarrassed and I hope you can accept this apology.
This is a big change and it will require some testing and analysis before we can merge in. Give us some time to work through the proposed changes and we'll get back to you when we have more questions or alternative approaches.
Thanks, Matt Abrams
Thanks for the pull request! Here are some initial questions:
Thanks!
@krummas https://github.com/krummas
First, thank you for the PR, any contribution to this project is appreciated. We are very happy that stream-lib is a part of Cassandra and the idea of storing registers off-heap is a good one.
Second, Please allow me to personally apologize for the rudeness of the comments in this thread. That type of response is completely unacceptable and not something we condone. I am embarrassed and I hope you can accept this apology.
This is a big change and it will require some testing and analysis before we can merge in. Give us some time to work through the proposed changes and we'll get back to you when we have more questions or alternative approaches.
Thanks, Matt Abrams
— Reply to this email directly or view it on GitHub https://github.com/addthis/stream-lib/pull/103#issuecomment-194845536.
This is for a new feature in Cassandra - with our earlier usage of stream-lib we deserialize the registers from disk whenever we need them, with this new feature we will be using them a lot more and need to keep them in memory all the time.
I will try to do a benchmark - we will potentially have 1000+ long lived registers in memory (one per sstable) and do many merges, will try to get something done early next week or so.
Hi Marcus,
Thanks for your patience while we review the PR. We are a big user of Cassandra ourselves, so we are very happy to do our part to support this feature. Here are some of my thoughts on the PR:
finalize
method to do this automatically. The client code that creates the off-heap RegisterSet should be responsible for releasing its memory, and there should be an explicit API for the client code to do that.Buffer.flip
. The code is correct functionally, but I think it'll be easier for developers who are not familiar with Buffer
to read if we just loop over the elements when copying elements to/from an array. (I actually had to look at the buffer code and do some experiment myself to make sure it's doing the right thing)With all that, here's my suggestion:
RegisterSet
an interfaceArrayRegisterSet
implementation that uses Java native array (basically the original code). This will be the default RegisterSet
that HyperLogLog
and HLLP will use.With these two changes, you have the option now to create custom RegisterSet implementations in your application. You can pass them to the HLL and HLLP constructors. In your case for Cassandra, for example, you can implement an OffHeapRegisterSet (using Unsafe or IntBuffer). It will have a destroy
method where it frees its memory. Cassandra code will keep references to all the unsafe register sets and call their destroy
methods at the appropriate time. This way the core of the library is not affected while you get the flexibility to do the off-heap customization.
Consider adding serialize/deserialize methods to the RegisterSet
interface. That way we may not have to worry about changing the version number. That should make the code simpler.
Let me know what you think.
Yuesong
sounds good! will update the patch as soon as possible
This makes it possible to keep the register set off heap - we need it in Cassandra for long lived HLLP instances (we use it for overlap estimation between sstables) https://issues.apache.org/jira/browse/CASSANDRA-11035