Closed GoogleCodeExporter closed 9 years ago
Just FYI, we *always* discuss the merit of the API change extensively before
even looking at code.
Original comment by wasserman.louis
on 5 Sep 2012 at 3:25
Regarding the change itself...I like the idea of being able to union Bloom
filters, but I'm not sure how the method should behave when it receives Bloom
filters of different sizes. Should we only expect the same size in bits, or
should we force both the expectedInsertions and fpp to be the same? My reflex
is to throw an IAE when trying an invalid union, but is that the right response?
Original comment by wasserman.louis
on 5 Sep 2012 at 2:22
We must throw IAE for filters with mismatched sizes, fpp, or funnels. We might
need to give an additional accessor for bits.size(). The referenced patch has
a canCombine predicate although this is not as useful as the accessor.
Original comment by arg...@gmail.com
on 5 Sep 2012 at 4:24
My thinking is that if you're testing canCombine at runtime, your code is
already buggy? You should only be trying to combine Bloom filters for the same
set with the same parameters in any event, and you should know when you're
writing the code which Bloom filters are compatible.
Original comment by wasserman.louis
on 5 Sep 2012 at 4:29
I want to support an increasing total number of insertions over time. I plan
to use powers of two for expectedInsertions, such that filters will fall into a
few known sizes. I prefer an accessor for filter size over a canCombine
predicate, although the latter is sufficient for my application.
Original comment by arg...@gmail.com
on 5 Sep 2012 at 4:39
I pushed a new patch which exposes BloomFilter.size and hides canCombine:
https://github.com/andrewgaul/guava-libraries/pull/3
I would very much like Guava to accept this since otherwise I need to package
my own version of Guava. Any hints on whether this is likely to land upstream?
Original comment by arg...@gmail.com
on 28 Sep 2012 at 12:44
I like this idea too -- we've got some ongoing work with the BF API, so I'll
grab this and see how we can make this work.
Original comment by kak@google.com
on 1 Oct 2012 at 11:40
I'd suggest to rethink the naming.
In case you provide `size`, it should be clear what units get used (something
like "sizeInLong")?
The name "combine" says nothing about the operation. OK, it's union, as nothing
else works, but why not be explicit? Moreover, it could create a new filter
instead of modifying the first operand.
So I'd suggest to call it `addAll` like all collections do. But there's "put"
instead of "add", so it must be actually `putAll`. But then I don't know how to
call "canCombine".
So maybe the current names are the best, I just wanted to bring this up.
Original comment by Maaarti...@gmail.com
on 5 Oct 2012 at 9:14
@kak: this was accepted back in October. Any updates? We'd really like to see
this merged for Guava 14.0. The author of the patch in #6 is a colleague and
we've signed the contributor agreement paperwork already, if that's an issue.
Original comment by diwa...@maginatics.com
on 12 Feb 2013 at 7:37
Hi Diwaker,
We haven't made these changes yet, so it won't make it into Guava 14. We can
try to prioritize it for Guava 15 though.
-kak
Original comment by kak@google.com
on 13 Feb 2013 at 12:17
Thanks, Kak. It's unfortunate that this can't make it into Guava 14 -- Guava
releases are infrequent enough that waiting for Guava 15 might not be an option
for us. Nevertheless, if you can go through the existing patch and give us
feedback, we can have it ready for inclusion whenever Guava devs are ready to
consume it. Thanks!
Original comment by diwa...@maginatics.com
on 13 Feb 2013 at 12:40
Hi Kak,
> We haven't made these changes yet, so it won't make it into Guava 14. We can
try to prioritize it for Guava 15 though.
Now that Guava 14 is out, we'd like to see this merged for Guava 15. We're more
than happy to help with testing and updating the patch if there are any
comments or suggestions.
Original comment by diwa...@maginatics.com
on 28 Feb 2013 at 6:38
Hi Diwaker,
Thanks again for your patience. I'm working on this today, but like Martin, I
have some hesitation about the proposed instance method naming.
You've proposed the verb "combine'. In our C++ version, we're using the verb
"merge". I'm not sure which is better (or if something like union is even
better?)
Since this is going to be an instance method on BloomFilter which mutates the
current instance (instead of a static method which returns a new instance), I'm
not sure combine is the right method name.
Maybe combineWith? mergeWith?
Anyone have any thoughts here?
Original comment by k...@kloover.com
on 28 Feb 2013 at 9:38
Both combineWith and mergeWith sound better. I have a slight preference for the
latter -- mergeWith is clearer in indicating that we're mutating this instance
IMO.
Original comment by diwa...@maginatics.com
on 28 Feb 2013 at 9:56
I do not have an opinion on naming this method. addAll, combineWith,
mergeWith, unionWith, or whatever I originally submitted work for me. The
important bit is to include the functionality upstream so Maginatics can stop
using an internal fork.
Original comment by arg...@gmail.com
on 28 Feb 2013 at 10:06
OK, sounds good...this is out for code review now and will definitely make
Guava 15 (should be available at HEAD within a few days). Only thing standing
in it's way now is the naming issue (which we're discussing internally now).
Original comment by kak@google.com
on 28 Feb 2013 at 10:41
Just about to submit this change. We're gonna go with BloomFilter#mergeWith
for now, but we might change it before Guava 15.0 is released (so if anyone
wants to make a strong argument for another name, please speak up!)
Original comment by kak@google.com
on 1 Mar 2013 at 2:12
Great, thanks Kurt!
Original comment by arg...@gmail.com
on 1 Mar 2013 at 2:16
Excellent. Thanks Kak for the quick turnaround!
Original comment by diwa...@maginatics.com
on 1 Mar 2013 at 2:16
https://code.google.com/p/guava-libraries/source/detail?r=a1858510da0d99dfc3e362
2741d2134fb22e18c1
Original comment by kak@google.com
on 1 Mar 2013 at 5:50
Kurt, can you also expose a BloomFilter.size method which calls BitArray.size?
Callers want to know whether they should expect mergeWith to succeed. In the
Maginatics use case, we have BloomFilters of various power of two sizes and we
merge only compatible sizes. This was present in the pull request in comment
#6.
Original comment by arg...@gmail.com
on 1 Mar 2013 at 6:16
@argaul: Sure - adding BloomFilter#size right now! Should be pushed out next
week.
Original comment by kak@google.com
on 1 Mar 2013 at 7:38
Would it make sense to have a "canMerge(BloomFilter)" method (in addition to
size() perhaps), since there are quite a few things besides just the size that
have to be the same in order to merge?
Original comment by cgdec...@gmail.com
on 1 Mar 2013 at 8:19
Colin: I fully agree that there should be such a method, especially because of
the internals of the implementation being rather hidden. Given the name
"mergeWith" I'd suggest to call it "canMergeWith".
Actually, combining BF of different sizes can make some sense, too. Because of
the only requirement on BF being that it returns no false negatives, it's
clearly possible. When the size ratio is something like 1:2, it even works
fine. With "ugly" ratios, the result is completely unusable. Sticking with
powers of two (which is IMHO a good idea anyway) would enforce "nice" ratios.
The implementation is quite straightforward, but I doubt it's worth it - I just
wanted to bring it up in case somebody could need it one day.
Original comment by Maaarti...@gmail.com
on 1 Mar 2013 at 10:52
Added BF#canMergeWith(BF)...should be migrated externally within a few days.
Original comment by kak@google.com
on 4 Mar 2013 at 7:12
Can you make BloomFilter.size public instead of package private? I want to
call this method to separate BloomFilters into groups which I later call
mergeWith on.
Original comment by arg...@gmail.com
on 19 Mar 2013 at 6:40
@argaul: Hmmm, is there a reason BF#isCompatible(BF) won't work for you?
Original comment by kak@google.com
on 19 Mar 2013 at 6:42
I have many BloomFilters of varying sizes so O(n**2) isCompatible calls is not
ideal. I guess if callers choose only a few sizes, e.g., power of 2, the worst
case has a smaller bound.
Original comment by arg...@gmail.com
on 19 Mar 2013 at 6:55
@argaul: but currently we only allow for merging of BFs with the same exact
size...so if you are trying to separate the BFs into groups, the merge will
fail later if they are not isCompatible() == true.
Original comment by kak@google.com
on 19 Mar 2013 at 7:11
This issue has been migrated to GitHub.
It can be found at https://github.com/google/guava/issues/<id>
Original comment by cgdecker@google.com
on 1 Nov 2014 at 4:13
Original comment by cgdecker@google.com
on 3 Nov 2014 at 9:08
Original issue reported on code.google.com by
arg...@gmail.com
on 5 Sep 2012 at 1:12