Open jiwidi opened 1 day ago
shaped-bloom-filter appears to only support 32 bit integers for its BloomFilter
type (which seems pretty limiting).
However, its BloomFilterExtended
supports bytes
(with separate function names).
But when comparing performance, it appears to be the slowest implementation of any we compare with by far:
For example, using the following code:
import time
import rbloom
import shaped_bloom_filter
NUM_ITEMS = 100_000
def time_run(bloom_type) -> float:
start = time.perf_counter()
bf = bloom_type(NUM_ITEMS, 0.01)
for i in range(NUM_ITEMS):
bf.add(i)
for i in range(NUM_ITEMS):
if i not in bf:
raise ValueError("Should be no false negatives")
end = time.perf_counter()
return end - start
print(f"rbloom {time_run(rbloom.Bloom):.4f} sec to run {NUM_ITEMS} items")
print(f"shaped-bloom-filter {time_run(shaped_bloom_filter.BloomFilter):.4f} sec to run {NUM_ITEMS} items")
Outputs the following on my machine (m1 pro mac):
rbloom 0.0106 sec to run 100000 items
shaped-bloom-filter 31.3293 sec to run 100000 items
(approximately 3000 times slower).
I'm using Python 3.11.10, and installed with pip install shaped-bloom-filter setuptools
(shaped-bloom-filter seems to require setuptools to be installed, but doesn't specify it as a dependency).
Unless I've done something wildly wrong, I don't think shaped-bloom-filter is even close to competitive, even when using ints directly.
shaped-bloom-filter appears to only support 32 bit integers for its
BloomFilter
type (which seems pretty limiting).However, its
BloomFilterExtended
supportsbytes
(with separate function names).But when comparing performance, it appears to be the slowest implementation of any we compare with by far:
For example, using the following code:
import time import rbloom import shaped_bloom_filter NUM_ITEMS = 100_000 def time_run(bloom_type) -> float: start = time.perf_counter() bf = bloom_type(NUM_ITEMS, 0.01) for i in range(NUM_ITEMS): bf.add(i) for i in range(NUM_ITEMS): if i not in bf: raise ValueError("Should be no false negatives") end = time.perf_counter() return end - start print(f"rbloom {time_run(rbloom.Bloom):.4f} sec to run {NUM_ITEMS} items") print(f"shaped-bloom-filter {time_run(shaped_bloom_filter.BloomFilter):.4f} sec to run {NUM_ITEMS} items")
Outputs the following on my machine (m1 pro mac):
rbloom 0.0106 sec to run 100000 items shaped-bloom-filter 31.3293 sec to run 100000 items
(approximately 3000 times slower).
I'm using Python 3.11.10, and installed with
pip install shaped-bloom-filter setuptools
(shaped-bloom-filter seems to require setuptools to be installed, but doesn't specify it as a dependency).Unless I've done something wildly wrong, I don't think shaped-bloom-filter is even close to competitive, even when using ints directly.
thanks this is what I was looking, any chance to test with int32s too?
but very promising for rbloom!
The test above does use int32s, it doesn't use BloomFilterExtended.
Hi!
First thanks for the work done with rbloom. Are you aware of the library https://pypi.org/project/shaped-bloom-filter/ which is a wrapper in top of a go bloom filter. Curious to know performance between this and your rust implementation.
Thanks!