Closed: PJUllrich closed this issue 1 week ago
Funny observation: if I reduce the bucket_size, the error occurs after fewer items. With bucket_size: 4 (the default) it occurs after 3725 items, with bucket_size: 16 after 24531 items, and with bucket_size: 64 after 40869 items.
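In case it's useful for debugging, this is roughly how I counted (I'm assuming here that :cuckoo_filter.add/2 returns :ok on success; anything else, or a crash inside the hash function, stops the loop):

defmodule Repro do
  # Rough sketch: insert integers until the first failure and return the count.
  def count_until_error(bucket_size) do
    filter =
      :cuckoo_filter.new(65_536,
        bucket_size: bucket_size,
        hash_function: &XXHash.xxh32/1
      )

    Enum.reduce_while(1..100_000, 0, fn i, count ->
      try do
        case :cuckoo_filter.add(filter, i) do
          :ok -> {:cont, count + 1}
          _error -> {:halt, count}
        end
      rescue
        # the hash function crashes on non-binary input
        _ -> {:halt, count}
      end
    end)
  end
end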
I built a workaround by converting the integer to a string first. I was just surprised by this because the docs said that I could use any hash function as long as it "can convert a string to a hash". But this works too now :)
filter =
  :cuckoo_filter.new(65_536,
    bucket_size: 32,
    hash_function: &hash/1,
    name: :tmp_blocklist_cache
  )
# XXHash.xxh32/1 only accepts binaries, so stringify numbers first
def hash(input) when is_binary(input), do: XXHash.xxh32(input)

def hash(input) when is_number(input) do
  input |> to_string() |> hash()
end
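One caveat of this scheme: hash(123) and hash("123") collide, since the integer is stringified before hashing. That's fine for my use case because the filter only ever sees integers.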
Hi Peter,

This is actually happening because I made a breaking change in version 1.0.0 and forgot to update the documentation. Since version 1.0.0, hash_function must be a function that accepts any term and returns an integer.
So in your case the solution would be to create a new filter like this:
filter =
  :cuckoo_filter.new(65_536,
    bucket_size: 32,
    hash_function: fn input -> XXHash.xxh32(:erlang.term_to_binary(input)) end,
    name: :tmp_blocklist_cache
  )
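This works for any element type, since :erlang.term_to_binary/1 turns an arbitrary term into a binary before hashing:

hash = fn input -> XXHash.xxh32(:erlang.term_to_binary(input)) end

hash.(42)            # integers no longer crash the hash function
hash.("hello")       # binaries still work
hash.({:user, 123})  # any other term works too

Note that :erlang.term_to_binary("hello") is not the same bytes as "hello" itself, so the resulting hash differs from XXHash.xxh32("hello"); that doesn't matter for the filter as long as the function is applied consistently.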
Okay, no worries. Thank you :) That worked!
Hello there 👋
When I use a custom hash function and add many items in a loop, I receive the following error:
My Setup
This is my filter definition and how I add a long list (~50k) of items to the filter:
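In simplified form (item_ids here stands in for my real list of integers):

filter =
  :cuckoo_filter.new(65_536,
    bucket_size: 32,
    hash_function: &XXHash.xxh32/1,
    name: :tmp_blocklist_cache
  )

# item_ids is a placeholder for the actual ~50k integers
Enum.each(item_ids, fn id ->
  :cuckoo_filter.add(filter, id)
end)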
The error occurs after adding exactly 36404 items. Is this maybe a capacity issue?

Note
I'm using the xxHash library here because I want to use the library inside my own library, which will have to run on many different OSes, and I already received a compilation error for the recommended xxh3 library on my macOS system. That's why I opted for the native xxHash implementation.