Open tri2820 opened 6 months ago
Hi! Thank your for sharing this behavior. That is actually weird, it seems that with a lot of rounds (100 000) we get a 66% of false negative and without the prefix https://www.youtube.com/watch?v=
we get 33%. When dropping the error rate to 3% there is now no error at all what should be expected!. 🙄 Increasing the error rate up to 10% does not increase the number of false negative it stays at 33% without the prefix and 66% with the prefix. So for now no explanation on this, we need to investigate more but @Callidon and me are very busy so it may take longer than expected.
For now; try to use 0.03
and to remove the prefix https://www.youtube.com/watch?v
.
Reproductible code on runkit (https://npm.runkit.com/bloom-filters)
const { CuckooFilter } = require("bloom-filters");
let items = [
"https://www.youtube.com/watch?v=HJjxN05ewEc",
"https://www.youtube.com/watch?v=BZNUo7orS3k",
"https://www.youtube.com/watch?v=SD-McWZz_pk",
"https://www.youtube.com/watch?v=De4QjH9fpgo",
"https://www.youtube.com/watch?v=Hzko-cjHhTg",
"https://www.youtube.com/watch?v=vqR-8lgOmBE",
"https://www.youtube.com/watch?v=j6u0LH67YLk",
"https://www.youtube.com/watch?v=B2z8ikGLRh8",
"https://www.youtube.com/watch?v=N3ftBeP16TA",
"https://www.youtube.com/watch?v=38RBRPwODUk",
"https://www.youtube.com/watch?v=Ry8nSUfX6fY",
"https://www.youtube.com/watch?v=-KrYohUJvYw",
"https://www.youtube.com/watch?v=zRpl7Pr0fs4",
"https://www.youtube.com/watch?v=uYYiypp6WaY",
"https://www.youtube.com/watch?v=EPap21FBGbE",
"https://www.youtube.com/watch?v=zN2_0WC7UfU",
"https://www.youtube.com/watch?v=DNVwnkgTzbY",
];
const without_prefix = false;
const prefix = "https://www.youtube.com/watch?v="
const errorRate = 0.04; // 4 % error rate
if (without_prefix) {
items = items.map(e => e.replace(prefix, ""));
}
const round = 100000
let c_false = 0
const filter = CuckooFilter.from(items, errorRate);
for (let i=0; i<round; i++) {
let val = filter.has(`${without_prefix ? '' : prefix}HJjxN05ewEc`);
if (!val) {
c_false++;
}
val = filter.has(`${without_prefix ? '' : prefix}38RBRPwODUk`);
if (!val) {
c_false++;
}
val = filter.has(`${without_prefix ? '' : prefix}-KrYohUJvYw`);
if (!val) {
c_false++;
}
}
console.log('Number of false positive:', c_false)
console.log('Rate:', c_false / (round * 3))
Hi @folkvir, from the code it seems for each round we are just checking membership using '.has'.
We already know 2 out of 3 hard-coded items there would return false, so it would always return 66% no matter how many rounds we run.
I'm using CuckooFilter to test the membership of the following items. The filter returns false - definitely not in the list - for 2 items. These cases are false negatives which should not happen.
Is this a bug or am I missing something? Version: "bloom-filters": "^3.0.1"