Baqend / Orestes-Bloomfilter

Library of different Bloom filters in Java with optional Redis-backing, counting and many hashing options.

How to push local BloomFilter to Redis? #4

Closed ChrisCurtin closed 10 years ago

ChrisCurtin commented 11 years ago

Hi,

I want to build the filter locally and then push it to Redis to avoid the huge number of round trips. However, when I do this by getting the BitSet via 'getBitSet' and calling overwrite, none of my items are found.

What is the correct way to copy from a local BloomFilter to a Redis-backed one?

Thanks,

Chris

Source code below. Parameters:

All the contains() calls are failing.

public static void main(String[] args) {
    String invalidPath = args[0];
    long suppressedListSize = Long.parseLong(args[1]);
    long numItemsToAdd = Long.parseLong(args[2]);

    BloomFilter<String> filter = new BloomFilter<>(suppressedListSize, 0.01);
    filter.setHashMethod(BloomFilter.HashMethod.Murmur);

    String contact;

    // build the filter
    try {
        BufferedReader reader = new BufferedReader(new FileReader(invalidPath));
        long startTime = System.nanoTime();
        long numAdded = 0;
        while ((contact = reader.readLine()) != null) {
            numAdded++;
            if (numAdded > numItemsToAdd) break;
            filter.add(contact);
        }
        reader.close();
        long endTime = System.nanoTime();
        System.out.println("Time to create filter:" + (endTime - startTime) / 1e6);

        startTime = System.nanoTime();

        String IP = "vrd01.atlnp1";
        BloomFilterRedis<String> remoteFilter = new BloomFilterRedis<>(IP, 6379, suppressedListSize, 0.01);
        remoteFilter.setHashMethod(BloomFilter.HashMethod.Murmur);

        RedisBitSet remoteSet = (RedisBitSet) remoteFilter.getBitSet();
        remoteSet.overwrite(filter.getBitSet());

        endTime = System.nanoTime();
        System.out.println("Time to push filter:" + (endTime - startTime) / 1e6);

        startTime = System.nanoTime();
        numAdded = 0;
        reader = new BufferedReader(new FileReader(invalidPath));
        while ((contact = reader.readLine()) != null) {
            numAdded++;
            if (numAdded > numItemsToAdd) break;
            if (!remoteFilter.contains(contact)) {
                System.out.println("MISSING DATA???:" + contact);
            }
        }
        reader.close();
        endTime = System.nanoTime();
        System.out.println("Time to query filter:" + (endTime - startTime) / 1e6);

    } catch (Exception e) {
        System.out.println("Oops");
        e.printStackTrace();
    }

}

ChrisCurtin commented 10 years ago

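It looks like this is an issue with how Java's BitSet and Redis order the bits within each byte: BitSet.toByteArray() puts bit 0 in the least significant bit of the first byte, while Redis treats bit offset 0 as the most significant bit of the first byte.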

https://github.com/xetorthio/jedis/issues/301

Solution (if anyone else comes across this):

https://gist.github.com/gmuller/2933940
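
For anyone who wants to see the idea without following the links, here is a minimal sketch of the bit-order conversion. It is not the code from the gist or from this library; the class and method names (RedisBitOrder, toRedisBytes, fromRedisBytes, redisGetBit) are made up for illustration, and redisGetBit is only a local stand-in that mimics how Redis GETBIT addresses bits, so no Redis server is needed to run it.

```java
import java.util.BitSet;

// Hypothetical helper, not part of Orestes-Bloomfilter or Jedis.
class RedisBitOrder {

    /** Packs a BitSet so that bit i ends up at Redis bit offset i. */
    public static byte[] toRedisBytes(BitSet bits) {
        byte[] bytes = new byte[(bits.length() + 7) / 8];
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            // Redis offset i lives in byte i / 8, counted from the most significant bit.
            bytes[i / 8] |= (byte) (0x80 >>> (i % 8));
        }
        return bytes;
    }

    /** Inverse of toRedisBytes: rebuilds a BitSet from bytes in Redis bit order. */
    public static BitSet fromRedisBytes(byte[] bytes) {
        BitSet bits = new BitSet(bytes.length * 8);
        for (int i = 0; i < bytes.length * 8; i++) {
            if ((bytes[i / 8] & (0x80 >>> (i % 8))) != 0) {
                bits.set(i);
            }
        }
        return bits;
    }

    /** Local stand-in for GETBIT: offset 0 is the most significant bit of byte 0. */
    public static boolean redisGetBit(byte[] value, int offset) {
        int index = offset / 8;
        return index < value.length && (value[index] & (0x80 >>> (offset % 8))) != 0;
    }

    public static void main(String[] args) {
        BitSet local = new BitSet();
        local.set(0);
        local.set(9);

        byte[] naive = local.toByteArray();   // Java order: bit 0 is the LSB of byte 0
        byte[] fixed = toRedisBytes(local);   // Redis order: bit 0 is the MSB of byte 0

        System.out.println("naive, offset 0: " + redisGetBit(naive, 0)); // false
        System.out.println("fixed, offset 0: " + redisGetBit(fixed, 0)); // true
        System.out.println("fixed, offset 9: " + redisGetBit(fixed, 9)); // true
        System.out.println("round trip ok:   " + fromRedisBytes(fixed).equals(local)); // true
    }
}
```

The naive array is what ends up in Redis if the output of BitSet.toByteArray() is stored unchanged, which would explain why every contains() call on the Redis-backed filter fails: the bits are all present, but each one sits at the mirrored position within its byte.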

DivineTraube commented 10 years ago

There is now an overwriteBitSet method on the RedisBitSet, so it is possible to call redisFilter.getRedisBitSet().overwriteBitSet(memoryFilter.getBitSet()) to overwrite all bits in a single call. The issue with reversed bytes is also solved.

Best, Felix