FastFilter / fastfilter_java

Fast Approximate Membership Filters (Java)
Apache License 2.0
238 stars 27 forks source link

Proposal to provide a simple API #12

Open richardstartin opened 4 years ago

richardstartin commented 4 years ago

While the xor filters are what is new and exciting about the contents of this repository, the other filters are good and there are tradeoffs which make them useable. In order to make an accessibly library out of this work, I think it needs a simple API with feature discoverability.

There are a few considerations:

I propose new interfaces:

interface Filter {
    boolean mayContain(long key);
    // serialisation can be specified at this level
}

interface MutableFilter extends Filter {
    void add(long key);
}

interface RemovableFilter extend MutableFilter {
    void remove(long key);
}

I propose a simple builder style API which exposes these concerns and provides sensible defaults. The type of filter would be first choice the user needs to make (bloom/xor/cuckoo/xorplus). Depending on this choice, the user is forced to accept building mutable/immutable/removable filters: you can't try to build an xor filter and end up with a remove or an add method. The user can then provide a number of bits per key, falling back to sensible values when not supplied, and according to a basic fallback policy whenever the precise choice isn't possible. The user can provide a seed iterator, which falls back to something similar to calls to Hash.randomSeed(), and this would replace the calls to new Random().nextLong() everywhere. The complicated taxonomy of bloom filter variants can take parameters in a similar manner.

This would look something like


var xorBuilder = Filter.xor()
       .withBitsPerKey(9) // promotes to 16
       .withSeedProvider(new SplittableRandom(0)::nextLong);
       ...
// get a succinct counting blocked bloom filter
var bloomBuilder = Filter.bloom()
.withBitsPerKey(11)// just pass this on to the bloom filter implementation
.blocked()
.counting();

For building, the user gets a choice between passing arrays and an LongIterator:


public interface LongIterator {
    long next();
    boolean hasNext();
}

interface Buildable<T extends Filter> {
    public T build(long[] keys);
    public T build(LongIterator keyIterator);
}
richardstartin commented 4 years ago

To make the proposal more concrete, there is a diff at #13.

thomasmueller commented 4 years ago

Yes I agree. I don't have time currently to give a detailed answer however... One thing to consider is try to separate the interfaces from the implementation, for OSGi / backward compatibility. I'm not saying we need to support OSGi currently... But having to import implementation classes in interface definitions sounds wrong. So the builders should be separated from the interfaces I think.