Closed breezykermo closed 3 weeks ago
After reading and discussing the paper, we came up with an idea for an extension of ACORN that we think is well-scoped as a system that we could build over the course of the rest of the semester.
ACORN is an adaptation of HNSW that allows predicate-agnostic filtered searches with better latency than pre-filtering (brute-force over all records that match a predicate) and post-filtering (performing regular HNSW search and then filtering by predicate) for low-selectivity queries in particular. Its core mechanism is additional neighbor expansion during search, which retrieves a greater number of candidates that match a predicate with low-selectivity, a mechanism that approximates the latency of the 'oracle partition index'. This oracle index is the HNSW graph that could have been built by taking the subgraph of records that match a particular predicate.
OAK is a system that opportunistically builds such oracle indexes to speed up queries with common predicates. We define an index whose construction is opportunistic by way of the following qualities:
When a query is received, OAK considers whether any opportunistic indexes exist that could accelerate it. By default, and when no such opportunistic index exists, it uses the 'seed' index, an ACORN-style HNSW index.
Here's the implementation strategy for OAK I would suggest.
OpportunismStrategy
. Given that there are many different ways one could imagine triggering an index construction, and parameterizing when it makes sense to do so, I think we want this abstracted so that we can try different strategies.OpportunismStrategy
. It may make sense to reopen the discussion in #17 here, as we now want loads that simulate real-world in the sense that there are periods with less queries, such that there is spare CPU to build OIs. Alternatively, we could begin with the distributed opportunism we discussed: where OIs are only ever built on nodes not in the query path, and OAK makes use of additional hardware up until some threshold set by a user.This idea also raises a reasonable critique of ACORN and other ANNS evaluations. They don't take into account queries sent over time, which is closer to a real-world load.
MVP:
OpportunismStrategy
In an attempt to narrow the scope of our semester's project, we are considering whether it might be interesting to consider the feasibility of distributing an index modified for filtered ANNS, such as ACORN.