Problem
Unless I'm missing something, I believe that adding support for limiting and ordering is an important feature. Consider the model:
Currently, if I run a nearest_neighbors search on the doc, it returns all documents per the default ordering in Rails. With a small number of records, searching and sorting the results is not a problem, but on larger datasets it becomes a real performance issue.
Solution
I think adding limit, order, and threshold options would address this problem.
# Order results by specified columns
doc.nearest_neighbors(:embedding, distance: "inner_product", order: { neighbor_distance: :desc })
# Only return records whose distance score is above or below X (gte, gt, lte, lt)
doc.nearest_neighbors(:embedding, distance: "inner_product", threshold: { gte: 0.9 })
# Limit the number of records returned from the neighbor search
doc.nearest_neighbors(:embedding, distance: "inner_product", limit: 5)
While all of these operations can be performed in memory on any returned result, it would be much better to have them happen at the DB level.