ankane / neighbor

Nearest neighbor search for Rails
MIT License

Limiting and ordering results #11

Closed: sebscholl closed this issue 1 year ago

sebscholl commented 1 year ago

Problem

Unless I'm missing something, I believe that adding support for limiting and ordering is an important feature. Consider the model:

doc = Document.new(text: "Some text", embedding: [])

Currently, if I run a nearest_neighbors search on the doc, it returns all documents in Rails' default ordering.

puts doc.nearest_neighbors(:embedding, distance: "inner_product").map(&:neighbor_distance)
=> [
  0.7474747,
  0.4638648,
  0.8382633,
  0.9837744,
  0.9237373,
  0.8366281
]

With a small number of records, searching and sorting the results is not a problem, but on larger datasets it becomes a real performance issue.

Solution

I think adding limit, order, and threshold options would address this problem.

# Order results by specified columns
doc.nearest_neighbors(:embedding, distance: "inner_product", order: { neighbor_distance: :desc })

# Only return records with distance score > or < X (gte, gt, lte, lt)
doc.nearest_neighbors(:embedding, distance: "inner_product", threshold: { gte: 0.9 })

# Limit the number of records returned from the neighbor search
doc.nearest_neighbors(:embedding, distance: "inner_product", limit: 5)

While all these operations can obviously be performed with any returned result in memory, it would be way better to have them happen at the DB level.
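
For comparison, here is a minimal in-memory sketch of what the three proposed options would do. It is plain Ruby, not the gem's API: `Result`, `refine`, and `COMPARATORS` are hypothetical names made up for illustration, the distances are copied from the output above, and `order` is simplified to a bare `:asc`/`:desc` symbol rather than the hash form proposed above.

```ruby
# Hypothetical stand-in for a record returned by nearest_neighbors;
# only the neighbor_distance attribute comes from the issue.
Result = Struct.new(:id, :neighbor_distance)

# Distances copied from the example output above.
DISTANCES = [0.7474747, 0.4638648, 0.8382633, 0.9837744, 0.9237373, 0.8366281].freeze
RESULTS = DISTANCES.each_with_index.map { |d, i| Result.new(i + 1, d) }

# Maps the proposed threshold keys to Ruby comparison operators.
COMPARATORS = { gte: :>=, gt: :>, lte: :<=, lt: :< }.freeze

# Hypothetical helper: applies threshold, then order, then limit, all in memory.
def refine(results, order: nil, threshold: {}, limit: nil)
  out = results.select do |r|
    threshold.all? do |op, value|
      r.neighbor_distance.public_send(COMPARATORS.fetch(op), value)
    end
  end

  case order
  when :asc  then out = out.sort_by(&:neighbor_distance)
  when :desc then out = out.sort_by { |r| -r.neighbor_distance }
  end

  limit ? out.first(limit) : out
end

top = refine(RESULTS, order: :desc, threshold: { gte: 0.9 }, limit: 5)
puts top.map(&:neighbor_distance).inspect
# prints [0.9837744, 0.9237373]
```

Every record has to be loaded and compared in Ruby before anything is discarded, which is the cost the request above is trying to avoid by pushing the same three steps into the query.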

sebscholl commented 1 year ago

Pull request here: https://github.com/ankane/neighbor/pull/12/commits

ankane commented 1 year ago

See https://github.com/ankane/neighbor/pull/12#issuecomment-1732656462