ankane / neighbor

Nearest neighbor search for Rails
MIT License

Is there a good way to support multiple embeddings tied to a particular record? #15

Closed weilandia closed 11 months ago

weilandia commented 11 months ago

Let's say you have a collection of Documents and you store embeddings for these documents, but due to token constraints, you have to store multiple embeddings tied to each document.

Do you have a recommended way for supporting this with Neighbor?

Current ideas:

ankane commented 11 months ago

Hi @weilandia, either approach should work.

For 1, nearest_neighbors can be called on a relation and returns a relation, so you can add where(embeddable_type: ...).

Here's a full example for 2:

require "bundler/inline"

gemfile do
  source "https://rubygems.org"

  gem "activerecord", require: "active_record"
  gem "neighbor"
  gem "pg"
end

ActiveRecord::Base.establish_connection adapter: "postgresql", database: "neighbor_example"

ActiveRecord::Schema.define do
  enable_extension "vector"

  create_table :documents, force: :cascade do |t|
    t.text :name
  end

  create_table :document_chunks, force: :cascade do |t|
    t.belongs_to :document
    t.column :embedding, :vector, limit: 3
  end
end

class Document < ActiveRecord::Base
  has_many :document_chunks
end

class DocumentChunk < ActiveRecord::Base
  belongs_to :document

  has_neighbors :embedding
end

# Placeholder: returns a random 3-dimensional vector
# (swap in a call to a real embedding model)
def generate_embedding
  3.times.map { rand }
end

d1 = Document.create!(name: "Doc 1")
3.times { d1.document_chunks.create!(embedding: generate_embedding) }

d2 = Document.create!(name: "Doc 2")
3.times { d2.document_chunks.create!(embedding: generate_embedding) }

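# Fetch the 5 nearest chunks by cosine distance, then keep the closest chunk per document
# (this can yield fewer than 5 documents when several of those chunks share a document)
chunks = DocumentChunk.includes(:document)
  .nearest_neighbors(:embedding, generate_embedding, distance: "cosine")
  .first(5)
  .uniq(&:document_id)
pp chunks.map(&:document).map(&:name)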
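
One thing to watch in the last step: first(5) runs before uniq, so if the nearest chunks all belong to one document you get fewer than five documents back. Below is a minimal plain-Ruby sketch of the difference, with no database involved. RankedChunk and top_documents are hypothetical names made up for illustration (they are not part of Neighbor), and the distances are invented sample values.

```ruby
# Hypothetical stand-in for DocumentChunk rows returned by nearest_neighbors,
# assumed to be already sorted by ascending distance.
RankedChunk = Struct.new(:document_id, :neighbor_distance)

# Returns up to `limit` chunks, at most one per document.
# Array#uniq keeps the first occurrence, which is the closest chunk
# because the input is sorted by distance.
def top_documents(ranked_chunks, limit:)
  ranked_chunks
    .uniq(&:document_id)
    .first(limit)
end

# Invented sample data: three chunks from document 1 rank first.
ranked = [
  RankedChunk.new(1, 0.10),
  RankedChunk.new(1, 0.12),
  RankedChunk.new(1, 0.15),
  RankedChunk.new(2, 0.20),
  RankedChunk.new(3, 0.25),
  RankedChunk.new(2, 0.30)
]

# Truncating first can leave fewer documents than requested.
truncate_first = ranked.first(3).uniq(&:document_id)

# Deduplicating first fills the limit when enough documents exist.
dedupe_first = top_documents(ranked, limit: 3)

pp truncate_first.map(&:document_id) # [1]
pp dedupe_first.map(&:document_id)   # [1, 2, 3]
```

With a real table you would probably not load every chunk. One option, under the same assumptions, is to fetch more chunks than you need (say, a few times the number of documents you want) and then deduplicate in Ruby as above.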

weilandia commented 11 months ago

Thanks @ankane!