Closed weilandia closed 11 months ago
Hi @weilandia, either approach should work.
For 1, nearest_neighbors can be called on a relation and returns a relation, so you can add where(embeddable_type: ...).
Here's a full example for 2:
require "bundler/inline"

gemfile do
  source "https://rubygems.org"

  gem "activerecord", require: "active_record"
  gem "neighbor"
  gem "pg"
end

# Expects a PostgreSQL database named neighbor_example with pgvector available
ActiveRecord::Base.establish_connection adapter: "postgresql", database: "neighbor_example"

ActiveRecord::Schema.define do
  enable_extension "vector"

  create_table :documents, force: :cascade do |t|
    t.text :name
  end

  create_table :document_chunks, force: :cascade do |t|
    t.belongs_to :document
    t.column :embedding, :vector, limit: 3
  end
end

class Document < ActiveRecord::Base
  has_many :document_chunks
end

class DocumentChunk < ActiveRecord::Base
  belongs_to :document
  has_neighbors :embedding
end

# Random 3-dimensional vector standing in for a real embedding
def generate_embedding
  3.times.map { rand }
end

d1 = Document.create!(name: "Doc 1")
3.times { d1.document_chunks.create!(embedding: generate_embedding) }

d2 = Document.create!(name: "Doc 2")
3.times { d2.document_chunks.create!(embedding: generate_embedding) }

# Fetch the 5 nearest chunks, then keep only the closest chunk per document
chunks = DocumentChunk.includes(:document)
  .nearest_neighbors(:embedding, generate_embedding, distance: "cosine")
  .first(5)
  .uniq(&:document_id)

pp chunks.map(&:document).map(&:name)
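One thing to watch with first(5).uniq(&:document_id): when several of the five nearest chunks belong to the same document, you get back fewer than five documents. A minimal sketch of one way around that, assuming you fetch more chunks than you need and deduplicate before applying the limit. Chunk is a hypothetical stand-in for DocumentChunk rows (carrying the neighbor_distance value that Neighbor attaches to results), and top_documents is a hypothetical helper, not part of Neighbor.

```ruby
# Hypothetical stand-in for DocumentChunk rows returned by nearest_neighbors.
Chunk = Struct.new(:document_id, :neighbor_distance)

# Hypothetical helper: keep the closest chunk per document,
# then take the `limit` closest documents.
def top_documents(chunks, limit:)
  chunks
    .sort_by(&:neighbor_distance) # nearest first
    .uniq(&:document_id)          # uniq keeps the first (closest) chunk per document
    .first(limit)
end

# Six over-fetched chunks across three documents, nearest first.
chunks = [
  Chunk.new(1, 0.10),
  Chunk.new(1, 0.12),
  Chunk.new(2, 0.20),
  Chunk.new(1, 0.25),
  Chunk.new(3, 0.30),
  Chunk.new(2, 0.35)
]

# Limiting first and deduplicating second loses a document.
naive = chunks.first(3).uniq(&:document_id).map(&:document_id)

# Deduplicating first and limiting second keeps all three.
deduped = top_documents(chunks, limit: 3).map(&:document_id)

p naive
p deduped
```

With ActiveRecord, the input would be the result of a larger first(n) on the nearest_neighbors relation. How much to over-fetch depends on how many chunks each document typically has.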
Thanks @ankane!
Let's say you have a collection of Documents and you store embeddings for these documents, but due to token constraints, you have to store multiple embeddings tied to each document. Do you have a recommended way for supporting this with Neighbor?
Current ideas:
1. An Embeddings model that has an embeddable. Not sure Neighbor would support this out of the box because it would have to support filtering on embeddable_type.
2. A DocumentEmbeddings model, and run nearest neighbor searches on that with some deduplication logic.