ankane / neighbor

Nearest neighbor search for Rails
MIT License
589 stars 14 forks

TypeError: can't quote Array (occasionally) #23

Closed gjtorikian closed 3 months ago

gjtorikian commented 3 months ago

I'm not quite sure what's causing it, but sometimes, neighbor throws the following error:

[123, 132] in ~/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/neighbor-0.4.0/lib/neighbor/model.rb
   123|           column_attribute = klass.type_for_attribute(attribute_name)
   124|           vector = column_attribute.cast(vector)
   125|           Neighbor::Utils.validate(vector, dimensions: dimensions, column_info: column_info)
   126|           vector = Neighbor::Utils.normalize(vector, column_info: column_info) if normalize
   127| 
=> 128|           query = connection.quote(column_attribute.serialize(vector))
   129|           order = "#{quoted_attribute} #{operator} #{query}"
   130|           if operator == "#"
   131|             order = "bit_count(#{order})"
   132|           end
=>#0    block {|attribute_name=:embedding_1536, vector=[0.011979911, -0.03421443, 8.979617e-05,..., options={:dimensions=>nil, :normalize=>nil}|} in has_neighbors (2 levels) at ~/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/neighbor-0.4.0/lib/neighbor/model.rb:128
  #1    [C] BasicObject#instance_exec at ~/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/activerecord-7.2.0/lib/active_record/relation.rb:548
  # and 46 frames (use `bt' command for all frames)
(ruby@bin/rails#63657) connection.quote(column_attribute.serialize(vector))
eval error: can't quote Array
  /Users/gjtorikian/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/activerecord-7.2.0/lib/active_record/connection_adapters/abstract/quoting.rb:87:in `quote'
  /Users/gjtorikian/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/activerecord-7.2.0/lib/active_record/connection_adapters/postgresql/quoting.rb:122:in `quote'
  (rdbg)//Users/gjtorikian/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/neighbor-0.4.0/lib/neighbor/model.rb:1:in `block (2 levels) in has_neighbors'

This happens even though none of the arguments passed to the nearest_neighbors scope change between calls.

gjtorikian commented 3 months ago

Ah. Sometimes column_attribute is

#<Neighbor::Type::Vector:0x000000014de95d38 @precision=nil, @scale=nil, @limit=1536>

but other times, it's:

#<ActiveModel::Type::Value:0x000000014f97d588 @limit=nil, @precision=nil, @scale=nil>

gjtorikian commented 3 months ago

Manually setting the attribute type works around it: attribute :embedding_1536, Neighbor::Type::Vector.new.
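
The two types shown above explain the error. A generic value type passes the Ruby array through unchanged when serializing, and the connection's quoting step has no rule for arrays, whereas a vector type first turns the array into a string. The sketch below mimics that difference in plain Ruby. GenericValueType, VectorType, and the quote method are simplified, hypothetical stand-ins written for illustration; they are not the actual Rails or neighbor implementations, and the bracketed string is assumed to match the text form pgvector accepts.

```ruby
# Simplified stand-ins for illustration only (not the real Rails or neighbor classes).

# Mimics a generic value type: serialize returns the value unchanged.
class GenericValueType
  def serialize(value)
    value
  end
end

# Mimics a vector type: serialize builds a bracketed string such as "[1.0,2.0,3.0]".
class VectorType
  def serialize(value)
    "[#{value.map(&:to_f).join(",")}]"
  end
end

# Mimics a SQL quoting routine that only knows how to quote scalars.
def quote(value)
  case value
  when String  then "'#{value.gsub("'", "''")}'"
  when Numeric then value.to_s
  else raise TypeError, "can't quote #{value.class.name}"
  end
end

vector = [0.9, 1.3, 1.1]

# With the vector type, the array becomes a string and quotes cleanly.
puts quote(VectorType.new.serialize(vector))
# prints '[0.9,1.3,1.1]'

# With the generic type, the array reaches quote as-is and is rejected.
begin
  quote(GenericValueType.new.serialize(vector))
rescue TypeError => e
  puts e.message
  # prints can't quote Array
end
```

This is why the failure is intermittent rather than tied to the query arguments: the same call succeeds or fails depending only on which type object the model resolves for the column at that moment.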

ankane commented 3 months ago

Hi @gjtorikian, I'm not sure how to reproduce the issue (one guess is that another gem could be interfering with it). If you can create a minimal reproduction script, I'm happy to look into it further.

require "bundler/inline"

gemfile do
  source "https://rubygems.org"

  gem "activerecord", require: "active_record"
  gem "neighbor", github: "ankane/neighbor"
  gem "pg"
end

ActiveRecord::Base.establish_connection adapter: "postgresql", database: "neighbor_repro"
ActiveRecord::Base.logger = ActiveSupport::Logger.new(STDOUT)

ActiveRecord::Schema.define do
  enable_extension "vector"

  create_table :items, force: :cascade do |t|
    t.column :embedding, :vector, limit: 3
  end
end

class Item < ActiveRecord::Base
  has_neighbors :embedding
end

Item.create!(embedding: [1,2,3])
p Item.nearest_neighbors(:embedding, [0.9, 1.3, 1.1], distance: "euclidean").first(5)

jkostolansky commented 3 months ago

Possibly a similar issue here. After upgrading to Rails 7.2, the tests work on my Mac but fail in GitHub Actions with:

TypeError: can't cast Array

Setting the type manually fixes it:

attribute :embedding, Neighbor::Type::Vector.new # <-- Fix
has_neighbors :embedding, dimensions: 3072

ankane commented 3 months ago

This seems to be specific to Rails 7.2, possibly related to https://github.com/rails/rails/issues/52607. Does changing config.eager_load or enabling/disabling parallel tests fix it?

jkostolansky commented 3 months ago

Disabling parallelization in tests also seems to work.
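
For anyone wanting to run the same check, parallelization is normally configured in the test helper. A minimal sketch, assuming a standard Rails app whose test/test_helper.rb already loads rails/test_help (the file location is an assumption about the app, not something shown in this thread):

```ruby
# test/test_helper.rb (assumed location in a standard Rails app)
class ActiveSupport::TestCase
  # A worker count of 1 keeps the suite in a single process,
  # so no parallel workers are started.
  parallelize(workers: 1)
end
```

Removing the parallelize call altogether should have the same effect. Note that a PARALLEL_WORKERS environment variable, if set, overrides the workers argument.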

ankane commented 3 months ago

I was able to reproduce it with parallel tests and config.eager_load = true, but I'm still trying to figure out the cause.

A temporary fix is to call reset_column_information in parallelize_setup.

class ActiveSupport::TestCase
  parallelize_setup do |worker|
    Item.reset_column_information
  end
end

ankane commented 3 months ago

This is fixed by https://github.com/rails/rails/pull/52703.