brianhempel / active_record_union

UNIONs in ActiveRecord! Adds proper union and union_all methods to ActiveRecord::Relation.
Other
425 stars 43 forks source link

Lack of FROM subquery name can break aggregate functions #35

Open ezekg opened 1 week ago

ezekg commented 1 week ago

Not sure if this is a bug in Rails or this gem, but since a subquery name is not provided to from here (even though it's technically available on the Arel::Nodes::TableAlias instance), Active Record can't determine its table name. This results in AR using an unqualified column name, which can cause ambiguous column errors for aggregate functions like maximum(:updated_at) if there's a join after the union, resulting in an ActiveRecord::StatementInvalid error getting raised.

Ideally, AR should be a bit smarter here, since right now it tries to cast the Arel::Nodes::TableAlias instance to a string. I'd assume they'd argue Arel is a private API and that this isn't a bug, so I opened the issue here first.

Here's a quick script reproducing the ActiveRecord::StatementInvalid error:

# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  git_source(:github) { |repo| "https://github.com/#{repo}.git" }

  gem "sqlite3"

  gem "rails"
  gem "active_record_union"
end

require "rails"
require "active_record"
require "active_record_union"

ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")
ActiveRecord::Base.logger = Logger.new(STDOUT)

ActiveRecord::Schema.define do
  create_table :accounts, force: true do |t|
    t.timestamps
  end

  create_table :users, force: true do |t|
    t.references :account
    t.string :role
    t.timestamps
  end
end

class Account < ActiveRecord::Base; end
class User < ActiveRecord::Base
  belongs_to :account

  scope :foos, -> { where(role: 'foo') }
  scope :bars, -> { where(role: 'bar') }
end

require "minitest/autorun"
require "rack/test"

class AssociationReferencesTest < Minitest::Test
  def test_union_aggregate_without_join
    account = Account.create!
    user_1  = User.create!(account:, role: 'foo')
    user_2  = User.create!(account:, role: 'bar')
    user_3  = User.create!(account:, role: 'bar')

    union = User.foos.union(
      User.bars,
    )

    assert_not_raised ActiveRecord::StatementInvalid do
      assert_equal union.maximum(:updated_at), user_3.updated_at
    end
  end

  def test_union_aggregate_with_join
    account = Account.create!
    user_1  = User.create!(account:, role: 'foo')
    user_2  = User.create!(account:, role: 'bar')
    user_3  = User.create!(account:, role: 'bar')

    union = User.foos.union(
      User.bars,
    )

    assert_not_raised ActiveRecord::StatementInvalid do
      assert_equal union.joins(:account).maximum(:updated_at), user_3.updated_at
    end
  end

  private

  def assert_not_raised(exception_class, failure_message = nil)
    yield
  rescue => e
    if e.is_a?(exception_class)
      flunk(failure_message || "An exception was not expected but was raised: #{e}")
    else
      raise e
    end
  end
end

Unfortunately, in addition to one-off aggregations, this also breaks stale? and fresh_when caching/etag controller methods, since they use maximum(:updated_at) to calculate the cache's last modified timestamp.

The only workaround I could come up with is explicitly qualifying the column name:

union.maximum(:"#{union.table_name}.updated_at")

Do you think this is worth fixing?

ezekg commented 1 day ago

To make matters more complex, you cannot quality the column if the relation is already loaded:

users = User.foos.union(User.bars)
users.load

# raises ActiveRecord::UnmodifiableRelation:
users.maximum('users.updated_at')

# raises ActiveRecord::StatementInvalid:
users.maximum(:updated_at)

# workaround:
users.collect(&:updated_at).max

I'll open up a PR soon.