Casecommons / pg_search

pg_search builds ActiveRecord named scopes that take advantage of PostgreSQL’s full text search
http://www.casebook.net
MIT License
1.3k stars 369 forks source link

Allow for further filtering of subquery #511

Open m5rk opened 1 year ago

m5rk commented 1 year ago

Source:

https://github.com/dwillett/pg_search/commit/c6418b5d4c6a6566aff2c854ad39b11d794d29a6

From: https://github.com/Casecommons/pg_search/issues/292

The problem seems to be that the notion of additional_attributes is not really elaborated enough. I suspect in most cases, it's to facilitate segregating tenants, like here:

https://www.donnfelker.com/multitenant-pgsearch-how-to/

But if they wonder why they have terrible performance and look at the explain plan, they'll see why: The subquery ignores the opportunity to restrict the search to the tenant.

Zinggi commented 1 year ago

Wow, using you're fork + using the same filtering condition that is used outside the inner query, I get a speedup of ca. 10x for a troublesome search!

Amazing :sparkles:

nertzy commented 1 year ago

This is interesting! We need to get some spec coverage on this to make sure it works as intended. I'm curious to dig in more.

m5rk commented 1 year ago

This is interesting! We need to get some spec coverage on this to make sure it works as intended. I'm curious to dig in more.

I'm happy to dig in as well. The only thing I can think of is something along these lines:

context 'when a block is given' do
  it 'has access to the subquery' do
    # Examine the content of the explain plan or .to_sql
  end
end

Update: @nertzy See my latest commit, which adds coverage that illustrates how the block is used.

m5rk commented 1 year ago

Assuming the code and specs are good, it seems like we may also want to update the docs to mention this capability. But I didn't want to spend time doing that until I get some confirmation that this approach (all credit to @dwillett) is approved by the maintainers.

LuukvH commented 1 year ago

@m5rk, do you have any updates on this? We also use the pg_search gem in a multi-tenant environment. However, with the growing number of records across multiple tenants, we are experiencing performance issues due to the inability to scope the subquery.

m5rk commented 1 year ago

@m5rk, do you have any updates on this? We also use the pg_search gem in a multi-tenant environment. However, with the growing number of records across multiple tenants, we are experiencing performance issues due to the inability to scope the subquery.

No update on my end, sorry. I'm just waiting for the maintainers to give feedback or guidance on what's missing or to merge and then publish a new gem version. @nertzy, thoughts?

LuukvH commented 1 year ago

@nertzy Another option could be to remove the unscoped on the subquery which was added in version 1.1 it seems to remove unnecessary joins. Which in my opinion is a concern to be handled by the developer and not this gem. This would also allow this gem to work with tenant gems that uses default scopes out of the box.

Zinggi commented 11 months ago

Sorry for another "bumb" here, but is there something that's missing here for this to be merged? Maybe I could help out somehow?

The PR looks pretty good to me, and we've been using this branch in production since about half a year without issue. Without the ability to filter on the sub query, we couldn't use this gem at all for performance reasons...

m5rk commented 11 months ago

@nertzy Any thoughts? I added coverage as you requested. As I noted earlier, maybe the only remaining item is to update the documentation?

jsuchal commented 8 months ago

If anyone wants a drop-in fix. This worked for us.

Drop this monkey patch somewhere (e.g. initializers in rails)

It basically works by scoping the inner query by the scope that goes in before you call pg_search_all.

# frozen_string_literal: true

# Monkey patch that adds scope pushing down conditions to subselect
module PgSearch
  class ScopeOptions
    def apply(scope)
      scope = include_table_aliasing_for_rank(scope)
      rank_table_alias = scope.pg_search_rank_table_alias(include_counter: true)

      scope
        .joins(rank_join(rank_table_alias, scope))
        .order(Arel.sql("#{rank_table_alias}.rank DESC, #{order_within_rank}"))
        .extend(WithPgSearchRank)
        .extend(WithPgSearchHighlight[feature_for(:tsearch)])
    end

    def rank_join(rank_table_alias, scope)
      "INNER JOIN (#{subquery(scope).to_sql}) AS #{rank_table_alias} ON #{primary_key} = #{rank_table_alias}.pg_search_id"
    end

    def subquery(scope)
      scope
        .select("#{primary_key} AS pg_search_id")
        .select("#{rank} AS rank")
        .joins(subquery_join)
        .where(conditions)
        .limit(nil)
        .offset(nil)
    end
  end
end
m5rk commented 5 months ago

@nertzy Anything else I can do to help get this reviewed and merged?

Joe-Degler commented 3 months ago

First up - thanks for this PR! I was able to slightly modify it and implement a monkey patch into our project.

I'm questioning the usefulness of having the block on the invocation, as opposed to the definition.

In our multi-tenancy system, the active tenant is a globally accessible variable. So what we want is actually the ability to pass a block to the initial definition, not to the method that ends up calling filter_bar. That's actually how I expected this PR to work until I took a more detailed look. Having to pass it on every filter_bar call (unless you specifically wrap that method) seems like an easy way to forget it.

In this case, this is how I imagined it'd look:

  pg_search_scope :filter_bar,
                  against: :bar
                  } do |foos|
    foos.partitioned
  end

Maybe there's value in being able to pass a block in either the definition, as well as the actual caller of the filter_bar method?

m5rk commented 3 weeks ago

@nertzy or other maintainers, can you please clarify what else is needed in order to merge this?