Casecommons / pg_search

pg_search builds ActiveRecord named scopes that take advantage of PostgreSQL’s full text search
http://www.casebook.net
MIT License
1.3k stars 369 forks source link

Possibility of using different multisearchable fields for different tenants? #493

Open pauljacksonrodgers opened 1 year ago

pauljacksonrodgers commented 1 year ago

I have an application that supports an admin portal and a customer portal. We already use multisearch for the admin portal's search function, and would now like to expose a similar search to customers.

For security reasons, I want the multisearchable fields to be different. For instance, admins should be able to search by phone number, but customers should not.

I thought the easiest way of accomplishing this would be to simply have two different content and tsv_content fields, like content_admin and content_customer, and then specify which one to search. Is this something that pg_search supports? Or is there a better way of doing what I'm trying to do?

copiousfreetime commented 1 year ago

I have this exact problem at the moment too.

After a dive on the source code, it looks like the following things would need to change in able to support it:

  1. The ability to add in additional pg_search_scope items to the PgSearch::Document - this is so you can set against: [:content, :content_admin] for the new pg_search_scope
  2. Change PgSearch.multisearch_options to allow configuring additional things for the additional pg_search_scope - for instance adding in ts_vector columns and things
  3. Nice to have - The ability to rename the default Multisearch :content column in the generated migration, and in pg_search_document_attrs so we wouldn't be required to have a column namedcontent
  4. The ability to have some method similar to the searchable_text approach, but for the internal / private fields

This is my current setup that is working.

I'm not sure if this would be there recommended way of doing it, and would definitely appreciate feedback from @nertzy or anyone on this approach.

lib/extensions/pg_search/document.rb

This adds an additional internal_search search scope to the existing PgSearch::Document. pg_search is already setup to handle joining multiple ts_vector colums together so just the additional scope and column are needed, and the additional column only needs the private data, no need to index the public data a 2nd time.


require 'pg_search'
PgSearch::Document.pg_search_scope(:internal_search, lambda { |*args|
  {
    query: args.first,
    against: [:content, :internal_content],
    using: {
      tsearch: {
        prefix: true,
        negation: true,
        tsvector_column: ["content_tsvector", "internal_content_tsvector"]
      }
    }
  }
})

Migration to alter the existing pg_search_documents table

class AlterPgSearchDocumentsForPrivateContent < ActiveRecord::Migration[6.1]

  def change
    # Add the new columns
    add_column(:pg_search_documents, :internal_content, :text)
    add_column(:pg_search_documents, :internal_content_tsvector, :tsvector)

    # add the new index for the private index
    add_index :pg_search_documents, :internal_content_tsvector, using: 'gin'

    trigger_name = "pg_search_documents_tsearch_tr"

    reversible do |dir|
      dir.up do
        create_trigger(trigger_name, compatibility: 1)
          .on(:pg_search_documents)
          .before(:insert, :update)
          .for_each(:row) do
            <<~PLSQL
            new.content_tsvector :=  to_tsvector('simple', coalesce(new.content::text, ''));
            new.internal_content_tsvector :=  to_tsvector('simple', coalesce(new.internal_content::text, ''));
            PLSQL
          end

        execute <<~SQL
          CREATE INDEX pg_search_documents_on_internal_content ON pg_search_documents USING gin(coalesce(internal_content, ''::text) gin_trgm_ops);
          CREATE INDEX pg_search_documents_on_all_content ON pg_search_documents USING gin(coalesce(content || internal_content, ''::text) gin_trgm_ops);
          CREATE INDEX pg_search_documents_on_all_tsvectors ON pg_search_documents USING gin((content_tsvector || internal_content_tsvector));
        SQL
      end

      dir.down do
        create_trigger(trigger_name, compatibility: 1)
          .on(:pg_search_documents)
          .before(:insert, :update)
          .for_each(:row) do
            "new.content_tsvector :=  to_tsvector('simple', coalesce(new.content::text, ''))"
          end

        execute <<~SQL
          DROP INDEX pg_search_documents_on_internal_content;
          DROP INDEX pg_search_documents_on_all_content;
          DROP INDEX pg_search_documents_on_all_tsvectors;
        SQL
      end
    end
  end
end

that manual create index for (content_tsvector || internal_content_tsvector) is what gets postgresql to use the vector index on the merged columns data, if it wasn't there the internal search would be doing table scans. This might be a useful not to add to the tsvector index documentation if folsk are using multiple columns.

searchable concern

I use a model concern of Searchable for models to say what attributes / methods of theirs return data that is indexable.

A usage example

class ObservationReport < ApplicationRecord

#...

  include Searchable
  self.searchable_attributes =  [
      :route,
      :area,
      :description,
   ]

  self.internal_searchable_attributes = [
     :comments_admin
     :firstname
     :lastname
  ]
end

And then can search doing:

# public search
PgSearch::Document.search("searching")
ObservationReport.search("searching") # this is a helper in the concern

# internal search
PgSearch::Document.internal_search("internal")
ObservationReport.internal_search("internal")