Casecommons / pg_search

pg_search builds ActiveRecord named scopes that take advantage of PostgreSQL’s full text search
http://www.casebook.net
MIT License

Rebuild is not resumable #510

Open m5rk opened 1 year ago

m5rk commented 1 year ago

We'd like to use this gem to backfill a relatively large set of documents. But because we use conditional and additional attributes, the rebuild loops over every record one at a time. If the corresponding Sidekiq job fails, the retry has to start all over from the beginning. Is that a known issue?

m5rk commented 1 year ago

I think this gem could provide more guidance about how to backfill / rebuild a large set of search documents.

We handled this by implementing rebuild_pg_search_documents, the class method pg_search looks for, in each model we search, along with the searchable and without_search_documents scopes it relies on:

  def self.rebuild_pg_search_documents
    searchable.without_search_documents.find_each(&:update_pg_search_document)
  end

Our rebuild Sidekiq job's perform method passes options that skip both the clean-up step and the transaction (clean_up: false, transactional: false). We expect the job to fail and need to retry. If every retry cleaned up first (essentially truncating the table), the job would never finish. We skip the transaction because it's not practical to wrap such a massive job in a single transaction. Besides, given that we expect the job to retry and resume where it left off, there's no practical value in a transaction. Personally, I think transactional: false should be the default.

  def perform(class_name)
    PgSearch::Multisearch.rebuild(
      class_name.classify.constantize, clean_up: false, transactional: false)
  end
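
To illustrate why this combination makes the job resumable, here is a minimal in-memory sketch in plain Ruby. It does not use pg_search or ActiveRecord: FakeIndex, Record, and the rebuild helper are hypothetical stand-ins for the search documents table, a model, and the rebuild step. It assumes writes made before a failure are kept (as with transactional: false) and that each run only touches records that have no document yet (as with our without_search_documents scope).

```ruby
# Hypothetical stand-in for the pg_search_documents table.
class FakeIndex
  attr_reader :documents

  def initialize
    @documents = {}
  end

  def upsert(id, content)
    @documents[id] = content
  end

  def include?(id)
    @documents.key?(id)
  end

  def clear
    @documents.clear
  end
end

# Hypothetical stand-in for a searchable model row.
Record = Struct.new(:id, :body)

# Indexes only records that have no document yet and returns how many it wrote.
# fail_after simulates the job crashing after that many writes.
def rebuild(records, index, clean_up:, fail_after: nil)
  index.clear if clean_up # clean-up discards progress made by earlier attempts
  written = 0
  records.reject { |record| index.include?(record.id) }.each do |record|
    raise "simulated job failure" if fail_after && written >= fail_after

    index.upsert(record.id, record.body)
    written += 1
  end
  written
end

records = (1..10).map { |i| Record.new(i, "doc #{i}") }
index = FakeIndex.new

begin
  # First attempt fails partway through; its writes are kept (no transaction).
  rebuild(records, index, clean_up: false, fail_after: 4)
rescue RuntimeError
  # A real job would be rescheduled by Sidekiq here.
end

# The retry skips clean-up, so it only processes the records still missing.
remaining = rebuild(records, index, clean_up: false)
```

In this sketch the retry writes only the documents the first attempt did not reach. Passing clean_up: true on the retry would clear the index first and redo all ten, which is the behavior we wanted to avoid.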