Casecommons / pg_search

pg_search builds ActiveRecord named scopes that take advantage of PostgreSQL’s full text search
http://www.casebook.net
MIT License

Some words in spanish dictionary don't even match themselves #463

Closed silva96 closed 3 years ago

silva96 commented 3 years ago

PostgreSQL version: 13 Ruby version: 2.7.2 gem version: 2.3.5

      context "with the spanish dictionary" do
        before do
          ModelWithPgSearch.pg_search_scope :search_content_with_spanish,
                                            against: :content,
                                            using: {
                                              tsearch: { dictionary: :spanish }
                                            }
        end

        it "returns rows that match the query when stemmed by the spanish dictionary" do
          included = [ModelWithPgSearch.create!(content: "saltar"),
                      ModelWithPgSearch.create!(content: "salté"),
                      ModelWithPgSearch.create!(content: "saltando")]

          results = ModelWithPgSearch.search_content_with_spanish("saltar")
          expect(results).to match_array(included)
        end

        it "returns rows that match the query when stemmed by the spanish dictionary" do
          included = [ModelWithPgSearch.create!(content: "pedir"),
                      ModelWithPgSearch.create!(content: "pedido")]

          results = ModelWithPgSearch.search_content_with_spanish("pedido")
          expect(results).to match_array(included)
        end

        it "returns rows that match the exact query spanish dictionary" do
          included = [ModelWithPgSearch.create!(content: "sentir"),
                      ModelWithPgSearch.create!(content: "sentido")]

          results = ModelWithPgSearch.search_content_with_spanish("sentido")
          expect(results).to match_array(included)
        end
      end

The last test fails, which is odd: the search is not merely failing to stem, it does not even match the exact query word ("sentido").

Comparing the ts_debug output for a word from a passing test and a word from the failing one:

SELECT * FROM ts_debug('spanish', 'pedidos');

[image: ts_debug output for 'pedidos']

SELECT * FROM ts_debug('spanish', 'sentidos');

[image: ts_debug output for 'sentidos']

silva96 commented 3 years ago

I added a Stack Overflow question, because this looks like a PostgreSQL bug:

https://stackoverflow.com/questions/66661739/postgresql-full-text-search-using-spanish-dictionary-to-tsquery-does-not-work-in

nertzy commented 3 years ago

Closing as there is nothing we can do on the pg_search gem side to address this issue.

Please look into the stop word issue mentioned in this Stack Overflow comment.

Good luck!
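As a rough illustration of why a stop word cannot match itself, here is a toy Ruby model in which stop words are dropped from both the document and the query before they are compared. It is only a sketch and not how PostgreSQL is implemented: `TOY_STOPWORDS`, `toy_lexemes` and `toy_match?` are made-up names, there is no stemming, and the stop list holds just the two Spanish words named later in this thread.

```ruby
# Toy model of stop-word filtering; NOT PostgreSQL's implementation.
# TOY_STOPWORDS holds only the two Spanish words named in this thread.
TOY_STOPWORDS = %w[sentido estado].freeze

# Lowercase, split on whitespace, and drop stop words (no stemming here).
def toy_lexemes(text)
  text.downcase.split.reject { |word| TOY_STOPWORDS.include?(word) }
end

# A query matches only if it keeps at least one lexeme and every kept
# lexeme appears in the document. A query made only of stop words keeps none.
def toy_match?(document, query)
  query_lexemes = toy_lexemes(query)
  return false if query_lexemes.empty?

  (query_lexemes - toy_lexemes(document)).empty?
end

toy_match?("sentido", "sentido") # => false, nothing survives filtering
toy_match?("pedido", "pedido")   # => true, "pedido" is not a stop word
```

Under this model the failing test behaves as reported: the row and the query are both reduced to nothing, so there is nothing left to compare.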

silva96 commented 3 years ago

Just to add my two cents, here's my workaround:

With the search scope configured as below, I select the dictionary dynamically through a service class (DictionarySelector):

pg_search_scope :full_text_search, lambda { |query, locale|
    {
      against: {
        cached_tag_list: 'A',
        title: 'B',
        plain_description: 'C'
      },
      using: {
        tsearch: { dictionary: DictionarySelector.call(locale, query) }
      },
      ignoring: :accents,
      query: query
    }
  }

Inside that class, I list each word I don't want to miss:

  EXCLUDED_STOPWORDS = {
    es: %w[sentido estado]
  }.freeze

Then I decide whether to use the language-specific dictionary or the simple one:

return SIMPLE_DICTIONARY if EXCLUDED_STOPWORDS[locale].to_a.any? { |word| query.include?(word) }
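Note that `query.include?(word)` is a substring test: it is case-sensitive and also fires when an excluded word is embedded in a longer one. One possible variant, offered only as a sketch, compares whole downcased words instead. `excluded_stopword?` is a hypothetical helper, not part of the original class, and with this approach inflected forms such as plurals would have to be listed explicitly.

```ruby
EXCLUDED_STOPWORDS = { es: %w[sentido estado] }.freeze

# Hypothetical helper: true when any whole word of the query is in the
# locale's exclusion list. Case-insensitive and safe for a nil query.
def excluded_stopword?(locale, query)
  words = query.to_s.downcase.split(/[^[:alpha:]]+/).reject(&:empty?)
  (words & EXCLUDED_STOPWORDS.fetch(locale, [])).any?
end

excluded_stopword?(:es, "El Sentido común") # => true
excluded_stopword?(:es, "consentidor")      # => false, only a substring
```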

Here is the full class:

# frozen_string_literal: true

class DictionarySelector < ApplicationService
  SIMPLE_DICTIONARY = 'simple'
  LANGUAGES_MAP = {
    ar: 'arabic', da: 'danish', nl: 'dutch', en: 'english', fi: 'finnish', fr: 'french', de: 'german',
    hu: 'hungarian', id: 'indonesian', ga: 'irish', it: 'italian', lt: 'lithuanian', ne: 'nepali',
    no: 'norwegian', pt: 'portuguese', ro: 'romanian', es: 'spanish', sv: 'swedish', ta: 'tamil', tr: 'turkish'
  }.freeze # keys are ISO 639-1 language codes

  EXCLUDED_STOPWORDS = {
    es: %w[sentido estado]
  }.freeze

  def initialize(locale, query)
    @locale = locale&.to_sym
    @query = query
  end

  private

  attr_reader :locale, :query

  def perform
    return SIMPLE_DICTIONARY unless locale
    return SIMPLE_DICTIONARY if EXCLUDED_STOPWORDS[locale].to_a.any? { |word| query.include?(word) }

    LANGUAGES_MAP[locale] || SIMPLE_DICTIONARY
  end
end
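The class depends on an `ApplicationService` base that is not shown in this thread. To try the selector outside the app, a minimal stand-in could look like the sketch below. It assumes that `ApplicationService.call` simply builds an instance and runs its private `perform`, which is a guess at the convention rather than the actual base class, and the language map is trimmed to two entries to keep the example short.

```ruby
# Assumed stand-in for the app's base class (not shown in the thread):
# `.call` builds an instance and runs its private #perform.
class ApplicationService
  def self.call(*args)
    new(*args).send(:perform)
  end
end

# Trimmed copy of the selector so this sketch runs on its own.
class DictionarySelector < ApplicationService
  SIMPLE_DICTIONARY = 'simple'
  LANGUAGES_MAP = { en: 'english', es: 'spanish' }.freeze
  EXCLUDED_STOPWORDS = { es: %w[sentido estado] }.freeze

  def initialize(locale, query)
    @locale = locale&.to_sym
    @query = query
  end

  private

  attr_reader :locale, :query

  def perform
    return SIMPLE_DICTIONARY unless locale
    return SIMPLE_DICTIONARY if EXCLUDED_STOPWORDS[locale].to_a.any? { |word| query.include?(word) }

    LANGUAGES_MAP[locale] || SIMPLE_DICTIONARY
  end
end

DictionarySelector.call('es', 'saltar')  # => "spanish"
DictionarySelector.call('es', 'sentido') # => "simple"
DictionarySelector.call(nil, 'saltar')   # => "simple"
DictionarySelector.call('xx', 'saltar')  # => "simple"
```

With the scope defined earlier, the call site would then pass both arguments, for example `full_text_search("sentido", :es)` on the model.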