Casecommons / pg_search

pg_search builds ActiveRecord named scopes that take advantage of PostgreSQL’s full text search
http://www.casebook.net
MIT License

Prefix search returns suboptimal ranking #500

Open mak-dunkelziffer opened 1 year ago

mak-dunkelziffer commented 1 year ago

In one of my pg_search_scopes, turning on prefix search yields very odd ranks. Apart from slightly adjusting the rank (e.g. by a factor of 0.5 for old items), I don't do anything tricky:

pg_search_scope :search,
  against: :text,
  using: {
    tsearch: {
      tsvector_column: 'search_tsvector',
      prefix: true,
      negation: true,
      dictionary: 'simple',
      normalization: 0,
    }
  },
  ranked_by: <<-SQL
    trunc(
      :tsearch * 1000000
      /* multiplied by a slight boost according to certain flags or item age (omitted here), but never more than by a combined factor of 8 */
    )
  SQL

The query gives low ranks to obviously important items (20+ occurrences of the search term) and produces an odd distribution of ranks. For search in general, I would expect a distribution where the ranks of neighboring results differ by maybe 10% on average, but I get ranks like [1'000'000, 30'000, 500, 10, ...].

Obviously, with such huge gaps, custom rank boosting has no effect on the order of the results. More importantly, I would understand such a clear-cut result if the best match were on top, but it isn't.
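To make the size of these gaps concrete, here is a small Ruby sketch that computes the ratio between neighboring ranks and checks whether a bounded boost could ever change the order. The helper names `neighbor_ratios` and `boost_can_reorder?` are made up for illustration, and the sketch assumes the boost acts as a relative factor of at most 8 between two neighboring results.

```ruby
# Ranks as reported above, best first (underscores are Ruby digit separators).
ranks = [1_000_000, 30_000, 500, 10]

# Ratio between each rank and the next one down the list.
def neighbor_ratios(ranks)
  ranks.each_cons(2).map { |higher, lower| higher.to_f / lower }
end

# True if some lower-ranked result could reach or overtake its neighbor
# when its rank is multiplied by at most max_boost (assumed relative boost).
def boost_can_reorder?(ranks, max_boost)
  neighbor_ratios(ranks).any? { |ratio| ratio <= max_boost }
end

puts neighbor_ratios(ranks).map { |r| r.round(1) }.inspect
puts boost_can_reorder?(ranks, 8)

# For comparison: a hypothetical list where neighbors differ by roughly 10%.
puts boost_can_reorder?([100, 91, 83], 8)
```

With the reported ranks, every neighbor ratio is well above 8, so a boost bounded by 8 cannot swap any two results, whereas with a spread of roughly 10% it easily could.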

This huge spread of ranks only happens with prefix: true, any_word: false. For the other three combinations of these flags, the ranks have a saner distribution, are much closer to each other, and the obvious best result is on top.
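For reference, the four flag combinations can be sketched as the kind of tsquery each one roughly corresponds to. This is a simplified illustration, not pg_search's actual code: `sketch_tsquery` is a hypothetical helper, and it assumes that prefix: true appends `:*` to every term and that any_word switches the operator between terms from AND (`&`) to OR (`|`). Negation and quoting are ignored.

```ruby
# Hypothetical helper, not part of pg_search: approximates the tsquery text
# that a whitespace-separated search string would turn into.
def sketch_tsquery(query, prefix:, any_word:)
  # Assumption: prefix matching marks each term with ":*".
  terms = query.split.map { |term| prefix ? "#{term}:*" : term }
  # Assumption: any_word joins terms with OR, otherwise with AND.
  terms.join(any_word ? " | " : " & ")
end

# Print the sketched tsquery for all four combinations of the two flags.
[true, false].product([true, false]).each do |prefix, any_word|
  tsquery = sketch_tsquery("foo bar", prefix: prefix, any_word: any_word)
  puts "prefix: #{prefix}, any_word: #{any_word} => #{tsquery}"
end
```

Under these assumptions, the problematic combination (prefix: true, any_word: false) is the only one that combines prefix-matched terms with AND, which may be a useful starting point when comparing how the rank is computed in each case.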

Is there a known problem with this combination? Is this possibly a bug, or is there a logical reason why this combination behaves differently from the others? Also, are there more advanced methods of debugging this than simply displaying the rank in the output?

I would really like to keep the prefix search without messing up all of the ranks.