In one of my pg_search_scopes, turning on prefix search yields very weird ranks. Apart from slightly adjusting the rank (e.g. by a factor of 0.5 for old items), I don't do anything tricky:
    pg_search_scope :search,
      against: :text,
      using: {
        tsearch: {
          tsvector_column: 'search_tsvector',
          prefix: true,
          negation: true,
          dictionary: 'simple',
          normalization: 0,
        }
      },
      ranked_by: <<-SQL
        trunc(
          :tsearch * 1000000 *
          -- slight boosting of results according to certain flags or item age, but never more than by a combined factor of 8.
        )
      SQL
The query gives low ranks to obviously important items (20+ occurrences of the search term) and results in a weird distribution of ranks. I would (for search in general) expect some distribution where ranks between neighboring results differ by maybe 10% on average, but I get ranks like [1'000'000, 30'000, 500, 10, ...].
Obviously, with such huge gaps, any custom rank boosting has no effect on the order of the results. More importantly, I could understand such a clear-cut result if the best match were on top, but it isn't.
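To put numbers on this, here is a minimal Ruby sketch. It uses the example ranks from above (illustrative values, not real query output) and assumes that the boost can shift two neighboring results relative to each other by a factor of at most 8. The variable names are just for illustration:

```ruby
# Example ranks quoted above (illustrative, not actual query output).
ranks = [1_000_000, 30_000, 500, 10]

# Assumed upper bound on the combined boost factor.
max_boost = 8

# Ratio between each rank and the next lower one.
ratios = ranks.each_cons(2).map { |higher, lower| higher.to_f / lower }

# Boosting can only swap two neighbors if the gap between them
# is no larger than the maximum boost.
reorderable = ratios.any? { |ratio| ratio <= max_boost }

puts ratios.map { |r| r.round(1) }.inspect
puts "boost can change the order: #{reorderable}"
```

Every gap here is well above 8, so under that assumption no boost could ever move a result past its neighbor.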
This huge spread of ranks only happens with prefix: true, any_word: false. For the other three combinations of these flags, the ranks have a saner distribution, are much closer to each other, and the obvious best result is on top.
Is there any known problem with this combination? Is this possibly a bug, or is there a logical reason why this combination behaves differently from the others? Also, are there more advanced methods of debugging this than simply displaying the rank in the output?
I would really like to keep the prefix search without messing up all of the ranks.