Casecommons / pg_search

pg_search builds ActiveRecord named scopes that take advantage of PostgreSQL’s full text search
http://www.casebook.net
MIT License
1.36k stars 371 forks source link

ranked_by greatest similarity by column? #354

Open kluzny opened 7 years ago

kluzny commented 7 years ago

I am searching over 4 columns, with an exact match in one column, using trigram.

From looking at the sql generated, the rank ( used in the sql ORDER ) is being generated from a concat of all the columns, so:

(similarity((coalesce("products"."short_name"::text, '') || ' ' || coalesce("products"."name"::text, '') || ' ' || coalesce("products"."gmi_code"::text, '') || ' ' || coalesce("products"."currency    "::text, '')), 'EMD')) AS rank
....
ORDER BY pg_search.rank DESC, "products"."id" ASC

What this means for my use case, is that the other columns are polluting the results. If I search for foo and I have a foo and a bar foo baz it seems I don't get the search results in an order i'd expect.

In my use case, I'd like to find results for the best matching column:

(similarity(coalesce("products"."short_name"::text, ''), 'EMD')) AS srank,
(similarity(coalesce("products"."name"::text, ''), 'EMD')) AS nrank,
(similarity((coalesce("products"."short_name"::text, '') || ' ' || coalesce("products"."name"::text, '') || ' ' || coalesce("products"."gmi_code"::text, '') || ' ' || coalesce("products"."currency    "::text, '')), 'EMD')) AS rank
...
ORDER BY greatest(pg_search.srank, pg_search.nrank, pg_search.rank) DESC, "products"."id" ASC

This returns exact matches, and fuzzy matches by using the greatest rank

The following was a search for EMD, which over the entire document is not a good match, but has very good similarity over a particular column.

  name        | short_name |   rank    |  srank   | nrank
-------------------------+------------+-----------+----------+-------
  M2M OPk     | EMDOXYZ    |   0.09375 | 0.333333 |     0
  M2M PFC OPk | EMDUXYZ    | 0.0857143 | 0.333333 |     0
  M2M PFC Pk  | EMDTXYZ    | 0.0909091 | 0.333333 |     0
  M2M Pk      | EMDMXYZ    |       0.1 | 0.333333 |     0

I'm not certain if this is something I am doing wrong in my scope, or I can define a custom rank_by that can rank multiple items and choose a best fit.

I would be interested in adding this feature if it's not supported, but I don't know if my use case is really generally applicable and a good fit for this library.

I considered an API like

 pg_search_scope :search_full_text,
                     against: {
                     short_name: "A",
                     name: "D",
                     gmi_code: "D",
                     currency: "D"
                    },
                    using: {
                      tsearch: { prefix: true }
                   },
                   ranked_by: :similarity

but then i wanted something more flexible like

  ranked_by: { greatest: [:short_name, :name]}

greatest, could also be other functions, like least

I've considered just monkey patching the rank methods on :trigram, but If this feels like a good fit, I'd rather try to whip up a pr.

mgrsskls commented 9 months ago

Very old request, but I need this as well :) @kluzny Have you found a solution for that?