Open monfresh opened 9 years ago
In 1.0.0, I moved the search relevance calculation into a subquery, which I think is what triggered your bug. From this point forward, I plan on following Semantic Versioning 2.0.0, so a change to the MAJOR version represents a breaking change, as in the change from 0.x to 1.x.
It looks like we should add test coverage for the SELECT DISTINCT
case to pg_search. I think that is what isn't currently working with the subquery. For now, I think a quick workaround might be to chain .select("pg_search.rank", "locations.id")
onto the end of that particular call, or to add those two fields to any existing .select()
calls you may have.
Please let me know if that works for you. In the meantime I will try to think of a more generic solution.
I'm going to change the title of this issue so to reflect that it's about SELECT DISTINCT
.
Thanks for the bug report!
I'm also having an issue with the new subquery. We are using activerecord_any_of to join pg_search scopes with other conditions via OR instead of AND and the subquery breaks the OR logic.
Is there any way to retain the old non-subquery method add an option to use it when desired?
@aaronchi currently, no. Actually if you could open a separate issue for what you're seeing that would be helpful too so that we can try to add it to the test coverage.
I still haven't found the holy grail of a way to express the complexity of the pg_search query in a way that doesn't clobber other chained scopes. The subquery solution fixes a lot of the longstanding bugs, but adds some new ones as well, apparently.
I'd prefer not to add back the old query style if possible, but I'm still open to the idea.
@nertzy Thanks for the details and the suggestion. Adding .select("pg_search.rank", "locations.id")
to the end of my scope that uses pg_search only fixed a few search queries, but introduced a new error in other queries.
The search function in my API allows various parameters to be passed in, and the errors occur when the keyword parameter (which uses pg_search) is used alone or combined with other parameters.
Searches that only included the keyword
parameter were the ones that resulted in the PG::InvalidColumnReference
error. When I add your suggested fix to the keyword
scope, like this:
scope :keyword, ->(keyword) { keyword_search(keyword).select("pg_search.rank", "locations.id") }
those same tests now fail with a different error:
Failure/Error: get api_search_index_url(keyword: 'food', subdomain: ENV['API_SUBDOMAIN'])
ActiveModel::MissingAttributeError:
missing attribute: organization_id
I also discovered another error caused by pg_search 1.0 that isn't resolved by your suggested fix.
For a search that includes keyword
and category
parameters, I get the following error with or without your suggested fix:
Failure/Error: get api_search_index_url(category: 'Jobs', keyword: 'jobs', subdomain: ENV['API_SUBDOMAIN'])
ActiveRecord::StatementInvalid:
PG::UndefinedTable: ERROR: invalid reference to FROM-clause entry for table "categories"
LINE 1: ...N "locations"."id" = pg_search.pg_search_id WHERE "categorie...
^
HINT: Perhaps you meant to reference the table alias "categories_services_2".
: SELECT DISTINCT MAX("locations"."updated_at") FROM "locations" INNER JOIN "services"
"services_locations" ON "services_locations"."location_id" = "locations"."id" INNER JOIN
"categories_services" "services_categories_services_join" ON
"services_categories_services_join"."service_id" = "services_locations"."id" INNER JOIN "categories"
"categories_services_2" ON "categories_services_2"."id" =
"services_categories_services_join"."category_id" INNER JOIN (SELECT "locations"."id" AS
pg_search_id, (ts_rank(("locations"."tsv_body"), (to_tsquery('english', ''' ' || 'jobs' || ' ''' || ':*')), 0)) AS rank
FROM "locations" INNER JOIN "services" ON "services"."location_id" = "locations"."id" INNER JOIN
"categories_services" ON "categories_services"."service_id" = "services"."id" INNER JOIN "categories"
ON "categories"."id" = "categories_services"."category_id" WHERE "categories"."name" = 'Jobs' AND
((("locations"."tsv_body") @@ (to_tsquery('english', ''' ' || 'jobs' || ' ''' || ':*'))))) pg_search ON
"locations"."id" = pg_search.pg_search_id WHERE "categories"."name" = 'Jobs' LIMIT 30 OFFSET 0
Note that I don't have a table called categories_services_2
. That is definitely something new pg_search is doing because in version 0.7.9, the query does not try to use categories_services_2
:
SELECT DISTINCT MAX("locations"."updated_at") FROM "locations" INNER JOIN "services" ON
"services"."location_id" = "locations"."id" INNER JOIN "categories_services" ON
"categories_services"."service_id" = "services"."id" INNER JOIN "categories" ON "categories"."id" =
"categories_services"."category_id" WHERE "categories"."name" = 'Jobs' AND ((("locations"."tsv_body")
@@ (to_tsquery('english', ''' ' || 'jobs' || ' ''' || ':*')))) LIMIT 30 OFFSET 0
Note how much longer the query is in 1.0 versus 0.7.9.
@monfresh could you try out the latest master? The fix for #235 might affect this use case as well.
I am getting the exact same mistake as @monfresh when performing a chained SELECT DISTINCT
query, even after updating pg_search
to version 1.0.2. The workaround suggested by @nertzy worked, but it would still be good to investigate the error and find a generic solution.
The chaining query that results in the error is Product.find_by_store(store).search_by_name(name)
, and the workaround was chaining .select("pg_search.rank", "products.*")
to it. Here are the details of the model and error message:
class Product < ActiveRecord::Base
include PgSearch
has_many :area_category_products
has_many :area_categories, through: :area_category_products
pg_search_scope :search_by_name, against: :name,
using: { tsearch: { prefix: true,
any_word: true } },
ignoring: :accents
def self.find_by_store(store)
Product.joins(area_categories: :area).where(areas: { store_id: store.id })
.uniq
end
ActiveRecord::StatementInvalid:
PG::InvalidColumnReference: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
LINE 1: ..._search_id WHERE "areas"."store_id" = 1 ORDER BY pg_search....
^
: SELECT DISTINCT "products".* FROM "products" INNER JOIN "area_category_products" ON "area_category_products"."product_id" = "products"."id" INNER JOIN "area_categories" ON "area_categories"."id" = "area_category_products"."area_category_id" INNER JOIN "areas" ON "areas"."id" = "area_categories"."area_id" INNER JOIN (SELECT "products"."id" AS pg_search_id, (ts_rank((to_tsvector('simple', unaccent(coalesce("products"."name"::text, '')))), (to_tsquery('simple', ''' ' || unaccent('prod') || ' ''' || ':*')), 0)) AS rank FROM "products" WHERE (((to_tsvector('simple', unaccent(coalesce("products"."name"::text, '')))) @@ (to_tsquery('simple', ''' ' || unaccent('prod') || ' ''' || ':*'))))) pg_search ON "products"."id" = pg_search.pg_search_id WHERE "areas"."store_id" = 1 ORDER BY pg_search.rank DESC, "products"."id" ASC
@nertzy I wanted to give you an update on this issue from the current master branch. If I don't change my current code, I get a similar error as the one I originally reported, but it's now referencing pg_search_locations.pg_search_id
whereas before it was pg_search.pg_search_id
:
Failure/Error: get api_search_index_url(keyword: 'jobs', per_page: 1, subdomain: ENV['API_SUBDOMAIN'])
ActiveRecord::StatementInvalid:
PG::InvalidColumnReference: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
LINE 1: ..."id" = pg_search_locations.pg_search_id ORDER BY pg_search_...
^
: SELECT DISTINCT "locations".* FROM "locations" INNER JOIN (SELECT "locations"."id" AS pg_search_id, (ts_rank(("locations"."tsv_body"), (to_tsquery('english', ''' ' || 'jobs' || ' ''' || ':*')), 0)) AS rank FROM "locations" WHERE ((("locations"."tsv_body") @@ (to_tsquery('english', ''' ' || 'jobs' || ' ''' || ':*'))))) pg_search_locations ON "locations"."id" = pg_search_locations.pg_search_id ORDER BY pg_search_locations.rank DESC, "locations"."id" ASC LIMIT 1 OFFSET 0
This means that I need to change your original workaround to .select("pg_search_locations.rank", "locations.id")
It seems like the workaround is now specific to the model where the pg_search_scope
is defined.
When I do that, I get the same error I reported in my previous update:
Failure/Error: get api_search_index_url(keyword: 'jobs', per_page: 1, subdomain: ENV['API_SUBDOMAIN'])
ActiveModel::MissingAttributeError:
missing attribute: organization_id
If I select all of the columns in the Locations table, as suggested by @LauraAntuna, then all my tests pass:
.select("pg_search_locations.rank", "locations.*")
Another thing I noticed is that the SQL is now longer. In version 0.7.9 of the gem, the SQL generated was:
SELECT DISTINCT "locations".*, ((ts_rank(("locations"."tsv_body"), (to_tsquery('english', ''' ' || 'job' || ' ''' || ':*')), 0))) AS pg_search_rank FROM "locations" WHERE ((("locations"."tsv_body") @@ (to_tsquery('english', ''' ' || 'job' || ' ''' || ':*')))) ORDER BY pg_search_rank DESC, "locations"."id" ASC
Now, the SQL adds an INNER JOIN when there wasn't one before:
SELECT DISTINCT pg_search_locations.rank, locations.* FROM "locations" INNER JOIN (SELECT "locations"."id" AS pg_search_id, (ts_rank(("locations"."tsv_body"), (to_tsquery('english', ''' ' || 'job' || ' ''' || ':*')), 0)) AS rank FROM "locations" WHERE ((("locations"."tsv_body") @@ (to_tsquery('english', ''' ' || 'job' || ' ''' || ':*'))))) pg_search_locations ON "locations"."id" = pg_search_locations.pg_search_id ORDER BY pg_search_locations.rank DESC, "locations"."id" ASC
I haven't tested it yet on a large dataset to see if there's a performance impact.
I have the same issue using 1.0.4, and @LauraAntuna's solution works for me.
For me, the solution was to add the with_pg_search_rank
scope. I wonder if it's possible to detect this automatically when distinct
is present.
With 1.0.4 I had to add pg_search_series.rank
to SELECT
and switching to 1.0.5 needs pg_search_9cda38ea3a2bd78cbaecd4.rank
, what is this stamp 9cda38ea3a2bd78cbaecd4, does it change between queries?
@mateuszpalyz, just use the with_pg_search_rank
scope, like this: Post.search_by_title('some title').with_pg_search_rank.uniq
. See scope source here (v.1.0.5) if interested, it just adds proper selects as suggested above. Adding it automatically is definately a good idea!
I'm using version 1.0.6 of this gem and I get: "undefined method `with_pg_search_rank' for #\u003cCompanyFriendshipLocation::ActiveRecord_Relation:0x007f906e728160\u003e"
+1 .uniq kills the query
@ikido's solution works perfectly, it should be documented or something.
This is still an issue in 2.1.4. Using .uniq
is a solid workaround, but it's not a true solution because there is a performance loss when moving the dedupe workload from Postgres to Ruby. Chaining .with_pg_search_rank.distinct
fixed everything else, but broke when adding .count
to the end of the query:
ActiveRecord::StatementInvalid:
PG::SyntaxError: ERROR: syntax error at or near "AS"
LINE 1: ...tegories.*, pg_search_a6216ea03e578f212dd604.rank AS pg_sear...
^
: SELECT COUNT(DISTINCT categories.*, pg_search_a6216ea03e578f212dd604.rank AS pg_search_rank) FROM "categories" INNER JOIN "expert_categories" ON "expert_categories"."category_id" = "categories"."id" INNER JOIN "experts" ON "experts"."id" = "expert_categories"."expert_id" INNER JOIN "cities" ON "cities"."id" = "experts"."city_id" INNER JOIN (SELECT "categories"."id" AS pg_search_id, (ts_rank((setweight(to_tsvector('english', coalesce("categories"."name"::text, '')), 'A') || setweight(to_tsvector('english', coalesce("categories"."search_synonyms"::text, '')), 'B')), (to_tsquery('english', ''' ' || 'painting' || ' ''' || ':*')), 0)) AS rank FROM "categories" WHERE (((setweight(to_tsvector('english', coalesce("categories"."name"::text, '')), 'A') || setweight(to_tsvector('english', coalesce("categories"."search_synonyms"::text, '')), 'B')) @@ (to_tsquery('english', ''' ' || 'painting' || ' ''' || ':*'))))) AS pg_search_a6216ea03e578f212dd604 ON "categories"."id" = pg_search_a6216ea03e578f212dd604.pg_search_id WHERE "experts"."city_id" = $1 AND "experts"."status" = $2
We could use Ruby to count
, but we'll continue to live with the .uniq
workaround in fear that this would be a footgun to developers using scope for other purposes.
@ikido's solution worked fine!
Another workaround that works well with distinct
, count
and pg_search
is to reorder('')
.
It looks like the 'order by rank' part is problematic, so if ordering by pg_search's rank isn't required, then this workaround would be a great fit.
Example, Post.search_by_title('some title').reorder('').distinct.count
thx @HusseinElMotayam
Post.search_by_title('some title').reorder('').distinct
works fine and doesn't kill query like .uniq
@HusseinElMotayam solution fixed it for me. many thanks
Still seeing the same issue if trying to chain multiple search scopes together. Example
Post.search_by_title('some title').search_by_body('some text').with_pg_search_rank.distinct
will produce
#<ActiveRecord::StatementInvalid: PG::InvalidColumnReference: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
Looking into a fix, but haven't came up with anything yet. Thanks for any help!
I just upgraded to 1.0.0 from 0.7.9 and I'm now getting a bunch of failures for tests that perform a pg_search query. I looked at the Changelog and the Readme, but I don't see any instructions for upgrading from 0.7.9 to 1.0.0 or any mention of breaking changes, so I assumed everything should still work.
Here is the specific error I am getting:
And here is the query: https://github.com/codeforamerica/ohana-api/blob/master/app/models/concerns/search.rb#L26-35
Thoughts?