pik opened this issue 9 years ago.
Ruby .include? will run at ~100ms on pre-fetched objects once they are loaded into memory (~70ms in Python).
Though when you include the query time, it’s ~1250ms vs. ~1680ms for PgSearch’s query. In that case it’s not much faster (about ¾ of the time, or 1.34x faster), and in exchange for that extra time you gain lower memory consumption on the client, less network transfer, and intelligent search ranking. Sure, if you’ve already loaded the entire search space into memory, then yes, it’s going to be faster and the first two cons (memory and network) are moot; but you still give up the ranking.
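The ¾ and 1.34x figures follow directly from the two end-to-end timings quoted above. A small worked check in Ruby, using only those two numbers:

```ruby
# End-to-end timings quoted in this thread, in milliseconds
ruby_total_ms = 1250.0 # fetch all rows, then filter with Ruby .include?
pg_search_ms  = 1680.0 # the PgSearch query

fraction = ruby_total_ms / pg_search_ms # share of the PgSearch time
speedup  = pg_search_ms / ruby_total_ms # how many times faster the Ruby path is

puts "Ruby path takes #{fraction.round(2)} of the time (#{speedup.round(2)}x faster)"
# prints: Ruby path takes 0.74 of the time (1.34x faster)
```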
That said…
To see if this can be made faster (perhaps pg_search is building suboptimal indexes or queries), I've tried re-indexing with both GIN and GiST, as well as building separate indexes for a nested select.
The PgSearch query looks like it’s using the search indexes just fine; the problem may be the condition ("variants"."status" != 'archived'), which doesn’t appear to be indexed. What happens if you remove that from your query? Additionally, the INNER JOIN "products" ON "products"."id" = "variants"."product_id" would also add some time to the query, and it doesn’t seem to be needed for the search.
the latter is a naive O(n**2) lookup.
It’s not; it’s O(n*m), where n is the number of records and m is the average field length.
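The O(n*m) point can be made concrete with a deliberately naive substring search that counts character comparisons. This is a teaching sketch: naive_scan is a hypothetical helper, it assumes a short, fixed search term (strictly, the worst case also grows with the term length), and it is not how Ruby's C implementation of String#include? works.

```ruby
# Hypothetical helper: naive substring search over every record that also
# counts character comparisons, to show the work growing with
# n (number of records) * m (average field length) for a short, fixed term.
def naive_scan(records, term)
  comparisons = 0
  matches = records.select do |text|
    # Try every start position where the term could still fit.
    (0..text.length - term.length).any? do |start|
      j = 0
      while j < term.length
        comparisons += 1
        break if text[start + j] != term[j] # mismatch: move to next start
        j += 1
      end
      j == term.length # true only when every character of term matched
    end
  end
  [matches, comparisons]
end

matches, _comparisons = naive_scan(["black shirt", "white shirt"], "black")
# matches == ["black shirt"]
```

With 100 records of 50 "x" characters and the term "black", every start position fails on the first character, giving 100 * 46 = 4600 comparisons; doubling either the record count or (roughly) the field length doubles that count, which is the n*m behaviour described above.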
Unless I’m missing something (and I might be, as it’s quite late), I’m inclined to think there isn’t a problem here. Postgres’s full text search is not exactly fast; FWIW, Case Commons migrated to Elasticsearch primarily for performance, and the difference is substantial with our medium-sized and larger datasets.
@amarshall Sorry, I neglected to include the EXPLAIN ANALYZE output for the stripped-down version of the above query:
EXPLAIN ANALYZE SELECT "variants".id, "variants".search from "variants"
WHERE account_id = $1
AND ((("variants"."search") @@ (to_tsquery('simple', ''' ' || 'black' || ' ''' || ':*'))));
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on variants (cost=383.43..98353.11 rows=28042 width=167) (actual time=336.732..1085.490 rows=128052 loops=1)
Recheck Cond: ((account_id = $1) AND (search @@ '''black'':*'::tsquery))
Rows Removed by Index Recheck: 501447
Heap Blocks: exact=36190 lossy=53808
-> Bitmap Index Scan on index_variants_on_search_and_account_id (cost=0.00..376.42 rows=28042 width=0) (actual time=326.986..326.986 rows=223097 loops=1)
Index Cond: ((account_id = $1) AND (search @@ '''black'':*'::tsquery))
Planning time: 0.616 ms
Execution time: 1093.949 ms
(8 rows)
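The plan's own numbers can be tallied with a small Ruby helper; bitmap_scan_stats is hypothetical, and its regular expressions assume exactly the text layout shown above. By these numbers, roughly 80% of the rows the heap scan examined were removed on recheck, and about 60% of the heap blocks were lossy. In PostgreSQL a lossy bitmap page records only the page, not individual rows, so every row on it has to be rechecked, and lossy pages appear when the bitmap outgrows work_mem. How much of the ~1 s this accounts for is an open question the plan alone does not settle.

```ruby
# Excerpt of the EXPLAIN ANALYZE output above (only the lines we parse).
PLAN_EXCERPT = <<~PLAN
  Bitmap Heap Scan on variants (cost=383.43..98353.11 rows=28042 width=167) (actual time=336.732..1085.490 rows=128052 loops=1)
    Rows Removed by Index Recheck: 501447
    Heap Blocks: exact=36190 lossy=53808
PLAN

# Hypothetical helper: pulls row and heap-block counts out of a bitmap heap
# scan node and reports how much work was discarded on recheck.
def bitmap_scan_stats(plan)
  rows_returned = plan[/actual time=\S+ rows=(\d+)/, 1].to_i # actual, not estimated, rows
  rows_removed  = plan[/Rows Removed by Index Recheck: (\d+)/, 1].to_i
  exact         = plan[/exact=(\d+)/, 1].to_i
  lossy         = plan[/lossy=(\d+)/, 1].to_i
  {
    rows_returned: rows_returned,
    rows_removed: rows_removed,
    discarded_share: rows_removed.fdiv(rows_returned + rows_removed),
    lossy_share: lossy.fdiv(exact + lossy)
  }
end

stats = bitmap_scan_stats(PLAN_EXCERPT)
puts "discarded on recheck: #{stats[:discarded_share].round(2)}, lossy blocks: #{stats[:lossy_share].round(2)}"
```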
That gives about the same performance as the naive Ruby fetch (give or take some variance). The problem is that I really expect Postgres full text search with pre-built indexes to be faster than fetching objects and doing a naive string .include? (e.g., considering that with a web server the entire data set for an account can be cached in memory and maybe reused 50-100 times before it expires).
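A minimal sketch of that "load once, search many times" idea, assuming an account's records fit in memory and that case-insensitive substring matching is acceptable. AccountSearchCache, its loader lambda and the TTL are hypothetical names for illustration, not part of pg_search or Rails.

```ruby
# Hypothetical per-account cache: loads an account's searchable strings once,
# reuses them until the TTL expires, and filters them with String#include?.
class AccountSearchCache
  def initialize(ttl_seconds:, loader:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl_seconds
    @loader = loader # callable: account_id -> Array of searchable strings
    @clock = clock   # injectable so expiry can be tested without sleeping
    @entries = {}    # account_id -> [loaded_at, records]
  end

  # Case-insensitive substring search over the cached records.
  def search(account_id, term)
    needle = term.downcase
    records_for(account_id).select { |text| text.downcase.include?(needle) }
  end

  private

  # Returns cached records, reloading them when missing or older than the TTL.
  def records_for(account_id)
    now = @clock.call
    loaded_at, records = @entries[account_id]
    if records.nil? || now - loaded_at > @ttl
      records = @loader.call(account_id)
      @entries[account_id] = [now, records]
    end
    records
  end
end
```

Only the first search per account per TTL window pays for the load; every later search costs just the in-memory scan, which is where the 50-100 reuses would pay off.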
How are things working for you with Elasticsearch? We are considering it as an alternative as well, but I wanted to be sure we weren't missing something essential that would let us greatly improve Postgres performance. Are these numbers within ±25% of what you would expect?
Hello, apologies for the generic issue title; hopefully I can edit it to be more specific once I'm a little clearer on the source of this. We are using the pg_search gem for an autocomplete, and I'm somewhat puzzled by its performance (this may be Postgres-related rather than pg_search-related; more on that below). Here is a query generated by the pg_search gem (with EXPLAIN ANALYZE output below):
The execution time is absolutely staggering. Here is what this looks like if we pre-fetch the objects in native Ruby:
In other words, Ruby .include? runs at ~100ms on pre-fetched objects once they are loaded into memory (~70ms in Python), whereas the native SQL never goes under 1000ms. To see if this can be made faster (perhaps pg_search is building suboptimal indexes or queries), I've tried re-indexing with both GIN and GiST, as well as building separate indexes for a nested select, e.g.:
None of these makes a difference anywhere close to bridging the gap between the Postgres query runtime and native Ruby .include?, even though the latter is a naive O(n**2) lookup.
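When comparing the two approaches, it helps to time the in-memory filter separately from the fetch, since the ~100ms figure excludes loading. A sketch using Ruby's standard Benchmark module; timed_include_search and the synthetic records are hypothetical stand-ins for the pre-fetched variants, so whatever it prints says nothing about the numbers in this thread.

```ruby
require "benchmark"

# Hypothetical helper: filters pre-fetched strings with String#include? and
# returns the matches plus elapsed wall-clock time in milliseconds.
# Only the in-memory filter is timed; loading the records is not.
def timed_include_search(records, term)
  matches = nil
  seconds = Benchmark.realtime do
    matches = records.select { |text| text.include?(term) }
  end
  [matches, seconds * 1000.0]
end

# Synthetic stand-in data: every other record contains "black".
records = Array.new(200_000) { |i| i.even? ? "black t-shirt #{i}" : "white t-shirt #{i}" }
matches, elapsed_ms = timed_include_search(records, "black")
puts "#{matches.size} matches in #{elapsed_ms.round(1)} ms"
```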