clear-code / redmine_full_text_search

Full text search for Redmine
MIT License

pgroonga: cannot display score for wiki #19

Closed okkez closed 7 years ago

okkez commented 7 years ago

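When wiki pages are included in full-text search with pgroonga, the score comes back as zero. This is probably a pgroonga limitation. Changing the or to and makes the score non-zero. :thought_balloon: I feel like I have looked into this same thing many times.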

select 
  pgroonga.score(c) 
from
  wiki_pages as p 
  join
  wiki_contents as c
    on p.id = c.page_id 
where 
  text @@ '全文検索' 
  or
  title @@ '全文検索';

kou commented 7 years ago

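I think it is probably a limitation. (If PostgreSQL has already released both the result of running text @@ '全文検索' and the result of running title @@ '全文検索' by the time pgroonga.score is evaluated, the score cannot be retrieved, and I suspect this is one of those cases.)

What does EXPLAIN show for that query?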

okkez commented 7 years ago

I ran explain and explain analyze for both the or version and the and version.

redmine_development=# explain select
redmine_development-#   pgroonga.score(c)
redmine_development-# from
redmine_development-#   wiki_pages as p
redmine_development-#   join
redmine_development-#   wiki_contents as c
redmine_development-#     on p.id = c.page_id
redmine_development-# where
redmine_development-#   text @@ '全文検索'
redmine_development-#   or
redmine_development-#   title @@ '全文検索';
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Hash Join  (cost=32.14..409.25 rows=513 width=8)
   Hash Cond: (c.page_id = p.id)
   Join Filter: ((c.text @@ '全文検索'::text) OR (p.title @@ '全文検索'::character varying))
   ->  Seq Scan on wiki_contents c  (cost=0.00..113.84 rows=984 width=1581)
   ->  Hash  (cost=19.84..19.84 rows=984 width=28)
         ->  Seq Scan on wiki_pages p  (cost=0.00..19.84 rows=984 width=28)
(6 rows)
redmine_development=# explain analyze select 
  pgroonga.score(c) 
from
  wiki_pages as p 
  join
  wiki_contents as c
    on p.id = c.page_id 
where 
  text @@ '全文検索' 
  or
  title @@ '全文検索';
                                                       QUERY PLAN                                                        
-------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=32.14..409.25 rows=513 width=8) (actual time=191.416..385.097 rows=8 loops=1)
   Hash Cond: (c.page_id = p.id)
   Join Filter: ((c.text @@ '全文検索'::text) OR (p.title @@ '全文検索'::character varying))
   Rows Removed by Join Filter: 976
   ->  Seq Scan on wiki_contents c  (cost=0.00..113.84 rows=984 width=1581) (actual time=0.018..21.309 rows=984 loops=1)
   ->  Hash  (cost=19.84..19.84 rows=984 width=28) (actual time=0.898..0.898 rows=984 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 66kB
         ->  Seq Scan on wiki_pages p  (cost=0.00..19.84 rows=984 width=28) (actual time=0.010..0.607 rows=984 loops=1)
 Planning time: 1.581 ms
 Execution time: 385.131 ms
redmine_development=# explain select 
  pgroonga.score(c) 
from
  wiki_pages as p 
  join
  wiki_contents as c
    on p.id = c.page_id 
where 
  text @@ '全文検索' 
  and
  title @@ '全文検索';
                                               QUERY PLAN                                                
---------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..94.03 rows=1 width=8)
   ->  Bitmap Heap Scan on wiki_contents c  (cost=0.00..90.01 rows=1 width=817)
         Recheck Cond: (text @@ '全文検索'::text)
         ->  Bitmap Index Scan on index_wiki_contents_on_id_and_text  (cost=0.00..0.00 rows=42 width=0)
               Index Cond: (text @@ '全文検索'::text)
   ->  Index Only Scan using index_wiki_pages_pgroonga on wiki_pages p  (cost=0.00..4.01 rows=1 width=4)
         Index Cond: ((id = c.page_id) AND (title @@ '全文検索'::character varying))
(7 rows)
redmine_development=# explain analyze select 
  pgroonga.score(c) 
from
  wiki_pages as p 
  join
  wiki_contents as c
    on p.id = c.page_id 
where 
  text @@ '全文検索' 
  and
  title @@ '全文検索';
                                                                     QUERY PLAN                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..94.03 rows=1 width=8) (actual time=13.370..26.803 rows=5 loops=1)
   ->  Bitmap Heap Scan on wiki_contents c  (cost=0.00..90.01 rows=1 width=817) (actual time=0.784..2.738 rows=50 loops=1)
         Recheck Cond: (text @@ '全文検索'::text)
         Heap Blocks: exact=39
         ->  Bitmap Index Scan on index_wiki_contents_on_id_and_text  (cost=0.00..0.00 rows=42 width=0) (actual time=0.763..0.763 rows=50 loops=1)
               Index Cond: (text @@ '全文検索'::text)
   ->  Index Only Scan using index_wiki_pages_pgroonga on wiki_pages p  (cost=0.00..4.01 rows=1 width=4) (actual time=0.415..0.415 rows=0 loops=50)
         Index Cond: ((id = c.page_id) AND (title @@ '全文検索'::character varying))
         Heap Fetches: 5
 Planning time: 1.225 ms
 Execution time: 28.130 ms
(11 rows)

kou commented 7 years ago

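That is because the OR version ends up as a sequential scan... The score is computed together with the search when it runs through the index, so with a sequential scan there is no score to return.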
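
A possible way around this, sketched here and not taken from the plugin: split the OR into two single-table queries so that each condition has a chance to be answered by its own PGroonga index, then add the scores per page. The table and column names come from the queries above; the `page_id` and `score` aliases are made up for illustration, and whether the planner actually picks an index scan for each branch still has to be confirmed with EXPLAIN (on a small table it may prefer a sequential scan anyway, in which case the score is zero again).

```sql
-- Sketch only. Each branch filters a single table, so the planner can
-- use that table's PGroonga index and pgroonga.score() has a score to return.
SELECT page_id, sum(score) AS score
FROM (
  SELECT c.page_id AS page_id, pgroonga.score(c) AS score
  FROM wiki_contents AS c
  WHERE c.text @@ '全文検索'
  UNION ALL
  SELECT p.id AS page_id, pgroonga.score(p) AS score
  FROM wiki_pages AS p
  WHERE p.title @@ '全文検索'
) AS matches
GROUP BY page_id
ORDER BY score DESC;
```

To check the plan while experimenting, `SET enable_seqscan = off;` in the session discourages sequential scans; it is a standard PostgreSQL planner setting meant for diagnosis, not for production use.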

okkez commented 7 years ago

As of v0.5.0 the plugin uses pgroonga.command(), so scores are now available for Wiki as well.
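
For readers unfamiliar with pgroonga.command(), a minimal sketch of the idea follows. It is not the query the plugin issues in v0.5.0; it only shows that a Groonga `select` run through pgroonga.command() can return `_score` directly, without depending on the PostgreSQL plan. The index name is taken from the EXPLAIN output above; the choice of `text` as the match column and of `_id,_score` as output columns is an assumption for illustration.

```sql
-- Sketch only: run Groonga's select command against the Groonga table
-- that backs the PGroonga index and ask for the score explicitly.
SELECT pgroonga.command(
  'select ' ||
  pgroonga.table_name('index_wiki_contents_on_id_and_text') ||
  ' --match_columns text' ||
  ' --query 全文検索' ||
  ' --output_columns _id,_score'
);
```

The result is Groonga's JSON response as text, so the caller has to parse it and map the records back to PostgreSQL rows.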