cuba-platform / fts

Full-Text Search Addon
https://www.cuba-platform.com/
Apache License 2.0
4 stars 1 forks source link

Postgres hangs during FtsManager - asyncReindexAll for a large number of entities #10

Closed haulmont-git closed 6 years ago

haulmont-git commented 7 years ago

Shortly In Sherlock we have > 500000 of entities (address database) for FTS indexing. When I try to reindex the whole queue with FtsManager.asyncReindexAll on local machine - it does not work. One of iterations hangs somewhere in the end of the reindexing. The problem is the same in cuba 5.6.

How to reproduce 1) Open refapp master

2) Enable scheduling:

cuba.schedulingActive = true

3) Create sample data

insert into sec_user (id, version, create_ts, created_by, update_ts, login, login_lc, name, email, group_id)
select newid(), 1, now(), 'admin', now(), 
'us' ||  t, 'us' ||  t, 
'Name ' || t, 'user' || t || '@example.com', '0fa2b1a5-1d68-4d69-9fbd-dff348347f93'
from generate_series(1,500000) t

4) Create scheduled task. cuba_FtsManager - reindexNextBatch. Period = 5, Timeout = 3600. Activate it.

5) Jmx Console -> FtsManager -> app-core.fts:type=FtsManager -> asyncReindexAll

6) Activate scheduled task. Go to Scheduled tasks -> our task -> Executions and wait.

7) One of iterations hangs when number of elements in the fts queue reaches 420000. See screenshot.

Check current queries:

refapp_6=# select state, (now() - query_start) as "running for", query from pg_stat_activity where state <> 'idle';
 active | 00:21:10.972806 | SELECT t0.ID AS a1 FROM SEC_USER t0 WHERE (t0.ID NOT IN (SELECT t1.ENTITY_ID FROM SYS_FTS_QUEUE t1 WHERE (t1.ENTITY_NAME = $1)) AND (t0.DELETE_TS IS NULL)) LIMIT $2 OFFSET $3

Look at query execution plan.

refapp_6=# explain
refapp_6-# SELECT t0.ID AS a1 FROM SEC_USER t0 WHERE (t0.ID NOT IN (SELECT t1.ENTITY_ID FROM SYS_FTS_QUEUE t1 
refapp_6(# WHERE (t1.ENTITY_NAME = 'sec$User')) AND (t0.DELETE_TS IS NULL)) LIMIT 5000 OFFSET 0
refapp_6-# ;
                                          QUERY PLAN                                          
----------------------------------------------------------------------------------------------
 Limit  (cost=0.00..84585317.12 rows=5000 width=16)
   ->  Seq Scan on sec_user t0  (cost=0.00..4228453836.80 rows=249952 width=16)
         Filter: ((delete_ts IS NULL) AND (NOT (SubPlan 1)))
         SubPlan 1
           ->  Materialize  (cost=0.00..15867.00 rows=420000 width=16)
                 ->  Seq Scan on sys_fts_queue t1  (cost=0.00..11716.00 rows=420000 width=16)
                       Filter: ((entity_name)::text = 'sec$User'::text)
(7 rows)

The cost is... terrible. Looks like it is a Postgres query planner bug?

Just for curiosity, try to rewrite SQL query to use a left join:

refapp_6=# explain analyze
refapp_6-# SELECT t0.ID AS a1 FROM SEC_USER t0
refapp_6-# left join sys_fts_queue t1 on (t1.ENTITY_NAME = 'sec$User' and t0.id = t1.entity_id)
refapp_6-# where t1.id is null and t0.delete_ts is null
refapp_6-# limit 5000
refapp_6-# ;
                                                                 QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=19838.00..48269.68 rows=1 width=16) (actual time=462.856..467.726 rows=5000 loops=1)
   ->  Hash Left Join  (cost=19838.00..48269.68 rows=1 width=16) (actual time=462.854..467.376 rows=5000 loops=1)
         Hash Cond: (t0.id = t1.entity_id)
         Filter: (t1.id IS NULL)
         Rows Removed by Filter: 209373
         ->  Seq Scan on sec_user t0  (cost=0.00..14603.04 rows=499904 width=16) (actual time=0.007..80.960 rows=430064 loops=1)
               Filter: (delete_ts IS NULL)
         ->  Hash  (cost=11716.00..11716.00 rows=420000 width=32) (actual time=218.475..218.475 rows=420000 loops=1)
               Buckets: 262144  Batches: 2  Memory Usage: 15134kB
               ->  Seq Scan on sys_fts_queue t1  (cost=0.00..11716.00 rows=420000 width=32) (actual time=0.012..107.442 rows=420000 loops=1)
                     Filter: ((entity_name)::text = 'sec$User'::text)
 Planning time: 0.207 ms
 Execution time: 471.037 ms
(13 rows)

The plan is much better, and the query executes quickly.

DB parameters

budarov@budarov:~$ dpkg -l | grep postgresql
ii  postgresql-9.5                              9.5.7-1.pgdg16.04+1                           amd64        object-relational SQL database, version 9.5 server

postgres config:

shared_buffers = 1200MB         # min 128kB
temp_buffers = 48MB         # min 800kB
work_mem = 16MB             # min 64kB
maintenance_work_mem = 512MB        # min 1MB
#effective_cache_size = 4GB

Original issue: https://youtrack.haulmont.com/issue/PL-9480

haulmont-git commented 6 years ago

Git changesets by subbotin:

2f03e8ec in release_6_10 - Postgres hangs during FtsManager - asyncReindexAll for a large number of entities #10

haulmont-git commented 6 years ago

Git changesets by subbotin:

9469958d in master - Postgres hangs during FtsManager - asyncReindexAll for a large number of entities #10

haulmont-git commented 6 years ago

Git changesets by subbotin:

78c1fa6d in master - Postgres hangs during FtsManager - asyncReindexAll for a large number of entities #10

haulmont-git commented 6 years ago

Git changesets by subbotin:

0da845d6 in release_6_9 - Postgres hangs during FtsManager - asyncReindexAll for a large number of entities #10

haulmont-git commented 6 years ago

Git changesets by subbotin:

cbc1dd6a in release_6_8 - Postgres hangs during FtsManager - asyncReindexAll for a large number of entities #10