FTS indexing spends most of its time fetching entities one by one from the database

Environment

Platform version: 6.10 snapshot

Description of the bug or enhancement

FTS indexing is much slower than it could be. Most of the time is spent not on writing the Lucene index, but on loading entities and lazy-loading the entity graph from the database.

Open refapp 6.10

Create 100,000 User instances by running this script:

insert into sec_user (id, version, create_ts, created_by, update_ts, login, login_lc, name, email, group_id, position_)
select newid(), 1, now(), 'admin', now(), 
'us' ||  t, 'us' ||  t, 
'Name ' || t, 'user' || t || '@example.com', 
'0fa2b1a5-1d68-4d69-9fbd-dff348347f93', 'Manager'
from generate_series(1,100000) t

Launch JVisualVM, start CPU sampling.
Invoke JMX FtsManager -> asyncReindexEntity "sec$User"
Wait until indexing is finished and analyze CPU snapshot.

CPU time distribution:

FtsManager.processQueue() - 14160 ms
lazy-loading User.group field from LuceneIndexerBean.addLinkedPropertyEx - 3136 ms
lazy-loading User collections (roles, substitutions) from LuceneIndexerBean.addLinkedPropertyEx - 2602 ms
loading every entity to be indexed one by one by using em.find() - 4351 ms

So the share of time spent loading indexed entities from the DB is: (3136 + 2602 + 4351) / 14160 * 100% = 71.25%
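The same arithmetic can be written as a small Java sketch. It also derives what is left over for everything else (including the actual Lucene writes), which gives a rough upper bound on the speedup if loading cost dropped to zero. It assumes the three loading buckets above are disjoint and all counted inside FtsManager.processQueue(); the class and method names are hypothetical, and sampled CPU times are approximate.

```java
// Hypothetical helper: recomputes the loading share from the sampled CPU times above.
class CpuShareSketch {
    static final long TOTAL_MS = 14160;            // FtsManager.processQueue()
    static final long LAZY_GROUP_MS = 3136;        // lazy-loading User.group
    static final long LAZY_COLLECTIONS_MS = 2602;  // lazy-loading roles, substitutions
    static final long FIND_MS = 4351;              // em.find() per entity

    // Total time attributed to loading entities from the database.
    static long loadingMs() {
        return LAZY_GROUP_MS + LAZY_COLLECTIONS_MS + FIND_MS;
    }

    // Loading time as a percentage of the whole processQueue() time.
    static double loadingSharePercent() {
        return 100.0 * loadingMs() / TOTAL_MS;
    }

    // Idealized bound: total time divided by the time that is not loading.
    static double maxSpeedup() {
        return (double) TOTAL_MS / (TOTAL_MS - loadingMs());
    }

    public static void main(String[] args) {
        System.out.printf("loading: %d ms (%.2f%%)%n", loadingMs(), loadingSharePercent());
        System.out.printf("speedup bound if loading were free: %.2fx%n", maxSpeedup());
    }
}
```

The bound is optimistic, since batched loading still costs something, but it indicates how much headroom the loading path leaves.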

Thus FtsManager indexing speed is limited by the database round-trip latency (ping). On my machine it is about 1000 entities per second. For 1 million entities the indexing time is about 20 minutes, which means significant index downtime for systems that rely on FTS.
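One possible direction, sketched here rather than taken from the FTS code base, is to load queued entities in batches so that the number of database round trips depends on the number of batches instead of the number of entities. The sketch below is plain Java with no JPA dependency: `BatchLoadSketch`, `partition`, `loadInBatches` and the `batchLoader` callback are all hypothetical names. In a real implementation the callback would presumably run a single query per chunk (for example an `id in :ids` query with a view that eagerly covers group, roles and substitutions), which is an assumption and not something this report confirms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: batch loading instead of one lookup per entity.
class BatchLoadSketch {

    // Splits ids into consecutive chunks of at most batchSize elements.
    static <K> List<List<K>> partition(List<K> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<K>> chunks = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += batchSize) {
            int to = Math.min(from + batchSize, ids.size());
            chunks.add(new ArrayList<>(ids.subList(from, to)));
        }
        return chunks;
    }

    // Calls batchLoader once per chunk; each call stands for one DB round trip.
    static <K, V> List<V> loadInBatches(List<K> ids, int batchSize,
                                        Function<List<K>, List<V>> batchLoader) {
        List<V> result = new ArrayList<>(ids.size());
        for (List<K> chunk : partition(ids, batchSize)) {
            result.addAll(batchLoader.apply(chunk));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) {
            ids.add(i);
        }
        int[] roundTrips = {0};
        // Stand-in loader: counts calls and fabricates a row per id (no real DB).
        List<String> loaded = loadInBatches(ids, 100, chunk -> {
            roundTrips[0]++;
            List<String> rows = new ArrayList<>();
            for (Integer id : chunk) {
                rows.add("us" + id);
            }
            return rows;
        });
        // prints "1000 entities in 10 round trips"
        System.out.println(loaded.size() + " entities in " + roundTrips[0] + " round trips");
    }
}
```

With a batch size of 100, the 1000 simulated entities need 10 loader calls instead of 1000; the right batch size for a real database would have to be measured.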

cuba-platform / fts

FTS indexing spends most of its time fetching entities one by one from the database #46