FTS indexing is much slower than it could be. Most of the time is spent not on writing the Lucene index but on loading entities and lazy-loading the entity graph from the database.
Open refapp 6.10.
Create 100,000 User instances by running this script:
```sql
insert into sec_user (id, version, create_ts, created_by, update_ts, login, login_lc, name, email, group_id, position_)
select newid(), 1, now(), 'admin', now(),
       'us' || t, 'us' || t,
       'Name ' || t, 'user' || t || '@example.com',
       '0fa2b1a5-1d68-4d69-9fbd-dff348347f93', 'Manager'
from generate_series(1, 100000) t;
```
Wait until indexing finishes, then analyze the CPU snapshot.
CPU time distribution:
- FtsManager.processQueue() (total): 14160 ms
- lazy-loading the User.group field from LuceneIndexerBean.addLinkedPropertyEx: 3136 ms
- lazy-loading User collections (roles, substitutions) from LuceneIndexerBean.addLinkedPropertyEx: 2602 ms
- loading each entity to be indexed one by one via em.find(): 4351 ms
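The em.find() entry means every indexed entity costs its own query, and the lazy-loaded group, roles and substitutions add more queries per entity. Batching those loads is the usual way to cut the number of round trips. The minimal, self-contained Java sketch below illustrates only that idea: FakeStore, findOne and findMany are hypothetical stand-ins (not CUBA or JPA API) that simply count round trips. In a real fix the batch would presumably be one query with an IN clause on the ids plus a view/fetch plan covering the associations, but that is an assumption, not something measured in this report.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchLoadSketch {

    /** Hypothetical stand-in for the database: every method call counts as one round trip. */
    static final class FakeStore {
        private final Map<Integer, String> rows = new HashMap<>();
        int roundTrips = 0;

        FakeStore(int rowCount) {
            for (int id = 1; id <= rowCount; id++) {
                rows.put(id, "us" + id);
            }
        }

        /** One query per id, like calling em.find() in a loop. */
        String findOne(int id) {
            roundTrips++;
            return rows.get(id);
        }

        /** One query for a whole batch, like "where id in (...)". */
        List<String> findMany(List<Integer> ids) {
            roundTrips++;
            List<String> result = new ArrayList<>(ids.size());
            for (int id : ids) {
                result.add(rows.get(id));
            }
            return result;
        }
    }

    /** Behaviour seen in the profile: N ids -> N round trips. */
    static List<String> loadOneByOne(FakeStore store, List<Integer> ids) {
        List<String> loaded = new ArrayList<>(ids.size());
        for (int id : ids) {
            loaded.add(store.findOne(id));
        }
        return loaded;
    }

    /** Batched alternative: N ids -> ceil(N / batchSize) round trips. */
    static List<String> loadInBatches(FakeStore store, List<Integer> ids, int batchSize) {
        List<String> loaded = new ArrayList<>(ids.size());
        for (int from = 0; from < ids.size(); from += batchSize) {
            int to = Math.min(from + batchSize, ids.size());
            loaded.addAll(store.findMany(ids.subList(from, to)));
        }
        return loaded;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int id = 1; id <= 100_000; id++) {
            ids.add(id);
        }

        FakeStore oneByOne = new FakeStore(ids.size());
        loadOneByOne(oneByOne, ids);

        FakeStore batched = new FakeStore(ids.size());
        loadInBatches(batched, ids, 500); // hypothetical batch size

        System.out.println("one-by-one round trips: " + oneByOne.roundTrips); // 100000
        System.out.println("batched round trips:    " + batched.roundTrips);  // 200
    }
}
```

With 100,000 ids and a batch size of 500, the one-by-one loader makes 100,000 round trips and the batched loader 200, while returning the same rows. The batch size is a tuning knob: larger batches mean fewer round trips, but databases and drivers cap how many parameters a single IN list may contain.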
So the share of time spent loading indexed entities from the database is:
(3136 + 2602 + 4351) / 14160 * 100% = 10089 / 14160 * 100% = 71.25%
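The same profile numbers also give an idealized ceiling (Amdahl's law) on what removing the database cost could buy: if the 10089 ms of loading took zero time, only 4071 ms would remain, so processQueue() could get at most about 14160 / 4071 ≈ 3.48 times faster. Real batched loading still costs something, so this is an upper bound, not a prediction. A small Java sketch of the arithmetic:

```java
import java.util.Locale;

public class FtsProfileShare {

    /** Percentage of total time spent in the given part. */
    static double sharePercent(long partMs, long totalMs) {
        return 100.0 * partMs / totalMs;
    }

    /** Amdahl-style ceiling: speedup if the given part took zero time. */
    static double maxSpeedup(long removableMs, long totalMs) {
        return (double) totalMs / (totalMs - removableMs);
    }

    public static void main(String[] args) {
        long totalMs = 14160;       // FtsManager.processQueue()
        long groupMs = 3136;        // lazy-loading User.group
        long collectionsMs = 2602;  // lazy-loading roles, substitutions
        long findMs = 4351;         // em.find() for each entity

        long dbMs = groupMs + collectionsMs + findMs; // 10089 ms

        System.out.printf(Locale.ROOT, "DB share: %.2f%%%n", sharePercent(dbMs, totalMs)); // DB share: 71.25%
        System.out.printf(Locale.ROOT, "Max speedup: %.2fx%n", maxSpeedup(dbMs, totalMs)); // Max speedup: 3.48x
    }
}
```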
Thus FtsManager indexing speed is limited by database round-trip latency ("ping"). On my machine it is about 1000 entities per second.
For 1 million entities, indexing takes about 20 minutes, which means significant index downtime for systems that rely on FTS.
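When every entity waits for its own queries, throughput is roughly 1 / (round trips per entity × round-trip time), and total time is entity count divided by that rate. The Java sketch below encodes this back-of-the-envelope model; it ignores CPU time, and the 4 round trips per entity and 0.25 ms latency in main are hypothetical values chosen for illustration, not measurements. At exactly 1000 entities per second, 1,000,000 entities take 1000 s, about 17 minutes, the same order as the estimate above.

```java
import java.util.Locale;

public class FtsIndexingEstimate {

    /** Minutes needed to index {@code entities} at a constant throughput. */
    static double indexingMinutes(long entities, double entitiesPerSecond) {
        return entities / entitiesPerSecond / 60.0;
    }

    /**
     * Throughput when each entity needs {@code roundTripsPerEntity} sequential
     * database round trips of {@code roundTripMillis} each (CPU time ignored).
     */
    static double latencyBoundRate(double roundTripsPerEntity, double roundTripMillis) {
        return 1000.0 / (roundTripsPerEntity * roundTripMillis);
    }

    public static void main(String[] args) {
        // Rate taken from the measurement in the report.
        System.out.printf(Locale.ROOT, "%.1f min%n", indexingMinutes(1_000_000, 1000)); // 16.7 min

        // Hypothetical: 4 round trips per entity (find + group + roles + substitutions) at 0.25 ms each.
        System.out.printf(Locale.ROOT, "%.0f entities/s%n", latencyBoundRate(4, 0.25)); // 1000 entities/s
    }
}
```

In this model, halving the round trips per entity doubles the rate, which is why batching the loads matters more than speeding up the Lucene writes.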