apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

Execute the tasks inside IndexingChain.flush in parallel #13349

Closed: vsop-479 closed this issue 1 month ago

vsop-479 commented 1 month ago

Description

Similar to https://github.com/apache/lucene/pull/13124 and https://github.com/apache/lucene/pull/13190: can we add an executor to SegmentWriteState so that the tasks in IndexingChain.flush, such as writeNorms and writeDocValues, execute in parallel?
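A minimal sketch of what the proposal could look like, assuming a hypothetical `FlushState` record that plays the role of `SegmentWriteState` with an added executor. The three placeholder steps only stand in for writeNorms, writeDocValues and similar calls and do no real I/O. This is plain `java.util.concurrent`, not Lucene's API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: FlushState and the step names are illustrative, not Lucene's API.
public class ParallelFlushSketch {

    /** Hypothetical stand-in for SegmentWriteState with an added executor. */
    record FlushState(String segmentName, ExecutorService executor) {}

    /** Submits each independent flush step to the executor, then waits for all of them. */
    static List<String> flush(FlushState state) {
        ConcurrentLinkedQueue<String> done = new ConcurrentLinkedQueue<>();
        // Placeholder steps standing in for writeNorms, writeDocValues, ...; they share no
        // mutable state except the thread-safe queue, so they may run concurrently.
        List<Runnable> steps = List.of(
                () -> done.add("norms:" + state.segmentName()),
                () -> done.add("docValues:" + state.segmentName()),
                () -> done.add("points:" + state.segmentName()));
        List<Future<?>> futures = new ArrayList<>();
        for (Runnable step : steps) {
            futures.add(state.executor().submit(step));
        }
        try {
            for (Future<?> f : futures) {
                f.get(); // blocks until the step finishes; rethrows its failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        }
        return new ArrayList<>(done);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            // Completion order varies between runs.
            System.out.println(flush(new FlushState("_0", pool)));
        } finally {
            pool.shutdown();
        }
    }
}
```

If k independent steps take times t_1 … t_k, running them one after another costs roughly their sum, while running them in parallel approaches the longest single step, but only when about k cores are otherwise idle.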

vsop-479 commented 1 month ago

@jpountz Please take a look when you get a chance.

jpountz commented 1 month ago

Lucene already has a model for indexing/flushing concurrency: documents are indexed from multiple threads. I guess the idea you are suggesting could make sense when the indexing rate is low, so indexing can't use all available resources, and you'd like to use those unused resources to decrease flushing latency. But if your indexing rate is low, flushing shouldn't be slow unless you reopen very rarely, right? So I'm not sure this would fix an actual problem.
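The existing model can be illustrated with a toy program (not Lucene code): each indexing thread fills its own private buffer, roughly in the spirit of Lucene's per-thread document writers, and flushes that buffer as a separate segment, so flushes already overlap whenever several threads index at once. `ThreadBuffer` and `indexConcurrently` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model, not Lucene code: one private buffer per indexing thread, one segment per flush.
public class PerThreadFlushModel {

    /** Hypothetical per-thread buffer; flushing it produces one "segment". */
    static final class ThreadBuffer {
        private final List<String> docs = new ArrayList<>();

        void add(String doc) {
            docs.add(doc);
        }

        String flush(String segmentName) {
            // A real flush would write postings, norms, doc values, ... one after another here.
            return segmentName + "(" + docs.size() + " docs)";
        }
    }

    /** Each task indexes into its own buffer and flushes it; flushes of different buffers overlap. */
    static List<String> indexConcurrently(int threads, int docsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> segments = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final String name = "_" + t;
                segments.add(pool.submit(() -> {
                    ThreadBuffer buffer = new ThreadBuffer(); // never shared with other threads
                    for (int d = 0; d < docsPerThread; d++) {
                        buffer.add("doc" + d);
                    }
                    return buffer.flush(name);
                }));
            }
            List<String> result = new ArrayList<>();
            for (Future<String> f : segments) {
                result.add(f.get()); // collected in submission order
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(indexConcurrently(4, 1000));
    }
}
```

Because every buffer is confined to one thread, no locking is needed and flush parallelism scales with the number of indexing threads; extra parallelism inside a single flush mainly helps when only a few threads are indexing.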

For merging, it's a bit more compelling, since operations like force-merging were completely single-threaded before the changes you listed, even though they can take a very long time.

vsop-479 commented 1 month ago

I think you are right, @jpountz. Since indexing already uses almost all available resources in many cases, adding an executor to run the tasks inside a flush in parallel is probably less worthwhile for indexing.

I will close this issue.