apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

IndexWriter should also warm flushed segments [LUCENE-2485] #3559

Open asfimport opened 14 years ago

asfimport commented 14 years ago

Spinoff of #3387.

You can now set a mergedSegmentWarmer on IW, which warms only newly merged segments.

But for consistency, maybe we should change this to warm all new segments (i.e., also flushed ones). We should rename it to something like "setSegmentWarmer".

Really, the reader pool should be pulled out of IndexWriter, be externally provided, and be responsible for doing warming of new segments.
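A minimal sketch of what an externally provided reader pool with a single warming hook might look like. `Segment`, `SegmentWarmer`, `ReaderPool`, `onFlushed` and `onMerged` are all hypothetical names for illustration, not Lucene's API. The only point shown is that flushed and merged segments go through the same callback before becoming visible.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a newly written segment; not a Lucene class. */
final class Segment {
    final String name;
    final int docCount;

    Segment(String name, int docCount) {
        this.name = name;
        this.docCount = docCount;
    }
}

/** Hypothetical single warming hook, in the spirit of the proposed "setSegmentWarmer". */
interface SegmentWarmer {
    void warm(Segment segment);
}

/**
 * Hypothetical reader pool supplied from outside the writer. Every new segment,
 * whether produced by a flush or by a merge, passes through the same warmer
 * before it is added to the set of segments visible to searchers.
 */
final class ReaderPool {
    private final SegmentWarmer warmer;
    private final List<Segment> visible = new ArrayList<>();

    ReaderPool(SegmentWarmer warmer) {
        this.warmer = warmer;
    }

    /** Called by the writer after a flush writes a new segment. */
    void onFlushed(Segment flushed) {
        warmer.warm(flushed);
        visible.add(flushed);
    }

    /** Called by the writer after a merge replaces several segments with one. */
    void onMerged(List<Segment> mergedAway, Segment result) {
        warmer.warm(result);          // warm first, then swap the result in
        visible.removeAll(mergedAway);
        visible.add(result);
    }

    /** Read-only view of the segments a newly opened reader would see. */
    List<Segment> visibleSegments() {
        return Collections.unmodifiableList(visible);
    }
}
```

The sketch is single-threaded and ignores reader reference counting; it only illustrates that the application registers one callback and no longer has to care whether a segment came from a flush or a merge.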


Migrated from LUCENE-2485 by Michael McCandless (@mikemccand), updated May 09 2016

asfimport commented 14 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

But for consistency maybe we should change this to warm all new segments

As long as warming a new segment doesn't block that new segment from being exposed via getReader()?

asfimport commented 14 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

I'm not sure how practical this is... but in general, more context would enable a broader range of applications.

asfimport commented 14 years ago

Earwin Burrfoot (migrated from JIRA)

As long as warming a new segment doesn't block that new segment from being exposed via getReader()?

If an application needs warming, it will need to warm up new segments exposed through getReader() anyway. If you're bent on fast turnaround, you're probably not relying on things being warmed up (or you're okay with the cost). Add to this that, for realtime-hungry deployments, newly created (not merged) segments are likely small, so any warmup (if present) will take negligible time.

I think you're overoptimizing a bit here.

asfimport commented 14 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

If an application needs warming, it will need to warm up new segments exposed through getReader() anyway.

But it's very different... the advantage of warming new segments in the hook is that getReader() treated the warm step as part of the merge: if merging plus warming hadn't completed yet, getReader() would still return immediately with the old, pre-merge segments. w/o this ability, there's no advantage to warming in a hook vs. warming explicitly after getReader().
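To make the behaviour described here concrete, a small model, assuming a hypothetical `NrtWriterModel` whose `getReader()` merely returns the current segment list (this is not Lucene's implementation): the merge thread warms the merged segment first and only then swaps it into the published snapshot, so a reader opened during warming sees the pre-merge segments and never waits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/**
 * Hypothetical model, not Lucene code: a merged segment is published only after
 * its warmer finishes, so a reader opened in the meantime sees the old segments.
 */
final class NrtWriterModel {
    // Immutable snapshot of the segment names a newly opened reader would see.
    private final AtomicReference<List<String>> published;

    NrtWriterModel(List<String> initialSegments) {
        this.published = new AtomicReference<>(List.copyOf(initialSegments));
    }

    /** Analogue of getReader(): returns immediately with the current snapshot. */
    List<String> getReader() {
        return published.get();
    }

    /** Runs a merge on a background thread; warming counts as part of the merge. */
    Thread mergeInBackground(List<String> toMerge, String mergedName, Consumer<String> warmer) {
        Thread t = new Thread(() -> {
            warmer.accept(mergedName);              // may be slow; readers are not blocked
            published.updateAndGet(old -> {         // swap in only after warming is done
                List<String> next = new ArrayList<>(old);
                next.removeAll(toMerge);
                next.add(mergedName);
                return List.copyOf(next);
            });
        });
        t.start();
        return t;
    }
}
```

Without this ordering (publishing first and warming afterwards), a hook would indeed offer nothing beyond warming explicitly after getReader().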

asfimport commented 14 years ago

Earwin Burrfoot (migrated from JIRA)

w/o this ability, there's no advantage to warming in a hook vs warming explicitly after getReader()

There is: consistency. I understand that this word is not held in high regard among Luceners (progress, not perfection!), but still. It is logical to have all your warming happen in one defined place. If Lucene does some magic for you, and the biggest part of said warming happens in a separate thread without making you wait, that's very nice! But that's just a side effect, like compiler optimizations that may or may not happen. Also, if your app requires warming for each segment, a single callback frees you from having to determine, for each new segment returned from getReader(), whether it is the product of a merge (and thus already warm) or a still-cold, newly flushed segment.

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

In addition to the "more context" that Yonik proposed (which I like), we could also pass to the warmer whether the segment was created by flush or by merge or by addIndexes.

This way the app could have a single place for all warming, but if necessary could pick and choose how it warms the different cases, since warming after a merge is done in the background (it won't block an NRT reopen).

I'd really like to first factor the ReaderPool out of IW though...
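A sketch of the "more context" idea from this comment: the warmer is told whether a segment came from a flush, a merge, or addIndexes. `SegmentOrigin`, `ContextualSegmentWarmer` and `OriginAwareWarmer` are hypothetical names, not Lucene API, and the policy (heavy warming only where it runs off the reopen path) is just one example of picking per case.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical: how a new segment came into existence. */
enum SegmentOrigin { FLUSH, MERGE, ADD_INDEXES }

/** Hypothetical warmer signature that carries the extra context. */
interface ContextualSegmentWarmer {
    void warm(String segmentName, SegmentOrigin origin);
}

/**
 * Example policy: one place for all warming, with expensive work reserved for
 * merged segments (warmed in the background) and a cheap warm for flushed
 * segments so that an NRT reopen is not delayed.
 */
final class OriginAwareWarmer implements ContextualSegmentWarmer {
    // Records what was done, standing in for real warming work.
    final List<String> log = new ArrayList<>();

    @Override
    public void warm(String segmentName, SegmentOrigin origin) {
        switch (origin) {
            case MERGE:
            case ADD_INDEXES:   // arbitrary choice here: treat addIndexes like a merge
                log.add("full:" + segmentName);   // e.g. populate caches, run warm-up queries
                break;
            case FLUSH:
                log.add("light:" + segmentName);  // keep the reopen path cheap
                break;
        }
    }
}
```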

asfimport commented 11 years ago

Steven Rowe (@sarowe) (migrated from JIRA)

Bulk move 4.4 issues to 4.5 and 5.0

asfimport commented 10 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Move issue to Lucene 4.9.