rename optimize to a less cool-sounding name [LUCENE-3454]

asfimport commented 13 years ago

I think users see the name optimize and feel they must do this, because who wants a suboptimal system? but this probably just results in wasted time and resources.

maybe rename to collapseSegments or something?

Migrated from LUCENE-3454 by Robert Muir (@rmuir), resolved Nov 12 2011 Attachments: LUCENE-3454.patch (versions: 2) Linked issues:

SOLR-3141

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

if anyone wants to take this, don't hesitate!

asfimport commented 13 years ago

Otis Gospodnetic (@otisg) (migrated from JIRA)

Would it be wise to stick with a name less specific than collapseSegments for example, in order not to have an incorrect name that requires another renaming when this "command" ends up doing something new in the future?

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Otis: thats a good point, currently optimize is just a "request", we should probably figure out what it "should be".

Should it be collapseSegments, which is a well-defined request without a cool sounding name?

Or, should it be <something else>, which gives you a more optimal configuration for search performance... (i still think optimize is a bad name even for this)? Personally i suspect its going to be hard to support this case, e.g. you would really need to know things like, if the user has executionService set on the IndexSearcher and how big the threadpool is and things like that to make an 'optimal' configuration... and we don't have a nice way of knowing that information today.

asfimport commented 13 years ago

Shai Erera (@shaie) (migrated from JIRA)

I personally find optimize() clear, and people got used to it already. However I do agree it sounds like it's doing magic, but really it just merges segments.

We already have maybeMerge(), how about consolidating the two under merge(MergeLevel) where MergeLevel is either AGGRESSIVE (optimize) or DELICATE (maybeMerge)? Or something in that spirit.

Anyway, if we end up renaming, we should probably rename MergePolicy.findSegmentsForOptimize as well. If we consolidate, we should consolidate it as well.

asfimport commented 13 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I like Shai's idea, let's remove optimize completely. maybeMerge has the ideal name.

MergePolicy should then have only one method also getting this aggresiveness parameter.

asfimport commented 13 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

How about maybeMerge() and maybeMerge(int maxSegCount) (since we have to absorb optimize(int maxSegCount) too)?

This way what used to be optimize is now maybeMerge(1).

I like that this forces the user to make an explicit decision about how many segments they require the index to merge down to, so they realize by picking 1 they are asking for tons of merging.

asfimport commented 13 years ago

Shai Erera (@shaie) (migrated from JIRA)

Fine. Can we use the opportunity to remove 'maybe'? Just merge(). If it ends up doing nothing, it's still ok, with me at least.

asfimport commented 13 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Hmm... I like maybeMerge() better because it makes it clear that there may be no merging done?

But maybe we can find a different word from maybe instead :) mergeIfNeeded?

asfimport commented 13 years ago

Shai Erera (@shaie) (migrated from JIRA)

It's just that maybeMerge feels like asking a question, to which you expect an answer. Why is it so important to have this 'maybe' or 'ifNeeded' in the API? just like optimize(), MP decides what to merge, and optimize() can end up doing nothing as well ...

Why do we have IR.reopen and not maybeReopen? That that it returns something is not much different than IW.merge(), IMO.

asfimport commented 13 years ago

Eks Dev (migrated from JIRA)

as a user,

+1 to parameter, if you do not know what NooSegments mean, you shouldn't be invoking this method.
what about tryMerge(int ), attempMerge

asfimport commented 13 years ago

DM Smith (migrated from JIRA)

When I started w/ Lucene, I read the docs and was drawn to call optimize because of its "cool name." However, it was reading the documentation at the time that convinced me that it was appropriate for my use case:

Creating an index that once created would never be modified. It needed to be as fast as possible for search on low performance computing devices (old laptops, ancient computers, netbooks, phones, ...).

Maybe I misunderstood, but wasn't it and isn't it still appropriate for that?

And I have no idea what NooSegments means.

If you want a really uncool name how about dumbDown()?

But either way, please document the appropriate use cases for it.

asfimport commented 13 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Why do we have IR.reopen and not maybeReopen?

That's a good point... and I actually don't like that naming either!!

I think it trips users up because of it's hidden maybe-ness, ie, users who always close the old reader on calling this API (which, if no reopen actually occurred, closes the very same reader it had just returned).

Also, IW already has a merge method, taking a OneMerge parameter; that naming I think is correct because it's unconditional: IW will carry out the merge you passed to it.

Naming is the hardest part!

asfimport commented 13 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

In spitball land... requestAggressiveMerge?

asfimport commented 13 years ago

Michael Sokolov (@msokolov) (migrated from JIRA)

squeeze()? You apply some pressure - maybe it merges, and maybe it doesn't. Depending how hard you squeeze. squeeze(1) is squeezing harder than squeeze(10), which is a bit weird, but that's just down to units of squishiness.

asfimport commented 13 years ago

Jan Høydahl (@janhoy) (migrated from JIRA)

I've seen people optimize after every single add(). Even the Django search framework Solango did this terrible mistake. So I think renaming is welcome, and now that Lucene does such a good merging job for you most people won't need optimize. How about:

reduceSegments(int numSegments)

Does it convey the "maybeness" well enough? If I use it I need to know what a segment is and have some feeling for how many segments there are and how many I want, so its usage will imply some knowledge. But maybeMerge() works as well.

asfimport commented 13 years ago

Doron Cohen (migrated from JIRA)

To me merge(num) doing nothing "because there are already no more than n segments" is as fine as close() doing nothing "because of already being closed" so +1 for merge(num).

asfimport commented 12 years ago

Mindaugas Žakšauskas (migrated from JIRA)

What about "defragment"?

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

setting affects 3.x:

Here's my +1 for a hard backwards break to remove this name.

asfimport commented 12 years ago

Grant Ingersoll (@gsingers) (migrated from JIRA)

How about compactIndex()?

asfimport commented 12 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Patch.

I picked the name 'mergeIfNeeded', and it requires you pass in the maxSegmentCount (so mergeIfNeeded(1) == what optimize() does today).

All tests pass, including Solr, but Solr has tons of user-facing references still to "optimize" that I'm not sure how to fix (I tried to remove all references to optimize in Lucene).

I put nocommits for tests that need to be renamed... I'll do that at the last second so patching is easier.

asfimport commented 12 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

but Solr has tons of user-facing references still to "optimize" that I'm not sure how to fix

That's fine. We probably shouldn't be changing Solr's external interfaces (and configuration options) for renamings like this.

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Yonik: I'm not sure? Jan's comment really hits home at why I opened this issue.

I think its worth considering "optimize" command to throw an exception in Solr for this reason... we can add an expert option like mergeIfNeeded instead?

I really think optimize is very dangerous because it sounds magical, the kind of thing that tempts new users into getting terrible performance, instead it should be an expert thing.

asfimport commented 12 years ago

Chris M. Hostetter (@hossman) (migrated from JIRA)

I think its worth considering "optimize" command to throw an exception in Solr for this reason... we can add an expert option like mergeIfNeeded instead?

I have mixed feelings on this ... Solr has really tried to ensure that the user level APIs (ie: HTTP params and the xml syntax) are either unaffected by under the cover changes, or provide good backcompat with warnings logged when people use "deprecated" syntax.

But I would suggest that we start by keeping this issue focused on nailing down the java level issues in the "lucene" layer and get that committed. Then when we're sure it's done and settled and happy, create a new issue for tracking the solr level API changes, with considerations like:

should we log a deprecation warning to 3.x if <optimize> or param optimize=true is specified, and ignore in 4.x?
should the existing 'maxSegments' attribute became an option on "commit" or add a new explicit "merge" command?
what do we do with postOptimize events listener registrations? log a deprecation warning, or rename and fire them if/when a merge to a single segment happens?

(my personal opinion, for any general change not just optimize, is to add new syntax to reflect the new world order, but as long as we can support the old syntax we should – with anoying warning logs)

asfimport commented 12 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

It's at least an order of magnitude more painful to change common external interfaces in Solr than in Lucene (or any other type-safe library). "optimize" has been with us a very long time, and it should stay. It should be both simple and effective enough to clarify the documentation if needed. Changing the name would cause far more collective confusion and pain than the very small percent of people who might not understand when it's appropriate to call (and changing the name won't help them understand when it is appropriate!).

On a side note: optimize still does just that - an index with fewer segments and deleted documents is still more efficient to search. We just need to make sure to document the downsides as well.

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Changing the name would cause far more collective confusion and pain than the very small percent of people who might not understand when it's appropriate to call (and changing the name won't help them understand when it is appropriate!).

And thats exactly my goal here: whatever it takes. I want to cause confusion so that people think twice before calling optimize. Because I totally disagree with you that its a small percentage of people that know when its appropriate to call, when integrations such as Jan's example call it after every commit: thats horrible.

Thats why I think we should rename it in lucene, rename it in solr, but if people start pushing back thats fine.

I can totally take this up myself with the blogging/email list route instead and intentionally cause so much fucking confusion that nobody will ever use this again.

asfimport commented 12 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

There should be extremely good reasons for changing Solr's external APIs - and this simply doesn't come close to rising to that level.

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

fine, then i'm +1 to mike's patch.

If solr wants to be slow, then it can stay slow, I'm gonna stop fighting it.

asfimport commented 12 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

My proposal would be to leave "optimize" in Solr, but make it a no-op. No code will break and Solr will just drop a warning not in the log.

asfimport commented 12 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

The same could be done in 3.x of Lucene to prevent hard backwards breaks. As optimize does nothing to your index thats visible to the API/IndexReader/..., we could also meke IndexWriter.optimize() a no-op in Lucene 3.x and document it in the javadocs to do nothing anymore and that its only there to support binary backwards.

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

very small percent of people who might not understand when it's appropriate to call

Are you saying only a small percentage of people use DataImportHandler or SpellChecker?

Because both of these optimize by default on any operation...

asfimport commented 12 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Mike, about your patch:

Would it make sense to remove the extra method completely from MergePolicy and simply do all in one method? By default, IW will call merge(maxSegments=Integer.MAX_VALUE) and when you call mergeIfNeeded(n) it would pass merge(maxSegments=n).

Otherwise +1 to commmit.

The stupid and horrible default optimizes in Solr should be handled in another SOLR-xxx issue.

asfimport commented 12 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

OK let's leave this issue focused on Lucene only since changing Solr is contentious.

Would it make sense to remove the extra method completely from MergePolicy and simply do all in one method?

+1, good idea! I'll rework the patch.

asfimport commented 12 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

Perhaps I am the only one, but I find these ifNeeded, mabyeThis, mabyeThat method names so ugly. I prefer JavaDoc for trying to catch the subtleties.

+1 on compact as suggested by Grant or something like that.

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

But names like compact/vacuum/etc imply that its some sort of necessary maintenance routine...

asfimport commented 12 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

Vacuum may because it's part of the db vocabulary - I don't think compact does at all. Beyond that, that is what javadocs are for. Personally, I don't think we need to try and design the api for complete morons that just call methods without reading the javadoc for what they do.

asfimport commented 12 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

And if we want to keep thinking of the complete moron, I'd make the same argument about mergeIfNeeded - the user might think these needs to be called all the time! My, it doesn't merge when needed? I better call this every couple seconds to make sure it merges when its needed! It's a ridiculous never ending path IMO.

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I think a lot of damage has been done by the name being optimize: e.g.: why does an incremental DIH update trigger this by default?

what sense does this make?

You can keep lying to yourself that the only problem is complete morons, i'm not buying it.

asfimport commented 12 years ago

Shai Erera (@shaie) (migrated from JIRA)

So now we ended up with mergeIfNeeded and maybeMerge()? At the start of this issue, it looks like we agreed to consolidate all methods under a single maybeMerge(). Mike suggested to have two variants of this, one that doesn't take maxNumSegments and one that does ... I'm fine with that too, as long as we have a single name.

I also agree with Mark, all these maybe's/ifNeeded (IR has these now too !) should be part of the Javadocs, not the method names. For instance, IW.rollback() closes the IndexWriter, but the method is not called rollbackAndClose. IMO, that is even more confusing than the IfNeeded versions, because I do not anticipate the instance to be closed just because I rolled-back changes.

IndexReader.openIfNeeded in fact does reopen() (the old version), but it's not called reopenIfNeeded. So we force the users to read the javadocs in order to understand that openIfNeeded reuses the unchanged segments from the given IndexReader and only opens the new ones ...

Names are hard, and I think we should opt for simple and intuitive ones. Javadocs should be used to clarify what the method does in more details. I personally was never confused by optimize(), so I don't mind if it's kept. But apparently others were confused by it (have no idea why) ...

I don't mind maybeMerge() (perhaps because it's not a new API), but if we want to remove the maybe-ness, let's call it something like invokeMergePolicy() (with and without maxNumSegments)? We can replace invoke by some other verb, maybe consultMP / runMerges ...?

asfimport commented 12 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

e.g.: why does an incremental DIH update trigger this by default?

Not sure... If they actually meant for this to happen, this just reinforces my point that the name of the call is not the issue. The guys who wrote DIH absolutely know what optimize does - hence if it was renamed they would have simply called maybeMerge(1) or whatever today.

At the time it was written, and depending on what it was used for, maybe it did make sense.

You can keep lying to yourself that the only problem is complete morons, i'm not buying it.

That's exactly it - it's not morons, so a simple name change won't really help. We need to document the current tradeoffs so people can make more informed decisions about when to optimize. And some people will still chose to maybeMerge(1) when others think maybe they shouldn't. That's OK.

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

At the time it was written, and depending on what it was used for, maybe it did make sense.

You are missing the whole point, now it defaults to optimize on each incremental update, basically O(n^2) indexing performance, unless the user knows enough to know that optimize = shitty perf here and explicitly turn it off. and given its cool-sounding name, it discourages them from even questioning that it could be making things slow.

asfimport commented 12 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

How about the name "forceMerge(int)" instead?

Fundamentally, this is a different operation from maybeMerge() because that method only does "natural" merges, ie ones that the MP has selected on its own.

Whereas forceMerge means you are forcing the MP to do merging that it otherwise would not have naturally chosen to do.

I don't like names like compact/defragment since they still imply this is a sort of necessary periodic maintenance that you are expected / need to call.

The fact is, Lucene has made excellent progress on getting good performance on multi-segment indexes: Query rewriting (eg MTQ) and searching is per-segment. TieredMP now targets segments with deletions, and can merge out-of-order, etc. Reducing the index down to 1 segment is rarely justified given the cost (yes, there are times, like a fully static index, but this is rare).

The goal here is to discourage "typical" users from calling optimize ("expert" users will of course find the method and use it, hopefully in the "right" cases).

The API is badly trappy today; we've seen this over and over now (I just got a private email a few days ago... when I asked why they optimize after every "batch" they said "because it just seemed like the right thing to do"). We've all seen many users fall into this trap.

We can try to debate why this is so... I don't think it's because they are "morons". I think there are many other explanations. EG, our own FAQs, javadocs, the Lucene in Action book, tutorials, etc., all frequently "suggested" optimize in the past. I think, also, users often don't realize Lucene has "segments" and that optimize means these segments are "fully rewritten" and that this then implies O(N^2) cost if you call after every doc/batch, etc. These things are obvious to Lucene developers, but not so to users.

asfimport commented 12 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

For instance, IW.rollback() closes the IndexWriter, but the method is not called rollbackAndClose.

I agree, this is bad, because it's unexpected that the IW is closed: I think we should rename it to rollbackAndClose (I'll open a separate issue).

asfimport commented 12 years ago

Doron Cohen (migrated from JIRA)

Perhaps I am the only one, but I find these ifNeeded, mabyeThis, mabyeThat method names so ugly. I prefer JavaDoc for trying to catch the subtleties.

I feel that way too.

But a name change here seems in place, because as pointed above, there is an issue with current catchy name optimize().

My personal preference between the names suggested above is Mike's last one: forceMerge(int):

it describes what's done
does not suggest to do wonders
requires caller to think twice because of deciding to force a certain behavior

asfimport commented 12 years ago

Shai Erera (@shaie) (migrated from JIRA)

How about the name "forceMerge(int)" instead?

Fundamentally, this is a different operation from maybeMerge() because that method only does "natural" merges, ie ones that the MP has selected on its own.

Whereas forceMerge means you are forcing the MP to do merging that it otherwise would not have naturally chosen to do.

I'm not sure that I agree ... I could set MP in such a way that forceMerge(1) would still do nothing. That's very simple in fact, and I do this today. I set LogMP's maxMergeMB(ForOptimize) to 4GB, which means that I never end up merging segments larger than that. I call optimize() whenever I can, but at some point, optimize will do nothing (if the indexing process stops), or after my index grew a lot, many segments won't be merged, and optimize/forceMerge(1) will actually end up with X+1 segments, where X is the number of segments that are too large for me to merge.

Therefore I'm not sure that trading optimize for forceMerge is much better. Sure, it has a less cool name, but now I think it will be even more confusing, because I'll call forceMerge(1) and that won't do what I asked.

I think that the problem is that we try to come up with names that reflect what API we IndexWriter should call on MP. That's why we try to distinguish between maybeMerge() and optimize(int). So maybe we should go for a more extreme change – how about having one method merge() which takes a MergePolicy with a single method findSegmentsForMerge(). We will provide MPs that are good for 'regular' merges and 'optimize' and the user can pass whatever he wishes to do. The user can also pass the same MP instance to IWC, and that will control the regular merges IW does from time to time (we default to a 'regular' merging MP).

Just a thought.

asfimport commented 12 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

I like forceMerge. I'm not opposed to renaming optimize, just one of the candidates so far and some of the new method names in general. Optimize is not the best description IMO regardless of how catchy it is. It's not really optimal for NRT as just one of many points...

asfimport commented 12 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Some quick googling uncovers depressing examples of over-optimizing:

https://jira.duraspace.org/browse/FCREPO-155

http://stackoverflow.com/questions/3912253/is-it-mandatory-to-optimize-the-lucene-index-after-write

http://issues.liferay.com/browse/LPS-2944

http://download.oracle.com/docs/cd/E19316-01/820-7054/girqf/index.html

https://issues.sonatype.org/browse/MNGECLIPSE-2359

http://blog.inflinx.com/tag/lucene

That last one has this fun comment:

  // Lucene recommends calling optimize upon completion of indexing writer.optimize();

asfimport commented 12 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

I'm not sure sure that is a strong case.

The ones that mention optimizing after loading all your data is practically what had been often recommended for some time. Can't say the same about optimizing after every add.

However, up to most of Lucene 2, we still had JavaDoc that said:

If an index will not have more documents added for a while and optimal search performance is desired, then the optimize method should be called before the index is closed.

Based on that, I'd likely think I should optimize after bulk loading up all my data like one of those links asks about.

The optimize javadoc itself even simply said:

Requests an "optimize" operation on an index, priming the index for the fastest available search. Traditionally this has meant merging all segments into a single segment as is done in the default merge policy, but individaul merge policies may implement optimize in different ways.

Since, much of this javadoc had gotten better. But it's no surprise that there are cases of confusion out there? Most of those are from before this javadoc was fixed - and even then the old code and javaodoc/advice are out there reverberating around on google.

The situation with the javadoc is much better today - someone shouldn't need to ask those questions, or have those problems, but the javadoc used to be a trap in showing this great optimize method and not properly explaining or warning about its use.

Creating method names for cowboy method calling coders that don't read javadoc seems like the wrong approach to me.

Though I'm still +1 on renaming optimize to something more fitting.

asfimport commented 12 years ago

Shai Erera (@shaie) (migrated from JIRA)

Well ... I don't like forceMerge for two reasons: (1) it may not actually force anything and (2) it takes a parameter maxNumSegments which is just one of the factors one would want to consider when doing optimize/merge. For instance, when I do index optimization, I cap the process by a time constraint and let it run until the time is exhausted. Given that today I can either call maybeMerge() or optimize(), I call optimize(), and I like it that I don't need to pass any 'fake' parameter, even though I'm aware that under the covers it does optimize(1).

What about expungeDeletes? Why is it not called maybeExpungeDeletes? Because by tweaking MP settings I can make it leave some deletes in segments. And why isn't it called expungeDeletesThenMerge or expungeDeletesNoMerge (ala the now gone addIndexesNoOptimize)? Don't get me wrong (before anyone opens an issue to rename it too) - I like expungeDeletes! :)

I feel that we mask from the user what happens under the covers when he calls any of the 3 methods (maybMerge, optimize, expungeDeletes). maybeMerge relate to mergeFactor as a bounding criteria (don't merge if there are aren't X segments at the same level), while optimize just uses it to determine how many segments to merge at once, and expungeDeletes ignores it altogether.

So if MP determines so much what will happen when IndexWriter calls it, why hide it from the method call? Instead of setting an MP once on IWC and hope that the settings I've done will match any future call to one of these methods, why not allow the user to pass the desired MP for the action he wants to perform? That way we can focus MP implementations on specific tasks.

asfimport commented 12 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I could set MP in such a way that forceMerge(1) would still do nothing. That's very simple in fact, and I do this today.

Sure, but remember: 1) this is the exception case (not the rule), and 2) you are an expert user.

We should name our APIs according to what they "typically" do, not by the exceptional cases, which can and should be handled by javadocs (for example, that you must close your IW if you hit OOME during addDocument).

The expert users (like you) who are bumping up against these exceptions can easily understand and handle them.

I think forceMerge(int) does a pretty good job explaining what the MP is going to try to do.

I think that the problem is that we try to come up with names that reflect what API we IndexWriter should call on MP.

Right, optimize() is really sugar for "invoking the current MP and do whatever it says".

So maybe we should go for a more extreme change – how about having one method merge() which takes a MergePolicy with a single method findSegmentsForMerge().

This is a great idea! But I think we should pursue it under a new issue, after renaming the optimize method? It's a bigger change.

If we took this approach... I think IW would still need a "default MP" that it uses to kick off natural merges over time? (Ie, after a new segment is flushed).

Alternatively, we could have this extra MP sit fully "outside" of IW, and so instead of calling IW.merge(MP) you'd call MP.merge(IW), and that "foreign" MP would register merges with IW? Still, it's gonna get tricky, how the "natural" MP interacts with this foreign MP.

Or... maybe we remove IW.optimize, and instead open up a method on each MP impl (eg MP.forceMerge(int)), and you invoke this method on the MP instead? This way you still have the one MP, but IW doesn't need to expose hard-to-name sugar? Still sounds tricky though... the MP would ask IW to maybeMerge

asfimport commented 12 years ago

Shai Erera (@shaie) (migrated from JIRA)

Sure, but remember: 1) this is the exception case (not the rule)

I disagree ... I find myself more and more these days telling people to limit their merge size because of performance issues, whether it's for optimize/maybeMerge. Therefore I don't think it's the exception case, or will remain like that for long.

I think forceMerge(int) does a pretty good job explaining what the MP is going to try to do.

Is that a Javadoc statement? Because we could have just fixed optimize() javadocs without adding API that sort of commits to something that may not happen.

How about naming it doMaintenance?

If we took this approach... I think IW would still need a "default MP"
that it uses to kick off natural merges over time? (Ie, after a new
segment is flushed).

Sure, we will provide the best MP for doing natural/regular merges which will be the default of IWC.

I agree this route is bigger than just renaming optimize(), and I don't think that we need to change the interaction between IW and MP. But let's handle that in a separate issue.

apache / lucene

rename optimize to a less cool-sounding name [LUCENE-3454] #4528