apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of very large segments [LUCENE-7976] #9025

Closed asfimport closed 6 years ago

asfimport commented 6 years ago

We're seeing situations "in the wild" where there are very large indexes (on disk) handled quite easily in a single Lucene index. This is particularly true as features like docValues move data into MMapDirectory space. The current TMP algorithm allows on the order of 50% deleted documents as per a dev list conversation with Mike McCandless (and his blog here: https://www.elastic.co/blog/lucenes-handling-of-deleted-documents).

Especially in the current era of very large indexes in aggregate, (think many TB) solutions like "you need to distribute your collection over more shards" become very costly. Additionally, the tempting "optimize" button exacerbates the issue since once you form, say, a 100G segment (by optimizing/forceMerging) it is not eligible for merging until 97.5G of the docs in it are deleted (current default 5G max segment size).

The proposal here would be to add a new parameter to TMP, something like <maxAllowedPctDeletedInBigSegments> (no, that's not a serious name; suggestions welcome) which would default to 100 (i.e. the same behavior we have now).

So if I set this parameter to, say, 20%, and the max segment size stays at 5G, the following would happen when segments were selected for merging:

> Any segment with > 20% deleted documents would be merged or rewritten NO MATTER HOW LARGE. There are two cases:
>> The segment has < 5G of "live" docs. In that case it would be merged with smaller segments to bring the resulting segment up to 5G. If no smaller segments exist, it would just be rewritten.
>> The segment has > 5G of "live" docs (the result of a forceMerge or optimize). It would be rewritten into a single segment removing all deleted docs, no matter how big it is to start. The 100G example above would be rewritten to an 80G segment, for instance.
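A rough sketch of the eligibility check being proposed (all names below are placeholders for illustration, not actual TieredMergePolicy fields):

```java
// Hypothetical sketch of the proposed rule; maxAllowedPctDeleted and
// maxSegmentBytes are placeholder names, not real TieredMergePolicy settings.
static boolean eligibleForRewrite(long liveBytes, long deletedBytes,
                                  double maxAllowedPctDeleted, long maxSegmentBytes) {
  double pctDeleted = 100.0 * deletedBytes / (liveBytes + deletedBytes);
  if (pctDeleted <= maxAllowedPctDeleted) {
    return false; // not enough deletions to bother, regardless of size
  }
  // Over the threshold: eligible no matter how large. If liveBytes < maxSegmentBytes
  // it can be combined with smaller segments up to the cap; otherwise it gets a
  // "singleton" rewrite that simply drops the deleted docs.
  return true;
}
```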

Of course this would lead to potentially much more I/O which is why the default would be the same behavior we see now. As it stands now, though, there's no way to recover from an optimize/forceMerge except to re-index from scratch. We routinely see 200G-300G Lucene indexes at this point "in the wild" with 10s of shards replicated 3 or more times. And that doesn't even include having these over HDFS.

Alternatives welcome! Something like the above seems minimally invasive. A new merge policy is certainly an alternative.


Migrated from LUCENE-7976 by Erick Erickson (@ErickErickson), 7 votes, resolved Jun 17 2018
Attachments: LUCENE-7976.patch (versions: 13), SOLR-7976.patch

asfimport commented 6 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

+1 to the summary; thanks @ErickErickson, except I think this is dangerous:

specify maxSegments = 1 during forceMerge. This will override any maxMergedSegmentMB settings.

because it means you can get a too large segment in your index just by invoking forceMerge. I don't think we need that behavior? I.e., one way to shoot yourself (3a) is enough? So I think maxMergedSegmentMB should win over maxSegments passed to forceMerge.

asfimport commented 6 years ago

Erick Erickson (@ErickErickson) (migrated from JIRA)

bq: So I think maxMergedSegmentMB should win over maxSegments passed to forceMerge

Works for me.

bq: one way to shoot yourself (3a) is enough ;)

WDYT about going one step further and deprecating maxSegments? Does having that extra knob (maxSegments) really add any value? A value of -1 for maxMergedSegmentMB would mean the same thing as the old optimize. That would avoid having to reconcile the two.

So here's what we tell users (needs to be prettied up):

1> In general invoking forceMerge is unnecessary. Especially in a frequently updated index the default settings should suffice and forceMerge can hurt (there's a blog about that).

2> If you find you have too many deleted documents in your index, consider changing reclaimDeletesWeight in your configuration (and provide some guidance on reasonable values).

3> forceMerge now respects maxMergedSegmentMB. This means that forceMerge will no longer create an index with one segment by default although it will purge all deleted documents.

4> If you require forceMerge to produce a single segment, you must provide the parameter maxMergedSegmentMB=-1 to the forceMerge command. Setting maxMergedSegmentMB=-1 as a permanent setting in your config is not recommended, as it will lead to excessive I/O during normal indexing. Invoking forceMerge with maxMergedSegmentMB=-1 is only recommended when you're willing and able to repeat the operation whenever the index changes; otherwise it will lead to excessive space occupied by deleted documents.

5> (assuming we deprecate maxSegments). forceMerge no longer supports maxSegments. You can approximate this behavior by selecting an appropriate value for maxMergedSegmentMB based on the total size of your index.

asfimport commented 6 years ago

Robert Muir (@rmuir) (migrated from JIRA)

<3b> is my attempt to reconcile the issue of wanting one huge segment, but only when doing forceMerge. Yes, they can back themselves into the same corner they get into now by doing this, but this is acceptable IMO. We're not trying to make it impossible to get into a bad state, just trying to make it so users don't do it by accident.

I'm against the maxMergedSegmentMB = -1 thing. Sorry, we don't need it.

It is not recommended to set maxMergedSegmentMB=-1 as a permanent setting in your config as it will lead to excessive I/O during normal indexing.

this is making matters worse. We don't need an "optimize-all-the-time" option.

asfimport commented 6 years ago

Erick Erickson (@ErickErickson) (migrated from JIRA)

I'm not particularly wedded to using -1. Functionally though I don't see any difference between using -1 to mean "unlimited" and allowing values like 1,000,000,000. Disks aren't that big (yet at least).

Requiring an explicit number rather than using -1 is more in-your-face when it comes to understanding the consequences though.

That does bring up a point; how big is "too big"? Should we log a warning if someone configures this to be > (some number)? Even in the current code, one could configure maxMergedSegmentSizeMB to be ridiculous. Not sure how far down the road I want to go in protecting the users from themselves though.

bq: this is making matters worse.

I never advocated configuring maxMergedSegmentsSizeMB to -1; I said it's not recommended. And they can do this now, just by using a large number. If we're going to direct people's attention to using this parameter for forceMerge, we need to discourage changing it in solrconfig.xml. Or at least provide guidance.

asfimport commented 6 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

WDYT about going one step further and deprecating maxSegments?

I think we should keep maxSegments argument to forceMerge: it has (separate) value because you may want to merge down to e.g. 10 segments that are not the maximum sized segments.
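For reference, that existing call looks like this (merging down to at most 10 segments, with no claim about their sizes):

```java
// Existing IndexWriter API: merge down to at most 10 segments.
// The resulting segments are not necessarily "maximum sized" ones.
void mergeDownToTen(IndexWriter writer) throws IOException {
  writer.forceMerge(10);
}
```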

asfimport commented 6 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Should we log a warning if someone configures this to be > (some number)?

You've mentioned logging several times on this issue; I just want to remind you that Lucene is an API: method calls need to be either valid, or not. It doesn't do logging or any sheistiness like this.

asfimport commented 6 years ago

Erick Erickson (@ErickErickson) (migrated from JIRA)

Robert:

I'm quite aware that logging at the Lucene level doesn't happen. Since this is a setting in solrconfig.xml it's reasonable to log it at that level which is at least where I was thinking about it. Ditto a URL coming in.

asfimport commented 6 years ago

Erick Erickson (@ErickErickson) (migrated from JIRA)

OK, I'm working on this today, @mikemccand, thanks for the hints.

Next up is making forceMerge respect maxMergedSegmentSizeMB. I further propose that this be an optional argument to the command that would override the setting in solrconfig.xml (if any). WDYT?

Note this is a significant change in behavior from the perspective that someone does a forcemerge and then will ask "What? I didn't get one segment when I was done!". Of course putting it in CHANGES.txt and the ref guide is indicated.

> if a person does forceMerge, then there are two parameters:
> maxMergedSegmentsSizeMB
> maxSegments

> maxMergedSegmentsSizeMB overrides maxSegments if both are specified.
> if only one is specified, it's respected.
> if neither is specified, then whatever TMP was configured with is used.

asfimport commented 6 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

I further propose that this be an optional argument to the command that would override the setting in solrconfig.xml (if any). WDYT?

+1 Seems like a good way to enable minor compactions during peak hours and major compactions off-peak.

asfimport commented 6 years ago

Erick Erickson (@ErickErickson) (migrated from JIRA)

This is coming together; here's a preliminary patch. It has nocommits and several rough spots: hard-coded numbers, commented-out code, etc.

I'm putting it up in case anyone interested in this wants to take a look at the approach and poke holes in it. Please raise any concerns but also please don't spend a lot of time on the details before I wrap up things I know will need addressing.

Current state:

0> refactors a bit of findMerges to gather the stats into a separate class, as that method was getting quite hard to follow. I haven't made use of that new class in forceMerge or expungeDeletes yet.

1> forceMerge and expungeDeletes respect maxMergedSegmentSizeMB

2> regular merging will do "singleton merges" on overly-large segments when more than 20% of their docs are deleted. 20% is completely arbitrary; I don't know the correct number yet. That handles the case of a single-segment optimize not getting merged away for a long time.

3> forceMerge will purge all deleted docs. It tries to assemble max-sized segments. Any segments where the live docs are larger than maxMergedSegmentSizeMB get a singleton merge.

4> fixes the annoying bit where segments reported on the admin UI are improperly proportioned

5> expungeDeletes now tries to assemble max-sized segments from all segments with > 10% deleted docs. If a segment has > 10% deleted docs and its live docs are larger than maxMergedSegmentSizeMB, it gets a singleton merge.

What's left to do:

1> more rigorous testing. So far I've just been looking at the admin UI segments screen and saying "that looks about right".

2> Normal merging rewrites the largest segment too often until it gets to max segment size. I think it also merges dissimilar-sized segments too often.

3> compare the total number of bytes written for one of my test runs between the old and new versions. I'm sure this does more writing, just not sure how much.

4> allow forceMerge to merge down to one segment without having to change solrconfig.xml.

5> perhaps refactor findMerges, forceMerge and findForcedDeletesMerges to make use of common code.

6> ????

asfimport commented 6 years ago

Erick Erickson (@ErickErickson) (migrated from JIRA)

@mikemccand do you know the condition in TieredMergePolicy that segmentsToMerge is used? I'm looking at refactoring out the common code and conceptually, forcemerge and expunge deletes are the same thing now, they just operate on slightly different initial lists. But findForcedDeletesMerges doesn't have that parameter and findForcedMerges does.

I guess I'm fuzzy on why a segment would be in segmentsToMerge but not in

writer.getMergingSegments()

this latter seems to be sufficient for detecting segments that are being merged in other cases...

asfimport commented 6 years ago

Erick Erickson (@ErickErickson) (migrated from JIRA)

One more thing. The current patch doesn't deal at all with the maxSegmentCount parameter to findForcedMerges. I'm thinking of deprecating it and having another method that takes the maxMergeSegmentSize(MB). I'll change the method name or something so when the method is removed anyone using it won't be trapped by the underlying method compiling but having a different meaning.

I'm not sure what use-case is served by specifying this anyway. We ignore it currently when we have max-sized segments.

I started looking at this and we already have maxSegments as a parameter to optimize and there's a really hacky way to use that (if it's not present on the command, set it to Integer.MAX_VALUE) and that's just....ugly. So changing that to maxMergeSegmentSizeMB seems cleaner.

Any objections?

asfimport commented 6 years ago

Marc Morissette (migrated from JIRA)

@ErickErickson Thanks for tackling this.

Regarding singleton merges: if I read your code correctly and am right about how Lucene works, I think that, on a large enough collection, your patch could generate ~50% more reads/writes when re-indexing the whole collection:

20% deleted docs threshold:
\sum_{n=1}^{\infty} (1 - 0.2)^n = \frac{1}{1 - (1 - 0.2)} - 1 = 4

50% deleted docs threshold:
\sum_{n=1}^{\infty} (1 - 0.5)^n = \frac{1}{1 - (1 - 0.5)} - 1 = 1

On the odd chance that my math bears any resemblance to reality, I would suggest that you disable singleton merges when the short term deletion rate of a segment is above a certain threshold (say 0.5% per hour). This should prevent performance degradations during heavy re-indexation while maintaining the desired behaviour on seldom updated indexes.

asfimport commented 6 years ago

Erick Erickson (@ErickErickson) (migrated from JIRA)

Marc:

Thanks for looking, especially at how jumbled the code is right now!

I collected some preliminary stats on total bytes written, admittedly unscientific and hacky. I set a low maxMergedSegmentSizeMB and reindexed the same docs randomly. To my great surprise the new code wrote fewer bytes than the current code. My expectation was just what you're pointing out, I expected to see the new stuff write a lot more bytes. This was with an index that respected max segment sizes.

On my plate today is to reconcile my expectations and measurements. What I think happened is that Mike's clever cost measurements are getting in here.

The singleton merge is not intended (I'll have to ensure I didn't screw this up, thanks for drawing attention to it) to be run against segments that respect the max segment size. It's supposed to be there to allow recovery from the case where someone optimized to 1 huge segment. If it leads to a lot of extra writes in that case I think it's acceptable. If it leads to a lot more bytes written in the case where the segments respect max segment size, I worry a lot....

In the normal case, it's not that a segment is merged when it has > 20% deleted docs; it's that it becomes eligible for merging even if it has > 50% of maxSegmentSize in "live" docs. What I have to figure out (all help appreciated!) is how Mike's scoring algorithm influences this. The code starting with // Consider all merge starts: is key here. Let's say I have 100 possible eligible segments and a "maxMergeAtOnce" of 30. The code starts at 0, collects up to 30 segments and scores that merge. Then it starts at 1, collects up to 30 segments and scores that. Repeat until you start at 70, keeping the "best" merge as determined by the scoring method, and use that best-scoring one. What I think is happening is that the large segments do grow past 20% deleted before they're merged, due to the scoring.
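Roughly, as a sketch of that loop (not the actual TieredMergePolicy source; liveBytes() and score() are placeholders for TMP's size and scoring logic):

```java
// Sketch of the "consider all merge starts" loop described above.
// liveBytes() and score() stand in for TMP's size and scoring code.
List<SegmentCommitInfo> best = null;
double bestScore = Double.MAX_VALUE;              // in TMP, lower scores are better
for (int start = 0; start < eligible.size(); start++) {
  List<SegmentCommitInfo> candidate = new ArrayList<>();
  long candidateBytes = 0;
  for (int i = start; i < eligible.size() && candidate.size() < maxMergeAtOnce; i++) {
    SegmentCommitInfo info = eligible.get(i);
    if (candidateBytes + liveBytes(info) > maxMergedSegmentBytes) {
      continue;                                   // would produce an over-sized segment
    }
    candidate.add(info);
    candidateBytes += liveBytes(info);
  }
  double candidateScore = score(candidate);       // skew, size and pct deleted all factor in
  if (candidate.size() > 1 && candidateScore < bestScore) {
    bestScore = candidateScore;
    best = candidate;
  }
}
```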

And there's a whole discussion here about what's a "good" number and whether it should be user-configurable, I chose 20% semi-randomly (and hard-coded it!) just to get something going.

All that said, performance is the next big chunk of this I need to tackle, ensuring that this doesn't become horribly I/O intensive. Or, as you suggest, we figure out a way to throttle it.

Or throw out the idea of singleton merges in the first place and, now that expungeDeletes respects max segment size too, tell users who've optimized down to single segments that they should occasionally run expungeDeletes as they replace documents.

asfimport commented 6 years ago

Erick Erickson (@ErickErickson) (migrated from JIRA)

@mikemccand re: do you know the condition in TieredMergePolicy that segmentsToMerge is used

Figured it out. Basically the forceMerge doesn't want to merge more than maxMergeAtOnceExplicit segments at once so there may be multiple passes (gosh, almost looks like Map/Reduce).

bq: I started looking at this and we already have maxSegments as a parameter to optimize and there's a really hacky way to use that (if it's not present on the command, set it to Integer.MAX_VALUE) and that's just....ugly. So changing that to maxMergeSegmentSizeMB seems cleaner.

Changing my mind about this. I found a better way to deal with an external (Solr-level) optimize command. For the update command, default maxSegments to MAX_INT, assume there's no limit and always respect maxMergedSegmentMB. If the user does specify the max number of segments allowed, do what they say even if it means creating one giant segment.

asfimport commented 6 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Phew suddenly a lot of action here!  I'll review the patch soon, but wanted to answer:

[\~mikemccand] do you know the condition in TieredMergePolicy that segmentsToMerge is used?

Right, the idea here is that if you call forceMerge, we will only merge those segments present in the index at the moment forceMerge started.  Any newly written segments due to concurrent indexing will not participate in that forceMerge (unless you go and call forceMerge again).  I think it's a bug that forceMergeDeletes doesn't do the same thing?  Otherwise the operation can run endlessly?

asfimport commented 6 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I'm confused what this means:

I further propose that this be an optional argument to the command that would override the setting in solrconfig.xml (if any). WDYT?

We are in Lucene not Solr here – I think what you mean is you want to change the forceMerge and forceMergeDeletes APIs in IndexWriter and the merge policy to optionally (default would be unbounded) accept a parameter to set the maxMergedSegmentSizeMB?

asfimport commented 6 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I think this change might be cleaner if we can reformulate the desired outcomes using the existing "generate candidates and score them" approach?

E.g. for singleton merges ... I wonder if we could just relax TMP to allow it to consider merges with fewer than maxMergeAtOnce, and then "improve" the scoring function to give a good score to cases that would reclaim > X% deletions?

asfimport commented 6 years ago

Erick Erickson (@ErickErickson) (migrated from JIRA)

Thanks for taking the time Mike! Going in reverse order:

bq: I wonder if we could just relax TMP to allow it to consider merges with fewer than maxMergeAtOnce, and then "improve" the scoring function to give a good score to cases that would reclaim > X% deletions?

Interesting. At this point in the change cycle I've got my head around most of what's going on and can think about how to do it better. We'd want to tweak things, I should think, so scoring could return "don't do this merge at all"? I'm thinking of the case where we have, say, 1 max-sized (or bigger) segment as a candidate with a few deletions; we wouldn't want to merge that at all, right? I'm thinking of some threshold score above which we score it as "don't bother"....

The number of changes I'm introducing here does make me nervous, I wonder if taking a fresh look at it with an eye toward just doing the above would lead to less surgery.... I mean this has been working fine for years, I do worry that I'm introducing bugs... I don't mind throwing away a bunch of work if smaller changes can cure my problem.

bq: I think what you mean is you want to change the forceMerge and forceMergeDeletes APIs in IndexWriter

Right, that would have been the consequence. But I changed my mind on this yesterday, I don't think any Lucene API change is needed after all. What I did instead (not in the current patch) is default the Solr "update" command to pass Integer.MAX_VALUE for the max number of segments to forceMerge. That just flows into the TMP code without changing the API and lets maxMergedSegmentBytes control how many segments are created. Anyone who wants the old behavior needs to pass 1 like the default is now.
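In concrete terms, the Solr-side default would amount to something like this (a sketch; maxSegmentsParam is a hypothetical name for the value parsed from the optimize request):

```java
// Sketch of the Solr-side default described above; maxSegmentsParam is a
// hypothetical name for the value parsed from the optimize request, if any.
void doForceMerge(IndexWriter writer, Integer maxSegmentsParam) throws IOException {
  int maxSegments = (maxSegmentsParam != null)
      ? maxSegmentsParam        // explicit request, e.g. 1 for the old optimize behavior
      : Integer.MAX_VALUE;      // new default: let maxMergedSegmentMB control the count
  writer.forceMerge(maxSegments);
}
```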

bq: I think it's a bug that findForceMergeDeletes doesn't do the same thing....

OK, let me look this over again. Yesterday I started to see the differences between forceMerge and forceMergeDeletes and thought they should stay separate, but you seem to be saying the idea of combining them is worth exploring. I'll revisit this again this weekend. Wouldn't making that work require changing the findForceMergeDeletes interface? I'm perfectly willing but didn't want to do that without discussion. And it seems that then findForcedDeletesMerges and findForcedMerges would be very thin wrappers around the same code for both.... Or were you thinking of handling this differently?

Thanks again...

asfimport commented 6 years ago

Erick Erickson (@ErickErickson) (migrated from JIRA)

Oh, and Mike (and others):

Please don't go over this with too fine a comb on my behalf. I'm grateful for any time you do want to spend of course, but I don't consider this patch in good enough shape for really serious review. Your comments already may mean that I revise the approach in a major way, that's the level I'm aiming for now.

asfimport commented 6 years ago

Erick Erickson (@ErickErickson) (migrated from JIRA)

Well, it didn't get simpler ;(...

@mikemccand The problem with tweaks to scoring is that the assumptions made in findForcedDeletesMerges and findForcedMerges now have to respect max segment size. Which really means that all three methods (including findMerges) are the same operation just with some different initial assumptions. findForcedMerges is particularly ugly in that it can have a segment count specified and that makes for some uglier code.

I think I want to defer your comments about findForcedMergeDeletes possibly having a bug and should do the same kind of round-tripping as findForcedMerges to another JIRA if necessary, this is already big enough.

Current state:

> Despite all the nocommits and extraneous comments, I think it's pretty close to being functionally correct.

> I need to clean this up considerably as I've been concentrating on getting it structured, I'll leave it for a day or two and then look again.

> I'm not entirely sure I like the structure of the InfosStats class, with computeStats being sensitive to what kind of merge is being called for. On the one hand it does centralize all the different considerations. On the other it concentrates the ugliness in one place. Moving tricky code from one place to the other isn't necessarily an improvement. On the other other hand, when the trickiness was in findMerges, findForcedMerges and findForcedDeletesMerges I had to pass a bunch of parameters to getSpec, which was ugly too.

> I've hard-coded 20% as a threshold in indexDeletedPctAllowed and it does double-duty, both as a threshold for total index pct deleted before singleton merges are allowed and the threshold for a singleton merge. I don't think this is something I particularly want to make into a tuning parameter right now, possibly leave that for another JIRA if at all. See the bit on perf below. With expungeDeletes and forceMerge now respecting max segment bytes, if someone really, really, really cares about this they can use those operations.

> singleton merges work, so if I have a massive segment it will gradually reduce in size over time. Max sized segments are also singleton-merged when the minimum deleted percentage threshold is reached and they're over 20% deleted docs.

> There's some reporting in this code, added to measure bytes written, that will disappear completely. I compared this version to the original and I'm pleasantly surprised to see only about a 10% increase in bytes written with the new patch. For this testing, I indexed 10M docs with maxMergedSegmentMB=50. Each doc's ID was randomly generated between 0 and 10M and I ran through all 10M 25 times. I indexed in packets of 1,000 and sent the same packet to the old and new versions. Of course I added the reporting to the old version as well. Mind you, that was last night so I haven't analyzed it in detail yet.

> This approach, especially the singleton merges, will certainly increase the I/O if the index has been optimized to 1 segment. I don't think that's something that should be addressed in this JIRA (or at all). Prior to this there was no way to recover from that situation except to wait until most of it was deleted docs.

> I think the critical bit here is that all these merges, including the singleton merges, run through the scoring mechanism, which I haven't changed at all. I'll re-run my test today with the new code, changing reclaimDeletesWeight to 1.5 'cause I'm curious. And maybe 1.0 (no effect) and maybe 0.75 (reducing the deletes weight).

Let me know what you think.

asfimport commented 6 years ago

Erick Erickson (@ErickErickson) (migrated from JIRA)

Hold the presses. I really, really, really hate it when the thought occurs at 6:30 AM "Maybe if I approached the problem slightly differently it would be vastly simpler". I suppose sometimes I have to work through all the gory details before understanding the process enough to think of a simpler way...

Anyway, as I said above @mikemccand: if findForcedDeletesMerges and findForcedMerges both need to go through the work of findMerges to respect segment size, would it be possible to refactor out the meat of findMerges and feed it the eligible lists from findForcedDeletesMerges and findForcedMerges? Then a couple of parameters would need to be passed into the extracted method, things like max segment size and segs per tier etc. But the current code then stays largely intact.

Taking a hack at this now, if it pans out at all you should completely ignore the previous patch.

asfimport commented 6 years ago

Erick Erickson (@ErickErickson) (migrated from JIRA)

I'm much more optimistic about this approach. What it does is extract the scoring loop from findMerges and call it from findMerges, findForcedDeletesMerges and findForcedMerges.

Each of those methods creates a list of its peculiar version of eligible segments to merge and passes that (and some other info) to the extracted doFindMerges method.

So far it seems to work well.

Generally, when it comes to large segments, here defined as anything over maxMergedSegmentBytes/2 of live documents, they're ignored unless the new parameter indexPctDeletedTarget is exceeded, which defaults to 20%. This means that if (and only if) the total percentage of deleted documents in the entire index is > 20%, then segments with > maxMergedSegmentBytes/2 of live docs are eligible for merging. Whether they're merged or not depends on whether they score highest.
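As a sketch, the gate described here amounts to something like this (placeholder names, not the patch's actual code):

```java
// Sketch of the large-segment gate described above; names are placeholders.
static boolean eligibleForMerging(long segLiveBytes, long maxMergedSegmentBytes,
                                  long indexLiveDocs, long indexDeletedDocs,
                                  double indexPctDeletedTarget /* default 20.0 */) {
  if (segLiveBytes <= maxMergedSegmentBytes / 2) {
    return true;   // "normal" sized segments are always candidates
  }
  // over-sized segments only become candidates once the whole index carries
  // more than indexPctDeletedTarget percent deleted documents
  double indexPctDeleted = 100.0 * indexDeletedDocs / (indexLiveDocs + indexDeletedDocs);
  return indexPctDeleted > indexPctDeletedTarget;
}
```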

On a relatively quick test, setting indexPctDeletedTarget to 20% causes about 10% more bytes to be written. Setting it to 10% causes 50% more bytes to be written. Setting it to 50% (which is kind of the default now) causes the number of bytes written to drop by about 10%, but I consider that mostly noise.

forceMerge to 1 segment is possible, and continuing to index will gradually shrink that back as indexPctDeletedTarget gets exceeded as these large segments become eligible for merging.

So despite the size of the patch, the actual code differences are not nearly as great as it might seem. It's mostly moving some code around.

Comments welcome, I'm going to put this down for a few days. There are still a few nocommits and the like, but not many.

I do have one question: When should writer.numDeletesToMerge(info) be preferred over info.getDelCount()? The former seems more expensive.

Oh, and I haven't run precommit or test on it yet, just gathered stats on indexing to the new and old code.

asfimport commented 6 years ago

David Smiley (@dsmiley) (migrated from JIRA)

On a relatively quick test, setting indexPctDeletedTarget to 20% causes about 10% more bytes to be written. Setting it to 10% causes 50% more bytes to be written. Setting it to 50% (which is kind of the default now) causes the number of bytes written to drop by about 10%, but I consider that mostly noise.

Just curious, how did you go about measuring that?  

FWIW, I recently wrote a custom MergePolicy and along with it a fairly generic "MergePolicy simulator" that takes the MergePolicy, feeds it dummy segments over many iterations, and records as output what merging it was told to do, then collects stats on all this, critically the "write amplification factor" (sum of all segments written / sum of new segments flushed).  It's able to do this extremely fast since no actual indexing/merging is done at all.  Maybe I should share it.  There's plenty more it could do to be improved; notably it doesn't have any deleted-doc simulation.
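The write-amplification metric itself is simple to compute once the policy's choices are recorded; a minimal sketch, assuming the simulator tracks total flushed bytes and total merge-output bytes separately:

```java
// Minimal sketch of the "write amplification factor" described above, assuming
// flushed bytes and merge-output bytes are tracked separately: total bytes
// written (flushes plus merge outputs) per byte of newly flushed data.
static double writeAmplification(long flushedBytes, long mergeOutputBytes) {
  return (flushedBytes + mergeOutputBytes) / (double) flushedBytes;
}
```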

asfimport commented 6 years ago

Erick Erickson (@ErickErickson) (migrated from JIRA)

bq: Just curious, how did you go about measuring that?

First a disclaimer: the intent here was to get some idea whether things had blown up all out of proportion so rigor wasn't the main thrust.

Anyway, I have a client program that assembles docs and then sends the same set of docs to two instances of Solr, one running the old and one the new code. Then I hacked a bit into each that prints the number of bytes being merged into each new segment (i.e. each of the OneMerges, each time a MergeSpecification is returned from TieredMergePolicy.findMerges), accumulating the total into a file.

Each doc has a randomly-generated ID in a bounded range so I get deletions.

So I get output like: Bytes Written This Pass: 15,456,941: Accumulated Bytes Written: 16,071,461,273 This pct del: 26, accum pct del max: 26

Finally, I lowered the max segment size artificially to force lots and lots of merges. So there are several places where it might not reflect reality.

Your simulation sounds cool, but for this case how deletes affect decisions on which segments to merge is a critical difference between the old and new way of doing things so needs to be exercised....

asfimport commented 6 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

It'd be great to have this simulator available somewhere (maybe under dev-tools?); we can improve it (e.g. add deletions support), we can fix it to invoke the merge policy class(es) directly and record their choices over time, etc.  @dsmiley maybe open a separate issue for this?  It would really help merge policy experimentation...

asfimport commented 6 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

This looks like it's coming together – thanks @ErickErickson!

Can we do this change in two parts? First part is the nice refactoring to have all the methods share a common scoring loop, which should show no behavior change I think?

Second part is adding the indexPctDeletedTarget?

Some minor comments:

    • @lucene.experimental
    • * <p><b>note</b>NOTD: As of Solr 7.4, forceMerge/optimize
    • * and expungeDeletes (findForcedMerges and
    • * findForcedDeletesMerges) respect the max segment
    • * size by default. forceMerge now defaults to

Hmm what is the note/NOTD? Can you change to As of Lucene 7.4 (these are Lucene sources)? Did you mean to remove the `@lucene.experimental`?

INDEXING, FORCEMERGE, EXPUNGEDELETES

Can you name it NATURAL, FORCE_MERGE and FORCE_MERGE_DELETES?

bq:

I just wanted to point out that the right way to do this is bq. (period, not colon) – then Jira will render that as a "quoted" text.  It's hard to remember how all different places do it differently, but I keep seeing you doing "bq:" so thought I would point it out ;)

asfimport commented 6 years ago

Erick Erickson (@ErickErickson) (migrated from JIRA)

@mikemccand Thanks for looking!

About removing `@lucene.experimental`, yes that was deliberate, TMP has been around for a very long time and it seemed to me that it's now mainstream. I have no problem with putting it back. Let me know if that's your preference. Is putting it back for back-compat? Well, actually so we don't have to maintain back-compat?

Can we do this change in two parts? First part is the nice refactoring to have all the methods share a common scoring loop, which should show no behavior change I think?

Maybe I got the block quote thing right this time, thanks!

What's the purpose here? Mechanically it's simple and I'll be glad to do it, I'd just like to know what the goal is. My guess is so we can have a clear distinction between changes in behavior in NATURAL indexing and refactoring.

When you say "no change in behavior" you were referring to NATURAL merging, correct? Not FORCE_MERGE or FORCE_MERGE_DELETES. Those will behave quite differently.

can you name it NATURAL, FORCE_MERGE and FORCE_MERGE_DELETES?

done.

Hmm what is the note/NOTD? Can you change to As of Lucene 7.4

What can I say? I spend 99% of my life in Solr, everything is Solr, right? As for the rest, typos late at night.

Done.

Finally, can you comment on this nocommit?

Should you be using writer.numDeletesToMerge rather than the info.getDelDocs other places

I see both of these in the code, and writer.numDeletesToMerge seems considerably more expensive. Is there a reason to prefer one over the other?

Thanks again!

asfimport commented 6 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

About removing `@lucene.experimental`, yes that was deliberate, TMP has been around for a very long time and it seemed to me that it's now mainstream. I have no problem with putting it back. Let me know if that's your preference. Is putting it back for back-compat? Well, actually so we don't have to maintain back-compat?

Well, it expresses that the API might change w/o back-compat, and as long as TMP has been around, I'm not sure it's safe to remove that label yet.  E.g. here on this issue we are working out big changes to its behavior (though, no API breaks I think?).

What's the purpose here? Mechanically it's simple and I'll be glad to do it, I'd just like to know what the goal is. My guess is so we can have a clear distinction between changes in behavior in NATURAL indexing and refactoring.

Hmm I was hoping to separate out changes that are just refactoring (with no change to behavior), which I think is the bulk of the change here, from changes that do alter behavior (the indexPctDeletedTarget).  This makes large changes like this easier to review, I think.

When you say "no change in behavior" you were referring to NATURAL merging, correct? Not FORCE_MERGE or FORCE_MERGE_DELETES. Those will behave quite differently.

Hmm then I'm confused – I thought the refactoring was to get all of these methods to use the scoring approach (enumerate all possible merges, score them, pick the best scoring ones), and that that change alone should not change behavior, and then, separately, changing the limits on % deletions of a max sized segment before it can be merged.

Should you be using writer.numDeletesToMerge rather than the info.getDelDocs other places

Hmm I think it's more correct to use writer.numDeletesToMerge – that API will reflect any pending deletions as well, which can be significant.  If you use info.getDelCount you are using stale information. That first method should not be too costly, unless soft deletes are used?
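For concreteness, the two calls under discussion look like this (assuming the 7.4-era IndexWriter/SegmentCommitInfo APIs named in this thread):

```java
// The two ways of counting a segment's deletions discussed above
// (assuming the 7.4-era IndexWriter/SegmentCommitInfo APIs).
static int deletesForMergeDecision(IndexWriter writer, SegmentCommitInfo info)
    throws IOException {
  int committed = info.getDelCount();              // cheap, but can be stale
  int effective = writer.numDeletesToMerge(info);  // also reflects pending/soft deletes
  return Math.max(committed, effective);
}
```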

asfimport commented 6 years ago

Erick Erickson (@ErickErickson) (migrated from JIRA)

not sure it's safe to remove that label yet

I'll put @lucene.experimental back.

it's more correct to use writer.numDeletesToMerge

done

Hmm then I'm confused – I thought the refactoring was to get all of these methods to use the scoring approach (enumerate all possible merges, score them, pick the best scoring ones),

Right, but that has quite a few consequences when comparing old vs. new behavior for FORCE_MERGE and FORCE_MERGE_DELETES, for several reasons, mostly stemming from having these two operations respect maxSegmentBytes:

> Those two operations now respect maxSegmentSize, so very different segments will get merged than used to. The whole impetus to refactor is basically the realization that to respect maxSegmentSize, all three methods had to do essentially the same thing rather than each doing their own thing.

> Before, the top N segments up to maxMergeAtOnceExplicit would get merged, regardless of how big the resulting segment was. There was no cost analysis.

> The fact that the potential merges are scored also changes which segments will get merged on each pass, even when doing a FORCE_MERGE down to one segment. The end result with maxSegments=1 is still a single segment, but the intermediate merges will be different.

For NATURAL merging, this should be close to a no-op. It's a different story for the other two.

asfimport commented 6 years ago

Erick Erickson (@ErickErickson) (migrated from JIRA)

New patch with Mike's suggestions and most tests running. I have to beast some overnight, at least one failed once (seed didn't reproduce) for no obvious reason but didn't fail afterwards.

I @Ignored one test (nocommitted too). I'll have to revisit that. Looking at it again, my comment about why it's different is bogus; it should pass.

asfimport commented 6 years ago

Erick Erickson (@ErickErickson) (migrated from JIRA)

Fix for the failing test

asfimport commented 6 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Right, but that has quite a few consequences when comparing old .vs. new behavior for FORCE_MERGE and FORCE_MERGE_DELETES for several reasons, mostly stemming from having these two operations respect maxSegmentBytes:

OK I see ... I think it still makes sense to try to break these changes into a couple issues.  This one (just refactoring to share the scoring approach, with the corresponding change in behavior) is going to be big enough!

Hmm I see some more failing tests e.g.:

[junit4] Suite: org.apache.lucene.search.TestTopFieldCollectorEarlyTermination [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestTopFieldCollectorEarlyTermination -Dtests.method=testEarlyTermination -Dtests.seed=355D07976851D85A -Dtests.badapples=true -Dtests.locale=nn-N\ O -Dtests.timezone=America/Cambridge_Bay -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] ERROR 869s J3 | TestTopFieldCollectorEarlyTermination.testEarlyTermination <<< [junit4] > Throwable #1: java.lang.OutOfMemoryError: GC overhead limit exceeded

[junit4] > at java.util.Arrays.copyOf(Arrays.java:3332) [junit4] > at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124) [junit4] > at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448) [junit4] > at java.lang.StringBuilder.append(StringBuilder.java:136) [junit4] > at org.apache.lucene.store.MockIndexInputWrapper.toString(MockIndexInputWrapper.java:224) [junit4] > at java.lang.String.valueOf(String.java:2994) [junit4] > at java.lang.StringBuilder.append(StringBuilder.java:131) [junit4] > at org.apache.lucene.store.BufferedChecksumIndexInput.<init>(BufferedChecksumIndexInput.java:34) [junit4] > at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:119) [junit4] > at org.apache.lucene.store.MockDirectoryWrapper.openChecksumInput(MockDirectoryWrapper.java:1072) [junit4] > at org.apache.lucene.codecs.lucene50.Lucene50CompoundReader.readEntries(Lucene50CompoundReader.java:105) [junit4] > at org.apache.lucene.codecs.lucene50.Lucene50CompoundReader.<init>(Lucene50CompoundReader.java:69) [junit4] > at org.apache.lucene.codecs.lucene50.Lucene50CompoundFormat.getCompoundReader(Lucene50CompoundFormat.java:70) [junit4] > at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:100) [junit4] > at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:78) [junit4] > at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:202) [junit4] > at org.apache.lucene.index.ReadersAndUpdates.getReaderForMerge(ReadersAndUpdates.java:782) [junit4] > at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4221) [junit4] > at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3910) [junit4] > at org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:40) [junit4] > at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2077) [junit4] > at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1910) [junit4] > at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1861) [junit4] > at org.apache.lucene.index.RandomIndexWriter.forceMerge(RandomIndexWriter.java:454) [junit4] > at org.apache.lucene.search.TestTopFieldCollectorEarlyTermination.createRandomIndex(TestTopFieldCollectorEarlyTermination.java:96) [junit4] > at org.apache.lucene.search.TestTopFieldCollectorEarlyTermination.doTestEarlyTermination(TestTopFieldCollectorEarlyTermination.java:123) [junit4] > at org.apache.lucene.search.TestTopFieldCollectorEarlyTermination.testEarlyTermination(TestTopFieldCollectorEarlyTermination.java:113)

and

[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestIndexWriterDelete -Dtests.method=testOnlyDeletesTriggersMergeOnClose -Dtests.seed=355D07976851D85A -Dtests.badapples=true -Dtests.locale=en-IE\ -Dtests.timezone=Australia/Perth -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] ERROR 0.05s J0 | TestIndexWriterDelete.testOnlyDeletesTriggersMergeOnClose <<< [junit4] > Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=660, name=Lucene Merge Thread #6, state=RUNNABLE, group=TGRP-Tes\ tIndexWriterDelete] [junit4] > Caused by: org.apache.lucene.index.MergePolicy$MergeException: java.lang.RuntimeException: segments must include at least one segment

[junit4] > at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:704) [junit4] > at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684) [junit4] > Caused by: java.lang.RuntimeException: segments must include at least one segment [junit4] > at org.apache.lucene.index.MergePolicy$OneMerge.<init>(MergePolicy.java:228) [junit4] > at org.apache.lucene.index.TieredMergePolicy.findForcedMerges(TieredMergePolicy.java:701) [junit4] > at org.apache.lucene.index.IndexWriter.updatePendingMerges(IndexWriter.java:2103) [junit4] > at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3929) [junit4] > at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:625) [junit4] > at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:662)Throwable #2: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Capture\ d an uncaught exception in thread: Thread[id=661, name=Lucene Merge Thread #7, state=RUNNABLE, group=TGRP-TestIndexWriterDelete|#7, state=RUNNABLE, group=TGRP-TestIndexWriterDelete] [junit4] > Caused by: org.apache.lucene.index.MergePolicy$MergeException: java.lang.IllegalStateException: this writer hit an unrecoverable error; cannot merge

[junit4] > at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:704) [junit4] > at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684) [junit4] > Caused by: java.lang.IllegalStateException: this writer hit an unrecoverable error; cannot merge [junit4] > at org.apache.lucene.index.IndexWriter._mergeInit(IndexWriter.java:4072) [junit4] > at org.apache.lucene.index.IndexWriter.mergeInit(IndexWriter.java:4052) [junit4] > at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3904) [junit4] > at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:625) [junit4] > at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:662) [junit4] > Caused by: java.lang.RuntimeException: segments must include at least one segment [junit4] > at org.apache.lucene.index.MergePolicy$OneMerge.<init>(MergePolicy.java:228) [junit4] > at org.apache.lucene.index.TieredMergePolicy.findForcedMerges(TieredMergePolicy.java:701) [junit4] > at org.apache.lucene.index.IndexWriter.updatePendingMerges(IndexWriter.java:2103) [junit4] > at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3929) [junit4] > ... 2 more

and

[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestIndexWriterDelete -Dtests.method=testDeleteAllSlowly -Dtests.seed=355D07976851D85A -Dtests.badapples=true -Dtests.locale=en-IE -Dtests.timezon\ e=Australia/Perth -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] ERROR 0.21s J0 | TestIndexWriterDelete.testDeleteAllSlowly <<< [junit4] > Throwable #1: java.lang.IllegalStateException: this writer hit an unrecoverable error; cannot complete forceMerge

[junit4] > at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1917) [junit4] > at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1861) [junit4] > at org.apache.lucene.index.RandomIndexWriter.doRandomForceMerge(RandomIndexWriter.java:371) [junit4] > at org.apache.lucene.index.RandomIndexWriter.getReader(RandomIndexWriter.java:386) [junit4] > at org.apache.lucene.index.RandomIndexWriter.getReader(RandomIndexWriter.java:332) [junit4] > at org.apache.lucene.index.TestIndexWriterDelete.testDeleteAllSlowly(TestIndexWriterDelete.java:984) [junit4] > at java.lang.Thread.run(Thread.java:745) [junit4] > Caused by: java.lang.RuntimeException: segments must include at least one segment [junit4] > at org.apache.lucene.index.MergePolicy$OneMerge.<init>(MergePolicy.java:228) [junit4] > at org.apache.lucene.index.TieredMergePolicy.findForcedMerges(TieredMergePolicy.java:701) [junit4] > at org.apache.lucene.index.IndexWriter.updatePendingMerges(IndexWriter.java:2103) [junit4] > at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3929) [junit4] > at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:625) [junit4] > at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:662) [junit4] 2> Apr 24, 2018 9:27:54 PM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException [junit4] 2> WARNING: Uncaught exception in thread: Thread[Lucene Merge Thread #6,5,TGRP-TestIndexWriterDelete|#6,5,TGRP-TestIndexWriterDelete] [junit4] 2> org.apache.lucene.index.MergePolicy$MergeException: java.lang.RuntimeException: segments must include at least one segment

[junit4] 2> at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:704) [junit4] 2> at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684) [junit4] 2> Caused by: java.lang.RuntimeException: segments must include at least one segment [junit4] 2> at org.apache.lucene.index.MergePolicy$OneMerge.<init>(MergePolicy.java:228) [junit4] 2> at org.apache.lucene.index.TieredMergePolicy.findForcedMerges(TieredMergePolicy.java:701) [junit4] 2> at org.apache.lucene.index.IndexWriter.updatePendingMerges(IndexWriter.java:2103) [junit4] 2> at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3929) [junit4] 2> at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:625) [junit4] 2> at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:662) [junit4] 2>

Can we make these ints, and cast to double when we need to divide them?:

  • double totalDelDocs = 0;
  • double totalMaxDocs = 0;

Hmm that 50/100 integer division will just be zero:

cutoffSize = (long) ((double) maxMergeSegmentBytesThisMerge * (1.0 - (50/100)));
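Mike's point about the integer division can be checked in isolation; the snippet below mirrors the quoted expression with a shortened variable name:

```java
// Demonstrates the pitfall: (50 / 100) is integer division and evaluates to 0,
// so the factor becomes (1.0 - 0) = 1.0 and the cutoff ends up being the full
// max segment size rather than half of it.
public class IntDivisionCheck {
  public static void main(String[] args) {
    long maxBytes = 5L * 1024 * 1024 * 1024;                        // 5 GB
    long broken = (long) ((double) maxBytes * (1.0 - (50 / 100)));  // 50/100 == 0
    long fixed  = (long) (maxBytes * (1.0 - 50.0 / 100.0));
    System.out.println(broken);  // 5368709120 -- the full size
    System.out.println(fixed);   // 2684354560 -- half, as presumably intended
  }
}
```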

Hmm this left me hanging (in findForcedMerges):

// First condition is that

We define this:

int totalEligibleSegs = eligible.size();

But do not decrement it when we remove segments from eligible in the loop after?

In findForcedMerges since we pre-compute the per-segment sizes using getSegmentSizes, can you use that map instead of calling size(info, writer) again?
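i.e., something along these lines (a hypothetical sketch of the caching being suggested, reusing the existing size(info, writer) helper):

```java
// Hypothetical sketch of the caching suggested above: compute each segment's size
// once up front and look it up later instead of calling size(info, writer) again.
Map<SegmentCommitInfo, Long> segmentSizes = new HashMap<>();
for (SegmentCommitInfo info : infos) {
  segmentSizes.put(info, size(info, writer));   // existing MergePolicy.size(...) helper
}
// ...later, wherever a size is needed:
long segBytes = segmentSizes.get(info);
```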

asfimport commented 6 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

// We did our best to find the right merges, but through the vagaries of the scoring algorithm etc. we didn't
// merge down to the required max segment count. So merge the N smallest segments to make it so.

Hmm can you describe why this would happen?  Seems like if you ask the scoring algorithm to find merges down to N segments, it shouldn't ever fail?

We also seem to invoke getSegmentSizes more than once in findForcedMerges?

asfimport commented 6 years ago

Erick Erickson (@ErickErickson) (migrated from JIRA)

I think it still makes sense to try to break these changes into a couple issues

Works for me, I'll just leave the new parameter stuff commented out and we can discuss it in a separate JIRA.

is going to be big enough!

And scary enough. Anyway, I left the new bits about indexPctDeletedTarget commented out.

Thanks for the test failures; "it worked for me", but I'll beast this and try those seeds.

Can we make these ints, and cast to double when we need to divide them?:

done. Longs unnecessary as this is within a single core, right?

cutoffSize = (long) ((double) maxMergeSegmentBytesThisMerge * (1.0 - (50/100))); divide-by-zero

Oddly it doesn't; I think the compiler casts them to doubles due to the leading 1.0, but since it's confusing I'll use 50.0/100.0.

// First condition is that

Nice English, wasn't it... Removed; it was the remnant of some intermediate versions.

But do not decrement it when we remove segments from eligible in the loop after?

In that case it's a silly variable to have since eligible.size() is the same thing, removed it.

Hmm can you describe why this would happen?

The problem I ran into was that, say, we're merging down to 5 segments. At some point we might have 9 eligible segments, and it's even possible that none of them have deletes. The doFindMerges code may return a spec == null 'cause there aren't enough segments to make a decent merge according to that code, and/or it would create an out-sized segment, etc. So we need to collect segments 4-9 and merge them to get to 5 total segments. We try to choose the smallest ones.

This is actually similar in spirit to the old code, at the end of findForcedMerges there's a clause that catches this condition.

All that said, this seems ad-hoc, I'll take another look at it.

I've also imagined at least one edge case that results in very dissimilar size segments at the end of forceMerge, but with the singleton merge they'll correct themselves so I don't think it's really worth trying to protect against.

We also seem to invoke `getSegmentSizes` more than once in `findForcedMerges`?

I was going to take another look at all the usages of size(info, writer) as well as the getting the deleted doc count, I'll see. I think what I want to do, rather than manipulate the List<SegmentCommitInfo> eligible, is get the sizes up front (and maybe deleted docs?) and manipulate that (also pass that to doFindMerges).

Probably another patch to look at soon.

asfimport commented 6 years ago

Erick Erickson (@ErickErickson) (migrated from JIRA)

OK, I think this is getting quite close. The place I'm most uncomfortable is in findForcedMerges. See the TODO around line 778 plus the fact that there's a bunch of special handling depending on whether we're forceMerging to 1 segment, a max count or respecting max segments size. I want to make one more pass though it, there ought to be a more organized way of doing this.

Also, when giving a maximum number of segments, we calculate the ideal segment size and then I increase it by 25% on the theory that segments won't fit perfectly, so allow some extra space in hopes that all the segments will be fit into the max segment count the first time through.

However, there are still edge cases I think where that won't necessarily work on the first pass, especially if there are very many segments. In that case, at the very end there's a loop essentially saying "go through as many iterations as necessary increasing the max segment size by 25% each time until you can fit them all in the required number of segments". This really means that in this case you could rewrite the entire index twice. Is that OK? I don't want to spend a lot of time on this case though, it seems to me that if you specify this you'll have to live with this edge case.
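The sizing arithmetic described here, as a rough sketch (placeholder names, not the patch's actual code):

```java
// Rough sketch of the sizing arithmetic described above; names are placeholders.
static long initialAllowedSegmentBytes(long totalLiveBytes, int maxSegmentCount) {
  // divide the live bytes evenly across the requested count, then add 25% headroom
  // because segments never pack perfectly on the first pass
  return (long) ((totalLiveBytes / (double) maxSegmentCount) * 1.25);
}

// fallback: keep growing the allowance by 25% until the segments fit the requested count
// (countSegmentsAt(...) is a placeholder for re-running the packing with a new allowance)
long allowance = initialAllowedSegmentBytes(totalLiveBytes, maxSegmentCount);
while (countSegmentsAt(allowance) > maxSegmentCount) {
  allowance = (long) (allowance * 1.25);
}
```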

@mikemccand There's another departure from the old process here. If there are multiple passes for forceMerge, I keep returning null until there aren't any current merges running that involve the original segments. Is there any real point in trying to create another merge specification if there are merges from previous passes going on? This is around line 684.

I beasted all of Mike's failures 120 times, along with TestTieredMergePolicy, and saw no failures. All tests pass and precommit worked.

Then, of course I made one tiny change so I'll have to go 'round that testing again. I also have to make another couple of runs at counting the total bytes written to see if something crept in.

That said, I think this is the last major rearrangement I want to do. If my additional testing succeeds and there are no objections, I'll probably commit sometime this weekend.

Thanks to all who've looked at this!

asfimport commented 6 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Thanks @ErickErickson; I will try to review this soon.  Maybe @s1monw can also have a look.

asfimport commented 6 years ago

Erick Erickson (@ErickErickson) (migrated from JIRA)

@mikemccand Do note that I'm seeing these errors on my latest full test run, I'll chase down what's up with them. That said, I think I'm at a point where I don't want to do another significant change to the approach if I can help it or unless there are issues.... I'll see if I can reproduce these errors reliably and whether I can get them to occur with the unmodified TMP code.

Precommit passes though and compiles, can I ship it ;)

[junit4] - org.apache.solr.core.TestCodecSupport.testMixedCompressionMode
[junit4] - org.apache.solr.search.join.TestScoreJoinQPNoScore.testRandomJoin
[junit4] - org.apache.solr.TestRandomFaceting.testRandomFaceting
[junit4] - org.apache.solr.TestRandomDVFaceting.testRandomFaceting
[junit4] - org.apache.solr.core.TestMergePolicyConfig.testTieredMergePolicyConfig
[junit4] - org.apache.solr.TestJoin.testRandomJoin
[junit4] - org.apache.solr.TestJoin.testJoin
[junit4] - org.apache.solr.core.TestSolrDeletionPolicy1.testKeepOptimizedOnlyCommits

asfimport commented 6 years ago

Simon Willnauer (@s1monw) (migrated from JIRA)

Erick, thanks for tackling this big issue here!

here are a couple comments:

I dug into this quite a bit and I am starting to question whether we should really try to change the algorithm that we have today, or whether this class needs cleanup and refactoring first. I am sorry to come in late here, but this is a very, very complex piece of code and adding more complexity to it will rather do harm. That said, I wonder if we can generalize the algorithm here into a single method, because in the end they all do the same thing. We can, for instance, make the selection algorithm pluggable with a function we pass in, and that way differentiate between findMerges and findForcedMerges etc. At the end of the day we want them all to work in the same way. I am not saying we should go all the way down that path, but maybe we can extract a common code path that we can share between the places where we filter out the segments that are not eligible.

This is just a suggestion; I am happy to help here, btw. One thing that concerns me, and is in fact a showstopper IMO, is that the patch doesn't have a single test that ensures it's correct. I mean, we significantly change the behavior; I think it warrants tests, no?

asfimport commented 6 years ago

Erick Erickson (@ErickErickson) (migrated from JIRA)

Simon:

Thanks for taking the time, and yeah, the complexity bothers me too.

I'm reluctant to take the comments out for 8263 as their lifetime is expected to be a few days after this is checked in and I don't want to have to remember how I got it to work, but we can if you strongly object.

About tests: there are actually a number of tests for TMP functionality already, in particular merging down to N segments and forceMerging. What particular parts of the code do you think should be exercised more?

As for naming/style changes, sure. I always feel like an interloper in the Lucene code; I'm perfectly willing to try to keep it consistent with the expectations of the folks who, you know, live in the Lucene codebase ;)

As for your other suggestions, I don't know. I've rewritten/refactored/whatever this about three times already and every time I do it gets complicated again.

I wonder if we can generalize the algorithm here into a single method because in the end they all do the same thing.

doFindMerges is called from all three operations, so I'm a bit puzzled about what this would look like. Each of the three operations has different initial conditions: eligible segments, the maximum allowed segment size, and the like. Once those are established, they all go through the same scoring/selection code. Figuring out the initial conditions is the "interesting" part.

There are several areas here that are particularly gnarly and any way to un-gnarl them would be great.

> the variable number of segments in findForcedMerges. I'd love for there to be exactly two choices: merge down to one segment, or respect the max segment size. Controlling the number of segments would then be more along the lines of setting maxMergedSegmentMB in TMP.

> the multi-pass nature of findForcedMerges because it respects the maxMergeAtOnceExplicit.

> Somewhat different decisions need to be made in doFindMerges depending on what type of merge it is. I'm not a huge fan of passing that enum in. One of the iterations tried to pass information into an uber-function, but that led to having to pass around segmentsToMerge from findForcedMerges, which isn't present in the other two, so passing null in from them was also ugly.

> I also ran into a couple of issues with findMerges needing to not merge segments if there weren't enough in the tier, which is exactly the opposite of findForcedMerges, which itself has conditions around whether it should merge a segment with no deletions if it exceeds maxMergedSegmentMB, which is in turn a variable condition based on whether maxSegments has been specified.....

Let me know if you're going to take a whack at it, even a skeleton of a different approach would help me get on the same page.

Meanwhile I can incorporate your other comments, they'll be useful no matter what.

asfimport commented 6 years ago

Shawn Heisey (@elyograg) (migrated from JIRA)

I have not been following the code, but I have been following the discussion. How much of the complexity is related to not surprising existing users?

I think a lot of concerns would disappear if the focus is implementing a new merge policy instead of attempting to fix/augment TMP.

@mikemccand, was TMP a ground-up implementation, or an evolution from LBMP? When I find some time, I can look at the code.

I've been on a little bit of a "let's rewrite it from scratch!" kick lately with regards to my own code, so keep that in mind when evaluating any bias I might show!

asfimport commented 6 years ago

Shawn Heisey (@elyograg) (migrated from JIRA)

General thoughts: I think the overall goals stated in the class-level javadoc for TieredMergePolicy are good. Based on the discussion, minimizing deleted docs (whether there has been a forceMerge in the past or not) needs to be a more significant focus than it is now.

asfimport commented 6 years ago

Erick Erickson (@ErickErickson) (migrated from JIRA)

Shawn:

Much of the point of this work is exactly to change the behavior. The old code did this massive merge into a single segment for forceMerge, which made it easy to shoot oneself in the foot. forceMerge, expungeDeletes, and IndexUpgraderTool all have the same problem: they produce very large segments that then aren't merged away unless they have < maxSegmentSizeMB/2 "live" docs. There's a blog about that... https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
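
Paraphrased, the pre-patch eligibility check amounts to something like the following (illustrative only; the variable names here are assumptions, not a verbatim excerpt of TMP):

    // A segment only counts as eligible for natural merging when its "live" size (bytes
    // pro-rated by the fraction of non-deleted docs) is under half the max merged segment
    // size, so a force-merged segment far above the cap is effectively never merged again.
    long liveBytes = (long) (segBytes * (1.0 - (double) delCount / (double) maxDoc));
    if (liveBytes < maxMergedSegmentBytes / 2) {
      eligible.add(info);   // can participate in normal merging again
    }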

I thought a bit about a new policy, but the motivation here is to alter the behavior of producing very large segments so I decided to approach it by altering TMP, rather than creating a new policy and deprecating TMP.

FWIW.

asfimport commented 6 years ago

Erick Erickson (@ErickErickson) (migrated from JIRA)

@s1monw @mikemccand So I'm finally thinking about the tests. Simon's totally right; I really hadn't been thinking about tests yet, but now that he prompted me it's, well, obvious that there are some that can be written to test things like respecting the max segment size by default, etc.

Anyway, since I don't know what documents are in what segments, I can't really predict some things, like which specific segments should be merged under various conditions.

I see two approaches: 1> delete documents from specific segments (rough sketch below). I'm guessing this is just getting terms(field) from a leaf reader and enumerating?

2> Just delete some random documents, examine the segments before and after a forceMerge or expungeDeletes with various parameters to see if my expectations are met.

Got any preferences?
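
For the record, approach 1 would look roughly like this (a rough sketch, not code from the patch; it assumes every document has a unique "id" StringField, so enumerating the id terms of one leaf deletes exactly that segment's documents):

    // Enumerate the "id" terms of a chosen segment and delete those docs via the writer.
    try (DirectoryReader reader = DirectoryReader.open(writer)) {
      LeafReader leaf = reader.leaves().get(0).reader();   // pick the segment to target
      Terms terms = leaf.terms("id");
      if (terms != null) {
        TermsEnum termsEnum = terms.iterator();
        BytesRef term;
        while ((term = termsEnum.next()) != null) {
          writer.deleteDocuments(new Term("id", BytesRef.deepCopyOf(term)));
        }
      }
    }
    writer.commit();   // make the deletes visible to the merge policy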

Oh, and the test failures were because I'd missed a check. I've incorporated the rest of Simon's comments; no new patch until the tests are done.

asfimport commented 6 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

@ErickErickson if you index with a single thread, and commit() at the right times you can build a precise set of segments and then directly test TMP's behavior.  I like approach one since it then gives you full deterministic control to enumerate the different tricky cases that surface in real indices?
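
A sketch of that setup (field names, document counts, and the LuceneTestCase helpers such as newDirectory()/newIndexWriterConfig()/random() are assumptions, not code from the patch):

    Directory dir = newDirectory();
    IndexWriterConfig iwc = newIndexWriterConfig(new MockAnalyzer(random()));
    iwc.setMergePolicy(NoMergePolicy.INSTANCE);      // keep setup deterministic: no background merges
    IndexWriter w = new IndexWriter(dir, iwc);
    for (int seg = 0; seg < 5; seg++) {              // build exactly five known segments
      for (int doc = 0; doc < 100; doc++) {
        Document d = new Document();
        d.add(new StringField("id", seg + "_" + doc, Field.Store.YES));
        w.addDocument(d);
      }
      w.commit();                                    // single thread + commit => one new segment
    }
    // now configure the TieredMergePolicy under test, delete/forceMerge as needed, and
    // assert on the resulting SegmentInfos.
    w.close();
    dir.close();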

asfimport commented 6 years ago

Simon Willnauer (@s1monw) (migrated from JIRA)

> @ErickErickson if you index with a single thread, and commit() at the right times you can build a precise set of segments and then directly test TMP's behavior.  I like approach one since it then gives you full deterministic control to enumerate the different tricky cases that surface in real indices?

 

I really think we should start working towards testing this as a real unit test. Creating stuff with IW and depending on it is a big issue. We can change the code to be less dependent on IW; I think we should, and we should do it before making significant changes to MPs, IMO.

asfimport commented 6 years ago

Erick Erickson (@ErickErickson) (migrated from JIRA)

Latest patch incorporating comments so far and adding tests, plus some Solr documentation changes.

I'd really like to wrap this up soon; I think we should move further refactoring/restructuring to other JIRAs.

Precommit and tests run, or at least ran before I worked on the tests.

TODO:

Otherwise, AFAICT, it's ready to rock-n-roll.

asfimport commented 6 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

> I really think we should start working towards testing this as a real unit test.

+1, this would be great; I think we'd need to disentangle IndexWriter and MergePolicy a bit to enable this. We pass IndexWriter to MergePolicy's methods, but the merge policy only uses a few of its methods, like numDeletesToMerge, segString (for verbosity), and getMergingSegments; we could abstract these out somehow? Or use a mocking tool, I suppose.
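
One possible shape for that abstraction (a sketch only; the interface and its name are hypothetical here, not something the patch adds):

    interface MergeContext {
      int numDeletesToMerge(SegmentCommitInfo info) throws IOException; // deletes the policy should plan for
      Set<SegmentCommitInfo> getMergingSegments();                      // segments already being merged
      String segString(SegmentCommitInfo info);                         // only needed for verbose logging
    }
    // MergePolicy methods could then take a MergeContext instead of an IndexWriter, and a
    // unit test could hand in a trivial fake built from hand-constructed SegmentCommitInfos.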

asfimport commented 6 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

@ErickErickson I'll review your latest patch soon!

asfimport commented 6 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Net/net I like this change; it would allow the forced merges to pick "better" merges to achieve their constraints. But I think we do need to keep iterating... this is really a complex change.

Can you remove the @Test annotations? They are redundant.

twoManyHaveBeenMerged --> tooManyHaveBeenMerged?

Can you name it maxDoc not maxDocs? E.g. segMaxDocs -> segMaxDoc, just to be consistent w/ maxDoc elsewhere in our sources.

    int originalSortedSize = sortedEligible.size();
    if (verbose(writer)) {
      message("findMerges: " + originalSortedSize + " segments", writer);
    }
    if (sortedEligible.size() == 0) {
      return null;
    }

You can use originalSortedSize in the if above?

Should the haveOneLargeMerge also look at the merging segments to decide whether a large merge is already running before invoking TMP? Otherwise the very next time IW invokes TMP it could send out another large merge.
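I.e., roughly this (illustrative only; haveOneLargeMerge mirrors the variable in the patch, and treating "large" as more than half of the max merged segment size is an assumption):

    boolean haveOneLargeMerge = false;
    for (SegmentCommitInfo running : writer.getMergingSegments()) {
      if (size(running, writer) > maxMergedSegmentBytes / 2) {
        haveOneLargeMerge = true;    // a large merge is already in flight; don't queue another
        break;
      }
    }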

This new change worries me:

    // If we're already merging, let it all merge before doing more.
    if (merging.size() > 0) return null;

And I think that's the line you were asking me about with this?

> Michael McCandless There's another departure from the old process here. If there are multiple passes for forceMerge, I keep returning null until there aren't any current merges running that involve the original segments. Is there any real point in trying to create another merge specification while merges from previous passes are still going on? This is around line 684.

But I think that's dangerous: won't this mean that we will fail to return a merge that could run concurrently with already-merging segments, in the cascading case?

These parts still seem smelly to me:

     // Fudge this up a bit so we have a better chance of not having to rewrite segments. If we use the exact size,
     // it's almost guaranteed that the segments won't fit perfectly and we'll be left with more segments than
     // we want and have to re-merge in the code at the bottom of this method.
     maxMergeBytes = Math.max((long) (((double) totalMergeBytes / (double) maxSegmentCount)), maxMergedSegmentBytes);
     maxMergeBytes = (long)((double) maxMergeBytes * 1.25);
    // we're merging down to some specified number of segments> 1, but the scoring with the first guess at max
    // size of segments didn't get us to the required number. So we need to relax the size restrictions until they all
    // fit and trust the scoring to give us the cheapest merges..
    // TODO: Is this really worth it? It potentially rewrites the entire index _again_. Or should we consider
    // maxSegments a "best effort" kind of thing? Or do we just assume that any bad consequences of people doing this
    // is SEP (Someone Else's Problem)?

    if (maxSegmentCount != Integer.MAX_VALUE) {
      while (spec == null || spec.merges.size() > maxSegmentCount) {
        maxMergeBytes = (long)((double)maxMergeBytes * 1.25);
        sortedSizeAndDocs.clear();
        sortedSizeAndDocs.addAll(holdInCase);
        spec = doFindMerges(sortedSizeAndDocs, maxMergeBytes, maxMergeAtOnceExplicit,
            maxSegmentCount, MERGE_TYPE.FORCE_MERGE, writer);
      }
    }

The 25% up-front fudge factor and the retrying at the end with larger and larger fudge factors ... seem risky; rewriting the index twice during forceMerge is no good. I wonder if there's a better way; it's sort of a bin-packing problem, where you need to distribute the existing segments into N bins that are all close to the same size?
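
For instance, a greedy largest-first packing could pick the bins up front (a sketch only, not code from the patch; SegmentSizeAndDocs and sortedSizeAndDocs mirror names from the patch, and the sizeInBytes field plus everything else here is assumed):

    // Assign segments to maxSegmentCount bins, always filling the currently smallest bin, so
    // each bin becomes one merge and no segment is ever rewritten twice.
    List<List<SegmentSizeAndDocs>> bins = new ArrayList<>();
    long[] binBytes = new long[maxSegmentCount];
    for (int i = 0; i < maxSegmentCount; i++) {
      bins.add(new ArrayList<>());
    }
    for (SegmentSizeAndDocs seg : sortedSizeAndDocs) {   // assumed sorted largest-first
      int smallest = 0;
      for (int i = 1; i < maxSegmentCount; i++) {
        if (binBytes[i] < binBytes[smallest]) {
          smallest = i;
        }
      }
      bins.get(smallest).add(seg);
      binBytes[smallest] += seg.sizeInBytes;
    }
    // Each non-empty bin then maps to one OneMerge, giving at most maxSegmentCount segments
    // without an up-front fudge factor or retry loop (maxMergeAtOnceExplicit is ignored here).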

Or, maybe we can change the problem: when you forceMerge to N segments, should we really allow violating the maxMergedSegmentMB constraint in that case?

I think that retry loop can run forever if you have an index with so many segments that, in order to force-merge down to the requested count, the merges must cascade. Maybe add a test? E.g., if you want to force merge to 5 segments and the index has more than 6 * 30 segments (30 being the default maxMergeAtOnceExplicit), then it may spin forever?

Can you change doFindMerges to not modify the incoming list (i.e. make its own copy)? Then you don't need the holdInCase local variable.

Can you use an ordinary if here instead of ternary operator? And invert the if to if (mergeType == MERGE_TYPE.NATURAL)? And then move the // if forcing comment inside the else clause?

        int lim = (mergeType != MERGE_TYPE.NATURAL) ? sortedEligible.size() - 1 : sortedEligible.size() - maxMergeAtonce;
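
What the suggested rewrite might look like (same behavior as the ternary above, shown purely for illustration):

    int lim;
    if (mergeType == MERGE_TYPE.NATURAL) {
      lim = sortedEligible.size() - maxMergeAtonce;
    } else {
      // if forcing
      lim = sortedEligible.size() - 1;
    }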