Shai Erera (@shaie) (migrated from JIRA)
Just one comment: sampling benefits from this two-pass approach, because that way we can guarantee a minimum sample set size. Maybe there's a way to achieve that with in-collection aggregation too, but noting it here so that it's in our minds.
See my comment on #5663, the bitset may not be that small :).
Michael McCandless (@mikemccand) (migrated from JIRA)
sampling benefits from this two-pass approach, because that way we can guarantee a minimum sample set size.
Ahh true ...
We have talked about adding a Scorer.getEstimatedHitCount (somewhere Robert has a patch...), so that eg BooleanQuery can do a better job ordering its sub-scorers, but I think we could use it for facets too (ie to pick sampling collector or not).
But, if the estimate was off (which it's allowed to be) ... then it could get tricky for facets, eg you may have to re-run the query with the non-sampling collector (or with higher sampling %tg) ...
Shai Erera (@shaie) (migrated from JIRA)
I'd rather we rename this issue to something like "implement an in-collection FacetsAccumulator/Collector". I don't think that "facets should" aggregate only one way. There are many faceting examples, and some will have different flavors than others.
However, if this new Collector will perform better on a 'common' case, then I'm +1 for making it the default.
Note that I put 'common' in quotes. The benchmark that you're running, indexing Wikipedia w/ a single Date facet dimension, is not common. I think that we should define the common case, maybe following how Solr users use facets. I.e., is it the eCommerce case, where each document is associated with <10 dimensions, and each dimension is not very deep (say, depth <= 3)? If so, let's say that the facets defaults are tuned for that case, and then we benchmark it.
After we have such benchmark, we can compare the two aggregating collectors and decide which should be default.
And we should also define other scenarios too: few dimensions, flat taxonomies, but with hundreds of thousands or millions of categories – what FacetsAccumulator/Collector (including maybe an entirely different indexing chain) suits that case?
We then document some recipes on the Wiki, and recommend the best configuration for each case.
Michael McCandless (@mikemccand) (migrated from JIRA)
I agree we should keep "do all aggregation at the end" ... it could be that for some use-cases (sampling) it's better.
So the "aggregate as you collect" should be an option, and not necessarily the default until we can see if it's better for the "common" case.
Feel free to change the title of this issue!
Gilad Barkai (migrated from JIRA)
Aggregating all doc ids first also makes it easier to compute actual results after sampling. That is done by taking the sampling result's top-(c)K and calculating their true value over all matching documents, giving the benefit of sampling plus results that make sense to the user (e.g. in counting, the end number would actually be the number of matching documents in this category).
As for aggregating 'on the fly', it has some other issues.
It's sort of becoming a religion with all those "beliefs", as some scenarios used to make sense a few years ago. I'm not sure they still do. Can't wait to see how some of these co-exist with the benchmark results. If all religions could have been benchmarked... ;)
Michael McCandless (@mikemccand) (migrated from JIRA)
Initial prototype patch ... I created a CountingFacetsCollector that aggregates per-segment, and it "hardwires" a dgap/vint decoding.
I tested using luceneutil's date faceting and it gives decent speedups for TermQuery:
HighTerm 0.54 (2.7%) 0.63 (1.4%) 17.6% ( 13% - 22%)
LowTerm 7.69 (1.6%) 9.15 (2.1%) 18.9% ( 14% - 23%)
MedTerm 3.39 (1.2%) 4.48 (1.3%) 32.2% ( 29% - 35%)
Michael McCandless (@mikemccand) (migrated from JIRA)
New patch, adding a hacked up CachedCountingFacetsCollector.
All it does is first pre-load all payloads into a PagedBytes (just like DocValues), and then during aggregation, instead of pulling the byte[] from payloads it pulls it from this RAM cache.
This results in an unexpectedly big speedup:
Task QPS base StdDev QPS comp StdDev Pct diff
HighTerm 0.53 (0.9%) 1.00 (2.5%) 87.3% ( 83% - 91%)
LowTerm 7.59 (0.6%) 26.75 (12.9%) 252.6% ( 237% - 267%)
MedTerm 3.35 (0.7%) 12.71 (9.0%) 279.8% ( 268% - 291%)
The only "real" difference is that I'm pulling the byte[] from RAM instead of from payloads, ie I still pay the vInt+dgap decode cost per hit ... so it's surprising payloads add THAT MUCH overhead? (The test was "hot" so payloads were coming from OS's IO cache via MMapDir).
I think the reason why HighTerm sees the least gains is because .advance is much less costly for it, since often the target is in the already-loaded block.
I had separately previously tested the existing int[][][] cache (CategoryListCache) but it had smaller gains than this (73% for MedTerm), and it required more RAM (1.9 GB vs 377 MB for this patch).
Net/net I think we should offer an easy-to-use DV-backed facets impl...
Shai Erera (@shaie) (migrated from JIRA)
Net/net I think we should offer an easy-to-use DV-backed facets impl...
If only DV could handle multi-values. Can they handle a single byte[]? Because essentially that's what the facets API needs today - it stores everything in the payload, which is byte[]. Having a multi-val DV could benefit us by e.g. not needing to write an iterator on the payload to get the category ordinals ...
The patch looks very good. A few comments/questions:
If you want to make this a class that can be reused by other scenarios, then a few tips that can enable that:
Instead of referencing CatListParams.DEFAULT_TERM, you can pull the CLP from FacetSearchParams.getFacetIndexingParams().getCLP(new CP()).getTerm().
Also, you can obtain the right IntDecoder from the CLP for decoding the ordinals. That would remove the hard dependency on VInt+gap, and allow e.g. to use a PackedInts decoder.
Not sure that we should, but this class supports only one CLP. I think it's ok to leave it like that, and get the CLP.term() at ctor, but then we must be able to cache the bytes at the reader level. That way, if an app uses multiple CLPs, it can initialize multiple such Collectors.
I think it's ok to rely on the top Query to not call us for deleted docs, and therefore pass liveDocs=null. If a Query wants to iterate on deleted docs, we should count facets for them too.
Maybe you should take the IntArrayAllocator from the outside? That class can be initialized by the app once to e.g. use maxArrays=10 (e.g. if it expects max 10 queries in parallel), and then the int[] are reused whenever possible. The way the patch is now, if you reuse that Collector, you can only reuse one array.
In setNextReader you sync on the cache only in case someone executes a search w/ an ExecutorService? That's another point where caching at the Codec/AtomicReader level would be better, right?
Why is acceptDocsOutOfOrder false? Is it because of how the cache works? Because facet counting is not limited to in-order only.
I wonder if we can write a good single Collector, and optimize the caching stuff through the Reader, or DV. Collectors in Lucene are usually not reusable? At least, I haven't seen such pattern. The current FacetsCollector isn't reusable (b/c of the bitset and potential scores array). So I'm worried users might be confused and won't benefit the most from that Collector, b/c they won't reuse it .. On the other hand, saying that we have a FacetsIndexReader (composite) which per configuration initializes the right FacetAtomicReader would be more consumable by apps.
About the results, just to clarify – in both runs the 'QPS base' refers to current facet counting and 'QPS comp' refers to the two new collectors respectively? I'm surprised that the int[][][] didn't perform much better, since you don't need to do the decoding for every document, for every query. But then, perhaps it's because the RAM size is so large, and we pay a lot swapping in/out from CPU cache ...
Also, note that you wrote specialized code for decoding the payload, vs. using an API to do that (e.g. PackedInts / IntDecoder). I wonder how that would compare to the base collection, i.e. would we still see the big difference between int[][][] and the byte[] caching.
Overall though, great work Mike !
We must get this code in. It's clear that it can potentially gain a lot for some scenarios ...
Shai Erera (@shaie) (migrated from JIRA)
Changing the title, which got me thinking – Mike, if we do the Reader/DV caching approach, that could benefit post-collection performance too, right? Is it possible that you hack the current FacetsCollector to do the aggregation over CachedBytes and then compare the difference?
Because your first results show that during-collection are not that much faster than post-collection, I am just wondering if it'll be the same when we cache the bytes outside the collector entirely.
If so, I think it should push us to do this caching outside, because we've already identified cases where post-collection is needed (e.g. sampling) too.
Michael McCandless (@mikemccand) (migrated from JIRA)
Net/net I think we should offer an easy-to-use DV-backed facets impl...
If only DV could handle multi-values. Can they handle a single byte[]?
Because essentially that's what the facets API needs today - it stores everything in the payload, which is byte[].
They can handle byte[], so I think we should just offer that.
Having a multi-val DV could benefit us by e.g. not needing to write an iterator on the payload to get the category ordinals ...
Right, though in the special (common?) case where a given facet field is single-valued, like the Date facets I added to luceneutil / nightlybench (see the graph here: http://people.apache.org/~mikemccand/lucenebench/TermDateFacets.html – only 3 data points so far!), we could also use DV's int fields and let it encode the single ord (eg with packed ints) and then aggregate up the taxonomy after aggregation of the leaf ords is done. I'm playing with a prototype patch for this ...
Do I understand correctly that the caching Collector is reusable? Otherwise I don't see how the CachedBytes help.
No no: this is all just a hack (the CachedBytes / static cache). We should somehow cleanly switch to DV ... it wasn't clear to me how to do that ...
Hmmm, what if you used the in-mem Codec, for loading just this term's posting list into RAM? Do you think that you would gain the same?
Maybe! Have to test ...
If you want to make this a class that can be reused by other scenarios, then a few tips that can enable that:
I do! If ... making it fully generic doesn't hurt perf much. The decode chain (w/ separate reInit called per doc) seems heavyish ...
Instead of referencing CatListParams.DEFAULT_TERM, you can pull the CLP from FacetSearchParams.getFacetIndexingParams().getCLP(new CP()).getTerm().
Ahh ok. I'll fix that.
Also, you can obtain the right IntDecoder from the CLP for decoding the ordinals. That would remove the hard dependency on VInt+gap, and allow e.g. to use a PackedInts decoder.
OK I'll try that.
Not sure that we should, but this class supports only one CLP. I think it's ok to leave it like that, and get the CLP.term() at ctor, but then we must be able to cache the bytes at the reader level. That way, if an app uses multiple CLPs, it can initialize multiple such Collectors.
I think it's ok to rely on the top Query to not call us for deleted docs, and therefore pass liveDocs=null. If a Query wants to iterate on deleted docs, we should count facets for them too.
OK good.
Maybe you should take the IntArrayAllocator from the outside? That class can be initialized by the app once to e.g. use maxArrays=10 (e.g. if it expects max 10 queries in parallel), and then the int[] are reused whenever possible. The way the patch is now, if you reuse that Collector, you can only reuse one array.
Ahh I'll do that.
Separately I was wondering if we should sometimes do aggregation backed by an int[] hashmap, and have it "upgrade" to a non-sparse array only once the number collected got too large. Not sure it's THAT important since it would only serve to keep fast queries fast but would make slow queries a bit slower...
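Something like this rough sketch (not in the patch; the threshold and names are made up, and a real impl would use a primitive int/int map rather than a boxed HashMap):

import java.util.HashMap;
import java.util.Map;

// Sketch: counts start in a sparse map and upgrade to a dense int[] once too
// many distinct ordinals have been seen.
class UpgradingCounts {
  private static final int UPGRADE_THRESHOLD = 4096; // made-up cutoff
  private final int maxOrd;
  private Map<Integer, Integer> sparse = new HashMap<Integer, Integer>();
  private int[] dense;                                // null until we upgrade

  UpgradingCounts(int maxOrd) {
    this.maxOrd = maxOrd;
  }

  void increment(int ord) {
    if (dense != null) {
      dense[ord]++;
      return;
    }
    Integer cur = sparse.get(ord);
    sparse.put(ord, cur == null ? 1 : cur + 1);
    if (sparse.size() > UPGRADE_THRESHOLD) {          // too many distinct ords: switch to dense
      dense = new int[maxOrd];
      for (Map.Entry<Integer, Integer> e : sparse.entrySet()) {
        dense[e.getKey()] = e.getValue();
      }
      sparse = null;
    }
  }
}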
In setNextReader you sync on the cache only in case someone executes a search w/ an ExecutorService? That's another point where caching at the Codec/AtomicReader level would be better, right?
Also for multiple threads running at once ... but it's all a hack anyway ...
Why is acceptDocsOutOfOrder false? Is it because of how the cache works? Because facet counting is not limited to in-order only. For the non-caching one that's true, because we can only advance on the fulltree posting. But if the posting is entirely in RAM, we can random access it?
Oh good point – the DV/cache collectors can accept out of order. I'll fix.
I wonder if we can write a good single Collector, and optimize the caching stuff through the Reader, or DV. Collectors in Lucene are usually not reusable? At least, I haven't seen such pattern. The current FacetsCollector isn't reusable (b/c of the bitset and potential scores array). So I'm worried users might be confused and won't benefit the most from that Collector, b/c they won't reuse it .. On the other hand, saying that we have a FacetsIndexReader (composite) which per configuration initializes the right FacetAtomicReader would be more consumable by apps.
I think we should have two new collectors here? One keeps using payloads but operates per segment and aggregates on the fly (if, on making it generic again, we still see gains).
The other stores the byte[] in DV. But somehow we have to make "send the byte[] to DV not payloads at index time" easy ... I'm not sure how :)
About the results, just to clarify – in both runs the 'QPS base' refers to current facet counting and 'QPS comp' refers to the two new collectors respectively?
Right: base = current trunk, comp = the two new collectors.
I'm surprised that the int[][][] didn't perform much better, since you don't need to do the decoding for every document, for every query. But then, perhaps it's because the RAM size is so large, and we pay a lot swapping in/out from CPU cache ...
This also surprised me, but I suspect it's the per-doc pointer dereferencing that's costing us. I saw the same problem with DirectPostingsFormat ... This also ties up tons of extra RAM (pointer = 4 or 8 bytes; int[] object overhead maybe 8 bytes?). I bet if we made a single int[], and did our own addressing (eg another int[] that maps docID to its address) then that would be faster than byte[] via cache/DV.
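Ie something along these lines (sketch only; the layout and names are made up):

// Sketch: all docs' ordinals concatenated into one flat int[], plus an addressing
// array, instead of an int[] per doc. Counting is then pure array indexing with
// no per-doc object dereference.
class FlatOrdinals {
  final int[] ordStart; // ordStart[doc]..ordStart[doc+1] delimit this doc's slice of allOrds
  final int[] allOrds;  // every doc's ordinals, concatenated

  FlatOrdinals(int[] ordStart, int[] allOrds) {
    this.ordStart = ordStart;
    this.allOrds = allOrds;
  }

  void countDoc(int doc, int[] counts) {
    for (int i = ordStart[doc], end = ordStart[doc + 1]; i < end; i++) {
      counts[allOrds[i]]++;
    }
  }
}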
Also, note that you wrote specialized code for decoding the payload, vs. using an API to do that (e.g. PackedInts / IntDecoder). I wonder how that would compare to the base collection, i.e. would we still see the big difference between int[][][] and the byte[] caching.
Yeah good question. I'll separately test the specialized decode to see how much it's helping....
Mike, if we do the Reader/DV caching approach, that could benefit post-collection performance too, right? Is it possible that you hack the current FacetsCollector to do the aggregation over CachedBytes and then compare the difference?
Right! DV vs payloads is decoupled from during- vs post-collection aggregation.
I'll open a separate issue to allow byte[] DV backing for facets....
Because your first results show that during-collection are not that much faster than post-collection, I am just wondering if it'll be the same when we cache the bytes outside the collector entirely. If so, I think it should push us to do this caching outside, because we've already identified cases where post-collection is needed (e.g. sampling) too.
Definitely.
Overall though, great work Mike ! We must get this code in. It's clear that it can potentially gain a lot for some scenarios ...
Thanks! I want to see that graph jump :)
Shai Erera (@shaie) (migrated from JIRA)
I would like to see one ordinals-store, I don't think that we should allow either payload or DV. If DV lets us write byte[], and we could read it off-disk or RAM, we should make the cut to DV.
But note that DV means upgrading existing indexes. How do you move from a payload to DV? Is it something that can be done in addIndexes? If facets could determine where the data is written, per-segment, the indexes would be migrated on-the-fly, as segments are merged.
But if there's a clean way to do a one-time index upgrade to DV, then let's just write it once, and then DVs are migratable, so that's another +1 for DV.
If you want to simulate DVs, you'll need to implement a few classes. First, instead of CategoryDocBuilder, you can construct your own Document, while adding DVFields. Just make sure that when you resolve a CP to its ord, you also resolve all its parents and add all of them to the DV - to compare today(payload) to today(DV) (today == writing all parents).
Then, I think that you should also write your CategoryListIterator, to iterate on the DV.
Those are the base classes for sure, maybe you'll need a few others to get the CLI into the chain.
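Just to illustrate the encoding side (a sketch only, not the real indexing chain; resolving a CP and its parents to ordinals via the taxonomy writer is left out):

// Sketch: given a doc's category ordinal plus all its parent ordinals, encode them
// as dgap+vInt into a byte[] that could be stored in a binary DV field instead of
// a payload. Matches the today(payload) == today(DV) comparison above.
static byte[] encodeOrdinals(int[] ords) {
  java.util.Arrays.sort(ords);                   // dgap assumes ascending order
  java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();
  int prev = 0;
  for (int ord : ords) {
    int delta = ord - prev;                      // dgap: gap from the previous ordinal
    prev = ord;
    while ((delta & ~0x7F) != 0) {               // vInt: 7 data bits per byte, high bit = continuation
      out.write((delta & 0x7F) | 0x80);
      delta >>>= 7;
    }
    out.write(delta);
  }
  return out.toByteArray();
}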
I hope that I related to all the comments, but I might have missed a question :).
Shai Erera (@shaie) (migrated from JIRA)
Another point about DV - that's actually a design thing. One important hook is IntEncoder/Decoder. It determines how the fulltree is encoded/decoded. For example, you used one method (VInt+DGap), but there are other encoders. In one application, every document added almost entirely unique facets, so the ordinals returned had gaps of 1-2. Therefore we have the FourOnes and EightOnes encoders.
Point is, this abstract layer should remain. I know that you're in the exploration phase, but keep that in mind. In fact, if we're able to make the cut to DV as an internal change, we could also benefit from the existing test suite, to make sure everything's working.
Michael McCandless (@mikemccand) (migrated from JIRA)
I would like to see one ordinals-store, I don't think that we should allow either payload or DV. If DV lets us write byte[], and we could read it off-disk or RAM, we should make the cut to DV.
+1, though we should test the on-disk DV vs current payloads to be sure.
But note that DV means upgrading existing indexes.
Hmm it would be nice to somehow migrate on the fly ... not sure how.
But if there's a clean way to do a one-time index upgrade to DV, then let's just write it once, and then DVs are migratable, so that's another +1 for DV.
If we do the migrate-on-the-fly then users can use IndexUpgrader to migrate entire index.
Point is, this abstract layer should remain. I know that you're in the exploration phase, but keep that in mind. In fact, if we're able to make the cut to DV as an internal change, we could also benefit from the existing test suite, to make sure everything's working.
+1, the abstractions are nice and generic.
I'll test to see how much these abstractions are hurting the hotspots ... we can always make/pick specialized collectors (like the patch) if necessary, and keep generic collectors for the fully general cases ...
Michael McCandless (@mikemccand) (migrated from JIRA)
I created #5667 to cutover to DV.
Michael McCandless (@mikemccand) (migrated from JIRA)
Also, you can obtain the right IntDecoder from the CLP for decoding the ordinals. That would remove the hard dependency on VInt+gap, and allow e.g. to use a PackedInts decoder.
I tried this, changing the CountingFacetsCollector to the attached patch (to use CategoryListIterator), but alas those abstractions are apparently costing us in this hotspot (unless I screwed something up in the patch? Eg, that null I pass is kinda spooky!):
Task QPS base StdDev QPS comp StdDev Pct diff
HighTerm 0.86 (4.7%) 0.56 (0.4%) -34.4% ( -37% - -30%)
MedTerm 5.85 (1.0%) 5.04 (0.5%) -13.9% ( -15% - -12%)
LowTerm 11.82 (0.6%) 11.02 (0.5%) -6.8% ( -7% - -5%)
base is the original CountingFacetsCollector and comp is the new one using the CategoryListIterator API.
I think we should try to invoke specialized collectors when possible?
Michael McCandless (@mikemccand) (migrated from JIRA)
Maybe you should take the IntArrayAllocator from the outside?
This actually makes me sort of nervous, because if the app passes 10 to IntArrayAllocator, it means we hold onto 10 int[] sized to the number of ords, right?
Why try to recycle the int[]'s? Why not let GC handle it...?
Shai Erera (@shaie) (migrated from JIRA)
Why try to recycle the int[]'s? Why not let GC handle it...?
It was Gilad who mentioned "beliefs" and "religions" .. that code has been around since Java 1.4. Not sure that at the time Java was very good at allocating and disposing arrays ... Also, it's not like this is unique to facets. IIRC, IndexWriter also holds onto some char[] or byte[] arrays? At least, a few days ago someone asked me how come IW never releases some 100 MB of char[] - since he set RAM buffer size to 128 MB, it made sense to me ...
Perhaps leave it for now, and separately (new issue? :)) we can test if allocating a new array is costly? If it turns out that this is actually important, we can have a cleanup thread to reclaim the unused ones?
Michael McCandless (@mikemccand) (migrated from JIRA)
Also, it's not like this is unique to facets. IIRC, IndexWriter also holds onto some char[] or byte[] arrays? At least, a few days ago someone asked me how come IW never releases some 100 MB of char[] - since he set RAM buffer size to 128 MB, it made sense to me ...
Actually we stopped recycling with DWPT ... now we let GC do its job. But, also, when IW did this, it was internal (no public API was affected) ... I don't like that the app can/should pass in IntArrayAllocator to the public APIs.
Perhaps leave it for now, and separately (new issue? ) we can test if allocating a new array is costly? If it turns out that this is actually important, we can have a cleanup thread to reclaim the unused ones?
OK I'll open a new issue. Rather than adding a cleanup thread to the current impl, I think we should remove Int/FloatArrayAllocator and just do new int[]/float[]? And only add it back if we can prove there's a performance gain? I think we should let Java/GC do its job ...
Shai Erera (@shaie) (migrated from JIRA)
Ok, let's continue the discussion on #5680. The Allocator is also used to pass the array between different objects, but perhaps there are other ways too.
Shai Erera (@shaie) (migrated from JIRA)
Patch introduces CountingFacetsCollector, very similar to Mike's version, only "productized".
Made FacetsCollector abstract with a utility create() method which returns either CountingFacetsCollector or StandardFacetsCollector (previously, FC), given the parameters.
All tests were migrated to use FC.create and all pass (utilizing the new collector). Still, I wrote a dedicated test for the new Collector too.
Preliminary results that we have, show nice improvements w/ this Collector. Mike, can you paste them here?
There are some nocommits, which I will resolve before committing. But before that, I'd like to compare this Collector to ones that use different abstractions from the code, e.g. IntDecoder (vs hard-wiring to dgap+vint), CategoryListIterator etc.
Also, I want to compare this Collector to one that, in collect(), marks a bitset and does all the work in getFacetResults.
Michael McCandless (@mikemccand) (migrated from JIRA)
Patch looks great: +1
And this is a healthy speedup, on the Wikipedia 1M / 25 ords per doc test:
Task QPS base StdDev QPS comp StdDev Pct diff
PKLookup 239.18 (1.5%) 238.87 (1.1%) -0.1% ( -2% - 2%)
LowTerm 98.99 (3.1%) 135.95 (1.8%) 37.3% ( 31% - 43%)
HighTerm 20.95 (1.2%) 29.08 (2.4%) 38.8% ( 34% - 42%)
MedTerm 34.55 (1.5%) 48.31 (2.0%) 39.8% ( 35% - 43%)
Shai Erera (@shaie) (migrated from JIRA)
Handled some nocommits. Now there's no translation from OrdinalValue to FRNImpl in getFacetResults (the latter is used directly in the queue). I wonder if this buys us anything.
Michael McCandless (@mikemccand) (migrated from JIRA)
It's faster!
Task QPS base StdDev QPS comp StdDev Pct diff
PKLookup 239.75 (1.2%) 237.59 (1.0%) -0.9% ( -3% - 1%)
HighTerm 21.21 (1.5%) 29.80 (2.6%) 40.5% ( 35% - 45%)
MedTerm 34.90 (1.9%) 50.24 (1.9%) 44.0% ( 39% - 48%)
LowTerm 99.85 (3.7%) 152.40 (1.1%) 52.6% ( 46% - 59%)
Shai Erera (@shaie) (migrated from JIRA)
Patch adds two Collectors:
DecoderCountingFacetsCollector, which uses the IntDecoder abstraction (but the rest is like CountingFacetsCollector)
PostCollectionCountingFacetsCollector, which moves the work from collect() to getFacetResults(). In collect(), it keeps a per-DocValues.Source bits (FixedBitSet) of the matching docs.
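Roughly, the post-collection one does this (a simplified sketch, not the actual class; countOrdinals below is a placeholder for the per-doc decode+count step):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.util.FixedBitSet;

// Sketch: collect() only marks a bit per matching doc (per segment); all the
// counting happens afterwards by walking the set bits.
class PostCollectionSketch extends Collector {
  private final List<FixedBitSet> matchingDocs = new ArrayList<FixedBitSet>();
  private final List<AtomicReaderContext> contexts = new ArrayList<AtomicReaderContext>();
  private FixedBitSet current;

  @Override
  public void setScorer(Scorer scorer) {}           // scores are not needed for counting

  @Override
  public boolean acceptsDocsOutOfOrder() { return true; }

  @Override
  public void setNextReader(AtomicReaderContext context) {
    current = new FixedBitSet(context.reader().maxDoc());
    matchingDocs.add(current);
    contexts.add(context);
  }

  @Override
  public void collect(int doc) {
    current.set(doc);                               // just remember the hit
  }

  int[] countAll(int numOrds) throws IOException {
    int[] counts = new int[numOrds];
    for (int i = 0; i < matchingDocs.size(); i++) {
      FixedBitSet bits = matchingDocs.get(i);
      int length = bits.length();
      int doc = 0;
      while (doc < length && (doc = bits.nextSetBit(doc)) != -1) {
        countOrdinals(contexts.get(i), doc, counts); // decode this doc's ords and count them
        ++doc;
      }
    }
    return counts;
  }

  // placeholder: read the doc's encoded ordinals (payload/DV) and increment counts
  private void countOrdinals(AtomicReaderContext ctx, int doc, int[] counts) throws IOException {}
}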
I wonder how these two compare to CountingFacetsCollector. I modified FacetsCollector.create() to return any of the 3, so just make sure to comment out the irrelevant ones in the benchmark.
Michael McCandless (@mikemccand) (migrated from JIRA)
Base = DecoderCountingFacetsCollector; comp=CountingFacetsCollector:
Task QPS base StdDev QPS comp StdDev Pct diff
HighTerm 25.67 (1.6%) 30.45 (1.9%) 18.6% ( 14% - 22%)
LowTerm 145.87 (1.0%) 154.38 (0.8%) 5.8% ( 4% - 7%)
MedTerm 44.45 (1.4%) 51.01 (1.5%) 14.8% ( 11% - 17%)
PKLookup 240.08 (0.9%) 239.94 (1.0%) -0.1% ( -1% - 1%)
So it seems like the IntDecoder abstractions hurt ...
Base = DecoderCountingFacetsCollector; comp=PostCollectionCountingFacetsCollector:
Task QPS base StdDev QPS comp StdDev Pct diff
HighTerm 30.46 (0.8%) 30.16 (2.1%) -1.0% ( -3% - 2%)
LowTerm 142.89 (0.5%) 153.94 (0.8%) 7.7% ( 6% - 9%)
MedTerm 50.46 (0.8%) 50.65 (1.8%) 0.4% ( -2% - 2%)
PKLookup 238.65 (1.1%) 238.55 (0.9%) -0.0% ( -2% - 2%)
This is very interesting! And good news for sampling?
Shai Erera (@shaie) (migrated from JIRA)
ok so the Decoder abstraction hurts ... that's a bummer. While dgap+vint specialization is simple, specializing e.g. a packed-ints (or whatever other block encoding algorithm we'll come up with on LUCENE-4609) will make the code uglier :).
It looks like PostCollection doesn't hurt much? Can you compare it to Counting directly? I'm confused by the results ... they seem to improve on the Decoder collector, but I'm not sure how they will compare to Counting. If the differences are minuscule (in either direction), then it could mean good news for sampling, because then we will be able to fold sampling into this specialized Collector. But it would also mean that we can fold in complements (TotalFacetCounts).
So it looks like using any abstraction will hurt us. I didn't even try Aggregator, because it needs to either use the decoder, or use a bulk API (i.e. the Collector would decode into an IntsRef, not using IntDecoder, and then delegate to Aggregator) – which seems pointless to me, as counting + default decoding is the common scenario that we want to target.
Based on the Counting vs PostCollection results, we should decide whether to always do post-collection in Counting, or not. Folding in Sampling and Complements should be done separately, because they are not so easy to bring in w/ the current state of the API.
Shai Erera (@shaie) (migrated from JIRA)
Hmm, it occurred to me that maybe your second comparison was between PostCollection and Counting? If so, then while it's indeed interesting, it's puzzling. PostCollection allocates a FixedBitSet for every segment and in the end obtains a DISI from each FBS. As far as I know, DISIs over bitsets are not so cheap, especially when nextDoc() is called, because they need to find the next set bit ... if it's indeed faster, we must get to the bottom of it. It could mean other Collectors could benefit from such a post-collection technique ...
While on that, is the best way to iterate on a bitset's set bits via DISI? I'm looking at OpenBitSetDISI.nextDoc() and it looks much more expensive than FixedBitSet.nextSetBit(). I modified PostCollection to do:
while (doc < length && (doc = bits.nextSetBit(doc)) != -1) {
  // .. the previous code
  ++doc;
}
And all tests pass with this change too. I wonder if that's faster than DISI.
BTW, while making this change I noticed that I have a slight inefficiency in all 3 Collectors. If the document has no facets, I should have returned, but I forgot the return statement, e.g.:
if (buf.length == 0) {
  // this document has no facets
  return; // THAT LINE WAS MISSING!
}
The code is still correct, just doing some redundant extra instructions. I'll upload an updated patch, with both changes shortly.
Shai Erera (@shaie) (migrated from JIRA)
Patch fixes the missing return statement in all 3 collectors, as well as moves from DISI to nextSetBit.
Mike, is it possible to compare Counting and PostCollection to trunk, instead of to each other?
Michael McCandless (@mikemccand) (migrated from JIRA)
Can you compare it to Counting directly?
Ugh, sorry, that is in fact what I ran but I put the wrong base/comp above it. The test was actually base = PostCollectionCountingFacetsCollector, comp = CountingFacetsCollector.
Michael McCandless (@mikemccand) (migrated from JIRA)
StandardFacetsCollector (base) vs DecoderCountingFacetsCollector (comp):
Task QPS base StdDev QPS comp StdDev Pct diff
HighTerm 21.44 (1.4%) 25.71 (1.3%) 19.9% ( 16% - 22%)
LowTerm 99.73 (3.2%) 145.71 (1.2%) 46.1% ( 40% - 52%)
MedTerm 35.13 (1.6%) 44.46 (1.1%) 26.6% ( 23% - 29%)
PKLookup 241.15 (1.0%) 238.90 (1.0%) -0.9% ( -2% - 1%)
StandardFacetsCollector (base) vs PostCollectionCountingFacetsCollector (comp):
Task QPS base StdDev QPS comp StdDev Pct diff
HighTerm 21.26 (0.9%) 31.36 (1.4%) 47.5% ( 44% - 50%)
LowTerm 99.84 (3.2%) 159.17 (0.7%) 59.4% ( 53% - 65%)
MedTerm 34.91 (1.3%) 52.65 (1.2%) 50.8% ( 47% - 54%)
PKLookup 238.08 (1.3%) 238.26 (1.2%) 0.1% ( -2% - 2%)
StandardFacetsCollector (base) vs CountingFacetsCollector (comp):
Task QPS base StdDev QPS comp StdDev Pct diff
HighTerm 21.35 (1.3%) 30.26 (2.9%) 41.7% ( 37% - 46%)
LowTerm 100.45 (4.0%) 153.26 (1.1%) 52.6% ( 45% - 60%)
MedTerm 35.02 (1.9%) 50.77 (2.0%) 45.0% ( 40% - 49%)
PKLookup 237.88 (2.4%) 239.34 (0.9%) 0.6% ( -2% - 4%)
Michael McCandless (@mikemccand) (migrated from JIRA)
I re-ran CountingFacetsCollector (base) vs PostCollectionCountingFacetsCollector (comp):
Task QPS base StdDev QPS comp StdDev Pct diff
HighTerm 30.15 (1.4%) 30.97 (1.1%) 2.7% ( 0% - 5%)
LowTerm 153.06 (0.4%) 158.26 (0.7%) 3.4% ( 2% - 4%)
MedTerm 50.69 (0.9%) 52.29 (0.9%) 3.2% ( 1% - 5%)
PKLookup 238.04 (1.3%) 236.79 (1.8%) -0.5% ( -3% - 2%)
I think the cutover away from DISI made it faster ... and it's surprising this (allocate bit set, set the bits, revisit the set bits in the end) is faster than count-as-you-go.
Shai Erera (@shaie) (migrated from JIRA)
I'm surprised too. Throwing out a wild idea: maybe post-collection buys us locality of reference in terms of the counts[] (and maybe even the DocValues.Source)?
It almost feels counter-intuitive, right? CountingFC's operations are a subset of PostCollectionCFC. The latter adds many bitwise operations, ifs, loops and what not. So what do we do? Stick w/ post-collection? :)
Michael McCandless (@mikemccand) (migrated from JIRA)
I ran the same test, but w/ the full set of query categories:
Task QPS base StdDev QPS comp StdDev Pct diff
AndHighLow 111.98 (1.0%) 110.10 (1.0%) -1.7% ( -3% - 0%)
HighSpanNear 128.42 (1.4%) 126.32 (1.1%) -1.6% ( -4% - 0%)
LowSpanNear 128.68 (1.4%) 126.59 (1.0%) -1.6% ( -3% - 0%)
MedSpanNear 128.18 (1.3%) 126.29 (1.1%) -1.5% ( -3% - 0%)
Respell 55.79 (3.9%) 55.35 (4.8%) -0.8% ( -9% - 8%)
PKLookup 206.89 (1.1%) 208.08 (1.5%) 0.6% ( -2% - 3%)
Fuzzy2 36.21 (1.3%) 36.49 (2.3%) 0.8% ( -2% - 4%)
MedPhrase 56.42 (1.4%) 56.94 (1.3%) 0.9% ( -1% - 3%)
Wildcard 64.26 (3.8%) 64.88 (2.0%) 1.0% ( -4% - 7%)
AndHighMed 51.80 (0.7%) 52.44 (1.2%) 1.2% ( 0% - 3%)
IntNRQ 18.49 (4.8%) 18.78 (5.5%) 1.6% ( -8% - 12%)
LowTerm 41.15 (0.6%) 41.82 (0.9%) 1.6% ( 0% - 3%)
Prefix3 46.94 (4.3%) 47.92 (3.4%) 2.1% ( -5% - 10%)
MedTerm 18.47 (0.8%) 18.92 (1.3%) 2.4% ( 0% - 4%)
HighPhrase 15.16 (6.2%) 15.77 (4.3%) 4.0% ( -6% - 15%)
HighTerm 6.76 (1.2%) 7.07 (1.2%) 4.5% ( 2% - 7%)
LowSloppyPhrase 17.14 (3.8%) 17.96 (2.3%) 4.8% ( -1% - 11%)
Fuzzy1 27.29 (0.8%) 28.62 (1.4%) 4.9% ( 2% - 7%)
MedSloppyPhrase 17.64 (2.4%) 18.90 (1.0%) 7.2% ( 3% - 10%)
AndHighHigh 11.11 (0.5%) 11.97 (0.9%) 7.7% ( 6% - 9%)
HighSloppyPhrase 0.83 (10.5%) 0.91 (5.9%) 10.1% ( -5% - 29%)
LowPhrase 15.83 (3.2%) 17.45 (0.2%) 10.2% ( 6% - 14%)
OrHighHigh 3.22 (0.7%) 3.80 (1.5%) 18.1% ( 15% - 20%)
OrHighLow 5.68 (0.3%) 6.73 (1.5%) 18.4% ( 16% - 20%)
OrHighMed 5.61 (0.5%) 6.66 (1.6%) 18.7% ( 16% - 20%)
Somehow post-collection is a big gain for the Or queries ... I wonder if somehow we are not getting the out-of-order scorer (BooleanScorer) w/ CountingCollector ... but looking at both collectors, they both return true from acceptsDocsOutOfOrder ...
Net/net it seems like we should stick with post collection? The possible downside is memory use of the temporary bit set I guess ...
Michael McCandless (@mikemccand) (migrated from JIRA)
I confirmed that the Or queries are using BooleanScorer in both base and comp, so those gains are "real".
Michael McCandless (@mikemccand) (migrated from JIRA)
Results if I rebuild the index with NO_PARENTS (just to make sure the locality gains are not due to frequently visiting the parent ords in the count array):
Task QPS base StdDev QPS comp StdDev Pct diff
Respell 55.59 (3.9%) 54.45 (3.4%) -2.0% ( -8% - 5%)
IntNRQ 18.34 (7.1%) 18.04 (6.4%) -1.7% ( -14% - 12%)
AndHighLow 86.87 (0.6%) 86.26 (1.9%) -0.7% ( -3% - 1%)
MedSpanNear 97.31 (0.9%) 96.63 (1.8%) -0.7% ( -3% - 1%)
Prefix3 46.40 (5.6%) 46.11 (4.6%) -0.6% ( -10% - 10%)
LowSpanNear 97.76 (0.9%) 97.28 (1.8%) -0.5% ( -3% - 2%)
Fuzzy2 31.88 (1.6%) 31.77 (2.7%) -0.3% ( -4% - 3%)
Wildcard 62.53 (2.9%) 62.34 (2.5%) -0.3% ( -5% - 5%)
PKLookup 210.69 (1.5%) 210.37 (1.8%) -0.1% ( -3% - 3%)
HighSpanNear 97.44 (1.4%) 97.35 (1.7%) -0.1% ( -3% - 3%)
MedPhrase 49.87 (2.4%) 50.18 (2.5%) 0.6% ( -4% - 5%)
HighPhrase 14.32 (8.8%) 14.42 (8.8%) 0.7% ( -15% - 20%)
LowTerm 37.64 (0.5%) 37.90 (1.3%) 0.7% ( -1% - 2%)
AndHighMed 45.23 (0.6%) 45.74 (1.1%) 1.1% ( 0% - 2%)
MedTerm 22.53 (1.0%) 23.00 (1.3%) 2.1% ( 0% - 4%)
LowSloppyPhrase 16.27 (2.5%) 16.65 (5.7%) 2.3% ( -5% - 10%)
Fuzzy1 24.86 (1.7%) 25.87 (1.4%) 4.1% ( 0% - 7%)
HighTerm 7.67 (1.6%) 8.00 (2.4%) 4.3% ( 0% - 8%)
MedSloppyPhrase 16.67 (1.2%) 17.58 (3.1%) 5.5% ( 1% - 9%)
HighSloppyPhrase 0.81 (6.6%) 0.86 (12.8%) 6.9% ( -11% - 28%)
AndHighHigh 11.38 (0.8%) 12.18 (1.2%) 7.1% ( 5% - 9%)
LowPhrase 14.69 (4.7%) 15.82 (5.7%) 7.6% ( -2% - 18%)
OrHighHigh 3.60 (2.3%) 4.32 (3.3%) 20.0% ( 14% - 26%)
OrHighMed 6.20 (1.9%) 7.51 (3.0%) 21.1% ( 15% - 26%)
OrHighLow 6.25 (2.0%) 7.60 (2.4%) 21.7% ( 17% - 26%)
So net/net post is still better! Separately it looks like NO_PARENTS is maybe ~10% faster for the high-cost queries, but slower for the low cost queries ... which is expected because iterating over 2.2 M ords in the end is a fixed non-trivial cost ...
Shai Erera (@shaie) (migrated from JIRA)
Good. So I'll consolidate Post and Counting into one, and also add handling for the NO_PARENTS case. Unfortunately, we cannot compare trunk vs patch for the NO_PARENTS case, unless we write a lot of redundant code (e.g. a NoParentsAccumulator). We'll have to make do with the absolute QPS numbers I guess, which show about a 12% improvement.
Shai Erera (@shaie) (migrated from JIRA)
Patch finalizes CountingFacetsCollector to handle the specialized case of facet counting, doing the counting post-collection. Also, it can handle OrdinalPolicy.NO_PARENTS, allowing to index only leaf ordinals and counting up the parents after the leaves' counts have been resolved.
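The roll-up itself is essentially this (a simplified sketch using a plain parents[] array from the taxonomy; the committed code may differ):

// Sketch: with NO_PARENTS only leaf ordinals are counted during collection, so we
// roll the counts up afterwards. In the taxonomy a parent always has a smaller
// ordinal than its children, so walking ordinals from high to low sums each
// subtree before its count is pushed further up. Assumes (per NO_PARENTS) that no
// document contributes two leaves under the same parent, and that the root is ord 0.
static void rollUpParents(int[] counts, int[] parents) {
  for (int ord = counts.length - 1; ord > 0; ord--) {
    if (counts[ord] != 0) {
      counts[parents[ord]] += counts[ord];
    }
  }
}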
Added a CHANGES entry, and updated some javadocs.
Would be good if we can give this version a final comparison against trunk. For the ALL_PARENTS case, we can compare the pct diff, while for NO_PARENTS we can only compare absolute QPS for now.
Michael McCandless (@mikemccand) (migrated from JIRA)
ALL_PARENTS StandardFacetsCollector (base) vs CountingFacetsCollector (comp):
Task QPS base StdDev QPS comp StdDev Pct diff
Respell 55.89 (3.2%) 55.13 (3.9%) -1.4% ( -8% - 5%)
PKLookup 207.52 (1.6%) 206.95 (1.4%) -0.3% ( -3% - 2%)
Wildcard 62.22 (3.2%) 62.94 (2.7%) 1.2% ( -4% - 7%)
IntNRQ 17.88 (5.2%) 18.16 (5.7%) 1.6% ( -8% - 13%)
Prefix3 45.56 (4.9%) 46.48 (4.1%) 2.0% ( -6% - 11%)
HighSloppyPhrase 0.80 (9.7%) 0.84 (8.5%) 4.9% ( -12% - 25%)
HighPhrase 13.52 (7.7%) 15.09 (8.1%) 11.6% ( -3% - 29%)
LowSloppyPhrase 15.02 (3.9%) 17.15 (4.0%) 14.1% ( 5% - 22%)
LowPhrase 14.14 (4.3%) 16.77 (4.9%) 18.6% ( 8% - 29%)
MedSloppyPhrase 14.81 (2.6%) 18.33 (2.7%) 23.7% ( 17% - 29%)
Fuzzy2 27.57 (2.6%) 34.95 (3.1%) 26.8% ( 20% - 33%)
AndHighHigh 9.39 (1.6%) 11.92 (1.4%) 27.0% ( 23% - 30%)
MedTerm 14.63 (2.2%) 18.89 (1.7%) 29.1% ( 24% - 33%)
HighTerm 5.28 (1.8%) 7.02 (2.4%) 33.0% ( 28% - 37%)
Fuzzy1 20.79 (2.1%) 27.71 (2.8%) 33.3% ( 27% - 39%)
OrHighLow 4.82 (1.8%) 6.70 (2.6%) 39.1% ( 34% - 44%)
OrHighMed 4.74 (1.8%) 6.61 (3.0%) 39.4% ( 34% - 44%)
OrHighHigh 2.68 (1.8%) 3.77 (2.9%) 40.9% ( 35% - 46%)
MedPhrase 39.21 (3.6%) 55.35 (3.6%) 41.2% ( 32% - 50%)
AndHighMed 36.29 (3.5%) 51.92 (2.0%) 43.1% ( 36% - 50%)
LowTerm 27.96 (3.2%) 41.47 (2.2%) 48.3% ( 41% - 55%)
AndHighLow 64.36 (5.4%) 107.94 (5.7%) 67.7% ( 53% - 83%)
MedSpanNear 70.17 (6.1%) 123.23 (7.4%) 75.6% ( 58% - 94%)
LowSpanNear 70.35 (6.0%) 123.59 (7.1%) 75.7% ( 58% - 94%)
HighSpanNear 70.35 (6.1%) 123.69 (7.8%) 75.8% ( 58% - 95%)
These are nice gains!
Michael McCandless (@mikemccand) (migrated from JIRA)
NO_PARENTS CountingFacetsCollector vs itself (ie all differences are noise). Use the absolute QPS to compare to the "QPS comp" column above, eg MedTerm was 18.89 QPS above with ALL_PARENTS and with NO_PARENTS MedTerm is 22.67-22.80 QPS:
Task QPS base StdDev QPS comp StdDev Pct diff
AndHighLow 85.20 (5.0%) 83.74 (5.7%) -1.7% ( -11% - 9%)
LowSpanNear 95.25 (5.5%) 93.67 (6.8%) -1.7% ( -13% - 11%)
HighSpanNear 95.19 (5.4%) 93.80 (6.7%) -1.5% ( -12% - 11%)
MedSpanNear 94.97 (5.4%) 93.59 (6.8%) -1.5% ( -12% - 11%)
AndHighMed 45.68 (2.8%) 45.29 (2.9%) -0.9% ( -6% - 4%)
OrHighLow 7.62 (2.2%) 7.55 (2.2%) -0.8% ( -5% - 3%)
OrHighHigh 4.33 (2.2%) 4.29 (2.2%) -0.8% ( -5% - 3%)
LowTerm 38.17 (2.0%) 37.90 (2.2%) -0.7% ( -4% - 3%)
OrHighMed 7.54 (2.2%) 7.49 (2.1%) -0.7% ( -4% - 3%)
Prefix3 45.95 (4.3%) 45.68 (4.4%) -0.6% ( -8% - 8%)
MedTerm 22.80 (2.2%) 22.67 (2.1%) -0.6% ( -4% - 3%)
Fuzzy1 26.16 (1.9%) 26.04 (2.0%) -0.4% ( -4% - 3%)
IntNRQ 17.94 (6.1%) 17.86 (6.2%) -0.4% ( -11% - 12%)
AndHighHigh 12.33 (1.2%) 12.29 (1.3%) -0.4% ( -2% - 2%)
Fuzzy2 32.00 (2.8%) 31.89 (3.0%) -0.3% ( -5% - 5%)
MedPhrase 49.48 (3.9%) 49.32 (4.4%) -0.3% ( -8% - 8%)
HighTerm 8.02 (2.1%) 8.00 (2.0%) -0.2% ( -4% - 3%)
PKLookup 211.76 (1.4%) 211.32 (1.8%) -0.2% ( -3% - 3%)
Wildcard 62.37 (2.3%) 62.28 (2.3%) -0.1% ( -4% - 4%)
MedSloppyPhrase 17.49 (2.5%) 17.52 (2.7%) 0.2% ( -4% - 5%)
Respell 55.68 (5.0%) 55.85 (3.3%) 0.3% ( -7% - 9%)
LowSloppyPhrase 16.29 (4.7%) 16.43 (5.2%) 0.9% ( -8% - 11%)
LowPhrase 15.68 (5.3%) 15.81 (5.4%) 0.9% ( -9% - 12%)
HighPhrase 14.22 (8.7%) 14.45 (8.9%) 1.6% ( -14% - 21%)
HighSloppyPhrase 0.83 (9.3%) 0.85 (11.9%) 2.1% ( -17% - 25%)
Michael McCandless (@mikemccand) (migrated from JIRA)
Also, total _*_dv.* file size is 445 MB for ALL_PARENTS and 351 MB for NO_PARENTS.
Michael McCandless (@mikemccand) (migrated from JIRA)
base = ALL_PARENTS, comp = NO_PARENTS:
Task QPS base StdDev QPS comp StdDev Pct diff
MedSpanNear 125.77 (2.0%) 79.31 (0.8%) -36.9% ( -38% - -34%)
LowSpanNear 124.86 (2.7%) 79.23 (0.5%) -36.5% ( -38% - -34%)
HighSpanNear 124.23 (2.3%) 79.44 (0.8%) -36.1% ( -38% - -33%)
AndHighLow 107.24 (1.4%) 72.70 (0.7%) -32.2% ( -33% - -30%)
MedPhrase 55.98 (0.6%) 44.89 (1.4%) -19.8% ( -21% - -17%)
AndHighMed 52.06 (0.7%) 43.20 (0.0%) -17.0% ( -17% - -16%)
Fuzzy2 35.71 (0.6%) 30.42 (1.6%) -14.8% ( -16% - -12%)
LowPhrase 17.27 (0.3%) 15.21 (3.2%) -11.9% ( -15% - -8%)
HighPhrase 15.20 (6.2%) 13.50 (4.7%) -11.2% ( -20% - 0%)
LowTerm 41.68 (0.4%) 37.49 (0.4%) -10.1% ( -10% - -9%)
LowSloppyPhrase 17.31 (2.9%) 15.75 (0.9%) -9.0% ( -12% - -5%)
Fuzzy1 28.11 (0.3%) 25.63 (0.0%) -8.8% ( -9% - -8%)
MedSloppyPhrase 18.42 (1.5%) 17.25 (0.1%) -6.3% ( -7% - -4%)
Respell 56.32 (0.3%) 54.41 (2.2%) -3.4% ( -5% - 0%)
HighSloppyPhrase 0.83 (6.8%) 0.81 (1.0%) -2.3% ( -9% - 5%)
Wildcard 63.43 (1.9%) 61.96 (0.3%) -2.3% ( -4% - 0%)
Prefix3 45.60 (0.5%) 45.70 (0.7%) 0.2% ( -1% - 1%)
IntNRQ 17.54 (0.6%) 17.60 (1.4%) 0.3% ( -1% - 2%)
PKLookup 205.89 (0.5%) 210.73 (0.7%) 2.4% ( 1% - 3%)
AndHighHigh 11.89 (0.2%) 12.48 (0.3%) 5.0% ( 4% - 5%)
HighTerm 7.00 (0.2%) 8.09 (0.1%) 15.6% ( 15% - 16%)
OrHighHigh 3.77 (0.6%) 4.36 (0.3%) 15.6% ( 14% - 16%)
OrHighLow 6.65 (0.1%) 7.69 (1.5%) 15.6% ( 14% - 17%)
OrHighMed 6.61 (0.4%) 7.66 (0.2%) 15.8% ( 15% - 16%)
MedTerm 18.86 (0.4%) 22.13 (0.4%) 17.3% ( 16% - 18%)
I think because this test has 2.5M ords ... the cost of "rolling up" in the end is non-trivial ...
Shai Erera (@shaie) (migrated from JIRA)
Thanks for running this. I think that given these results, making NO_PARENTS the default policy is not that good. I think it's not a good default anyway, because it forces the user to stop and think whether the documents he'll index share parents or not. This looks like an advanced setting to me, i.e. if you want to get "expert" and really know your content, then you can choose to index like so. Plus, given those statistics, I'd say that you have to test before you go to production with it (i.e. it looks like it may be expensive as the number of ordinals grows...).
Mike found a bug in how I count up the parents in the NO_PARENTS case, so I fixed it (and added a test). I'll run tests a couple of times and commit this.
Michael McCandless (@mikemccand) (migrated from JIRA)
The performance depends heavily on how many ords your taxo index has ... my last test was ~2.5M ords, but when I built an index leaving out the two dimensions (categories, username) with the most ords, leaving 4703 unique ords, the numbers are much better:
Task QPS base StdDev QPS comp StdDev Pct diff
Prefix3 161.48 (6.1%) 161.99 (7.4%) 0.3% ( -12% - 14%)
PKLookup 235.50 (2.4%) 236.41 (2.1%) 0.4% ( -4% - 5%)
Respell 85.41 (4.4%) 85.92 (4.2%) 0.6% ( -7% - 9%)
AndHighLow 1196.56 (2.1%) 1204.67 (3.4%) 0.7% ( -4% - 6%)
IntNRQ 104.88 (6.7%) 105.77 (9.0%) 0.9% ( -13% - 17%)
Wildcard 215.17 (2.2%) 217.13 (2.6%) 0.9% ( -3% - 5%)
HighSloppyPhrase 3.24 (8.2%) 3.27 (9.2%) 1.0% ( -15% - 19%)
LowSpanNear 42.80 (3.0%) 43.68 (2.8%) 2.1% ( -3% - 8%)
Fuzzy2 84.83 (3.6%) 86.70 (2.8%) 2.2% ( -4% - 8%)
HighSpanNear 11.42 (1.9%) 11.70 (2.3%) 2.4% ( -1% - 6%)
LowPhrase 71.69 (6.8%) 73.91 (6.2%) 3.1% ( -9% - 17%)
Fuzzy1 75.53 (3.4%) 78.81 (2.7%) 4.3% ( -1% - 10%)
HighPhrase 42.58 (11.4%) 44.61 (11.5%) 4.8% ( -16% - 31%)
LowSloppyPhrase 80.22 (2.3%) 84.49 (3.1%) 5.3% ( 0% - 10%)
MedSpanNear 85.37 (1.9%) 91.16 (1.8%) 6.8% ( 3% - 10%)
MedSloppyPhrase 86.55 (2.7%) 92.84 (3.2%) 7.3% ( 1% - 13%)
MedPhrase 145.23 (5.6%) 156.11 (6.1%) 7.5% ( -3% - 20%)
AndHighMed 321.74 (1.2%) 346.20 (1.5%) 7.6% ( 4% - 10%)
AndHighHigh 84.28 (1.6%) 96.80 (1.7%) 14.9% ( 11% - 18%)
OrHighHigh 35.03 (2.9%) 42.53 (4.6%) 21.4% ( 13% - 29%)
OrHighMed 51.75 (3.0%) 63.90 (4.6%) 23.5% ( 15% - 32%)
OrHighLow 50.41 (3.0%) 62.51 (4.7%) 24.0% ( 15% - 32%)
HighTerm 58.55 (3.0%) 74.59 (4.2%) 27.4% ( 19% - 35%)
LowTerm 355.14 (1.6%) 480.44 (2.3%) 35.3% ( 30% - 39%)
MedTerm 206.44 (2.0%) 286.54 (3.1%) 38.8% ( 33% - 44%)
I also separately fixed a silly bug in luceneutil which was causing the Span queries to get 0 hits.
Commit Tag Bot (migrated from JIRA)
[trunk commit] Shai Erera http://svn.apache.org/viewvc?view=revision&revision=1436435
LUCENE-4600: add CountingFacetsCollector
Commit Tag Bot (migrated from JIRA)
[branch_4x commit] Shai Erera http://svn.apache.org/viewvc?view=revision&revision=1436446
LUCENE-4600: add CountingFacetsCollector
Shai Erera (@shaie) (migrated from JIRA)
Committed to trunk and 4x. Let's see if it makes nightly happy! :)
Uwe Schindler (@uschindler) (migrated from JIRA)
Closed after release.
Today the facet module simply gathers all hits (as a bitset, optionally with a float[] to hold scores as well, if you will aggregate them) during collection, and then at the end when you call getFacetsResults(), it makes a 2nd pass over all those hits doing the actual aggregation.
We should investigate just aggregating as we collect instead, so we don't have to tie up transient RAM (fairly small for the bit set but possibly big for the float[]).
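Ie, roughly this shape instead (a minimal sketch only; decodeOrdinals is a placeholder for reading a doc's category ordinals in the current segment):

import java.io.IOException;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

// Sketch: increment the ordinal counts directly in collect(), instead of buffering
// a bitset (+ optional float[]) and making a second pass at the end.
class CountAsYouGoSketch extends Collector {
  private final int[] counts;
  private AtomicReaderContext currentContext;

  CountAsYouGoSketch(int numOrds) {
    counts = new int[numOrds];
  }

  @Override
  public void setScorer(Scorer scorer) {}           // counting ignores scores

  @Override
  public boolean acceptsDocsOutOfOrder() { return true; }

  @Override
  public void setNextReader(AtomicReaderContext context) {
    currentContext = context;
  }

  @Override
  public void collect(int doc) throws IOException {
    for (int ord : decodeOrdinals(currentContext, doc)) {  // aggregate immediately
      counts[ord]++;
    }
  }

  // placeholder: decode this doc's category ordinals in the current segment
  private int[] decodeOrdinals(AtomicReaderContext ctx, int doc) throws IOException {
    return new int[0];
  }
}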
Migrated from LUCENE-4600 by Michael McCandless (@mikemccand), resolved Jan 21 2013
Attachments: LUCENE-4600.patch (versions: 7), LUCENE-4600-cli.patch
Linked issues: #5684