Shai Erera (@shaie) (migrated from JIRA)
Just one comment: sampling benefits from this two-pass approach, because that way we can guarantee a minimum sample set size. Maybe there's a way to achieve that with in-collection aggregation too, but noting it here so that it's in our minds.
See my comment on #5663, the bitset may not be that small :).
Michael McCandless (@mikemccand) (migrated from JIRA)
sampling benefits from this two-pass approach, because that way we can guarantee a minimum sample set size.
Ahh true ...
We have talked about adding a Scorer.getEstimatedHitCount (somewhere Robert has a patch...), so that eg BooleanQuery can do a better job ordering its sub-scorers, but I think we could use it for facets too (ie to pick sampling collector or not).
But, if the estimate was off (which it's allowed to be) ... then it could get tricky for facets, eg you may have to re-run the query with the non-sampling collector (or with higher sampling %tg) ...
Shai Erera (@shaie) (migrated from JIRA)
I'd rather we rename this issue to something like "implement an in-collection FacetsAccumulator/Collector". I don't think that "facets should" aggregate only one way. There are many faceting examples, and some will have different flavors than others.
However, if this new Collector will perform better on a 'common' case, then I'm +1 for making it the default.
Note that I put 'common' in quotes. The benchmark that you're running, indexing Wikipedia w/ a single Date facet dimension, is not common. I think that we should define the common case, maybe following how Solr users use facets. I.e., is it the eCommerce case, where each document is associated with <10 dimensions, and each dimension is not very deep (say, depth <= 3)? If so, let's say that the facets defaults are tuned for that case, and then we benchmark it.
After we have such benchmark, we can compare the two aggregating collectors and decide which should be default.
And we should also define other scenarios too: few dimensions, flat taxonomies, but with hundreds of thousands or millions of categories – what FacetsAccumulator/Collector (including maybe an entirely different indexing chain) suits that case?
We then document some recipes on the Wiki, and recommend the best configuration for each case.
Michael McCandless (@mikemccand) (migrated from JIRA)
I agree we should keep "do all aggregation at the end" ... it could be that for some use-cases (sampling) it's better.
So the "aggregate as you collect" should be an option, and not necessarily the default until we can see if it's better for the "common" case.
Feel free to change the title of this issue!
Gilad Barkai (migrated from JIRA)
Aggregating all doc ids first also makes it easier to compute actual results after sampling. That is done by taking the sampling result's top-(c)K and calculating their true value over all matching documents, giving the benefit of sampling plus results that make sense to the user (e.g. in counting, the end number would actually be the number of matching documents in this category).
As for aggregating 'on the fly', it has some other issues.
It's sort of becoming a religion with all those "beliefs", as some scenarios used to make sense a few years ago. I'm not sure they still do. Can't wait to see how some of these co-exist with the benchmark results. If all religions could have been benchmarked... ;)
Michael McCandless (@mikemccand) (migrated from JIRA)
Initial prototype patch ... I created a CountingFacetsCollector that aggregates per-segment, and it "hardwires" a dgap/vint decoding.
I tested using luceneutil's date faceting and it gives decent speedups for TermQuery:
HighTerm 0.54 (2.7%) 0.63 (1.4%) 17.6% ( 13% - 22%)
LowTerm 7.69 (1.6%) 9.15 (2.1%) 18.9% ( 14% - 23%)
MedTerm 3.39 (1.2%) 4.48 (1.3%) 32.2% ( 29% - 35%)
Michael McCandless (@mikemccand) (migrated from JIRA)
New patch, adding a hacked up CachedCountingFacetsCollector.
All it does is first pre-load all payloads into a PagedBytes (just like DocValues), and then during aggregation, instead of pulling the byte[] from payloads it pulls it from this RAM cache.
This results in an unexpectedly big speedup:
Task QPS base StdDev QPS comp StdDev Pct diff
HighTerm 0.53 (0.9%) 1.00 (2.5%) 87.3% ( 83% - 91%)
LowTerm 7.59 (0.6%) 26.75 (12.9%) 252.6% ( 237% - 267%)
MedTerm 3.35 (0.7%) 12.71 (9.0%) 279.8% ( 268% - 291%)
The only "real" difference is that I'm pulling the byte[] from RAM instead of from payloads, ie I still pay the vInt+dgap decode cost per hit ... so it's surprising payloads add THAT MUCH overhead? (The test was "hot" so payloads were coming from OS's IO cache via MMapDir).
I think the reason why HighTerm sees the least gains is because .advance is much less costly for it, since often the target is in the already-loaded block.
I had separately previously tested the existing int[][][] cache (CategoryListCache) but it had smaller gains than this (73% for MedTerm), and it required more RAM (1.9 GB vs 377 MB for this patch).
Net/net I think we should offer an easy-to-use DV-backed facets impl...
Shai Erera (@shaie) (migrated from JIRA)
Net/net I think we should offer an easy-to-use DV-backed facets impl...
If only DV could handle multi-values. Can they handle a single byte[]? Because essentially that's what the facets API needs today - it stores everything in the payload, which is byte[]. Having a multi-val DV could benefit us by e.g. not needing to write an iterator on the payload to get the category ordinals ...
The patch looks very good. A few comments/questions:
If you want to make this a class that can be reused by other scenarios, then a few tips that can enable that:
Instead of referencing CatListParams.DEFAULT_TERM, you can pull the CLP from FacetSearchParams.getFacetIndexingParams().getCLP(new CP()).getTerm().
Also, you can obtain the right IntDecoder from the CLP for decoding the ordinals. That would remove the hard dependency on VInt+gap, and allow e.g. to use a PackedInts decoder.
Not sure that we should, but this class supports only one CLP. I think it's ok to leave it like that, and get the CLP.term() at ctor, but then we must be able to cache the bytes at the reader level. That way, if an app uses multiple CLPs, it can initialize multiple such Collectors.
I think it's ok to rely on the top Query to not call us for deleted docs, and therefore pass liveDocs=null. If a Query wants to iterate on deleted docs, we should count facets for them too.
Maybe you should take the IntArrayAllocator from the outside? That class can be initialized by the app once to e.g. use maxArrays=10 (e.g. if it expects max 10 queries in parallel), and then the int[] are reused whenever possible. The way the patch is now, if you reuse that Collector, you can only reuse one array.
In setNextReader you sync on the cache only in case someone executes a search w/ an ExecutorService? That's another point where caching at the Codec/AtomicReader level would be better, right?
Why is acceptDocsOutOfOrder false? Is it because of how the cache works? Because facet counting is not limited to in-order only.
I wonder if we can write a good single Collector, and optimize the caching stuff through the Reader, or DV. Collectors in Lucene are usually not reusable? At least, I haven't seen such pattern. The current FacetsCollector isn't reusable (b/c of the bitset and potential scores array). So I'm worried users might be confused and won't benefit the most from that Collector, b/c they won't reuse it .. On the other hand, saying that we have a FacetsIndexReader (composite) which per configuration initializes the right FacetAtomicReader would be more consumable by apps.
About the results, just to clarify – in both runs the 'QPS base' refers to current facet counting and 'QPS comp' refers to the two new collectors respectively? I'm surprised that the int[][][] didn't perform much better, since you don't need to do the decoding for every document, for every query. But then, perhaps it's because the RAM size is so large, and we pay a lot swapping in/out from CPU cache ...
Also, note that you wrote specialized code for decoding the payload, vs. using an API to do that (e.g. PackedInts / IntDecoder). I wonder how that would compare to the base collection, i.e. would we still see the big difference between int[][][] and the byte[] caching.
Overall though, great work Mike !
We must get this code in. It's clear that it can potentially gain a lot for some scenarios ...
Shai Erera (@shaie) (migrated from JIRA)
Changing the title, which got me thinking – Mike, if we do the Reader/DV caching approach, that could benefit post-collection performance too, right? Is it possible that you hack the current FacetsCollector to do the aggregation over CachedBytes and then compare the difference?
Because your first results show that during-collection are not that much faster than post-collection, I am just wondering if it'll be the same when we cache the bytes outside the collector entirely.
If so, I think it should push us to do this caching outside, because we've already identified cases where post-collection is needed (e.g. sampling) too.
Michael McCandless (@mikemccand) (migrated from JIRA)
Net/net I think we should offer an easy-to-use DV-backed facets impl...
If only DV could handle multi-values. Can they handle a single byte[]?
Because essentially that's what the facets API needs today - it stores everything in the payload, which is byte[].
They can handle byte[], so I think we should just offer that.
Having a multi-val DV could benefit us by e.g. not needing to write an iterator on the payload to get the category ordinals ...
Right, though in the special (common?) case where a given facet field is single-valued, like the Date facets I added to luceneutil / nightlybench (see the graph here: http://people.apache.org/~mikemccand/lucenebench/TermDateFacets.html – only 3 data points so far!), we could also use DV's int fields and let it encode the single ord (eg with packed ints) and then aggregate up the taxonomy after aggregation of the leaf ords is done. I'm playing with a prototype patch for this ...
Do I understand correctly that the caching Collector is reusable? Otherwise I don't see how the CachedBytes help.
No no: this is all just a hack (the CachedBytes / static cache). We should somehow cleanly switch to DV ... it wasn't clear to me how to do that ...
Hmmm, what if you used the in-mem Codec, for loading just this term's posting list into RAM? Do you think that you would gain the same?
Maybe! Have to test ...
If you want to make this a class that can be reused by other scenarios, then a few tips that can enable that:
I do! If ... making it fully generic doesn't hurt perf much. The decode chain (w/ separate reInit called per doc) seems heavyish ...
Instead of referencing CatListParams.DEFAULT_TERM, you can pull the CLP from FacetSearchParams.getFacetIndexingParams().getCLP(new CP()).getTerm().
Ahh ok. I'll fix that.
Also, you can obtain the right IntDecoder from the CLP for decoding the ordinals. That would remove the hard dependency on VInt+gap, and allow e.g. to use a PackedInts decoder.
OK I'll try that.
Not sure that we should, but this class supports only one CLP. I think it's ok to leave it like that, and get the CLP.term() at ctor, but then we must be able to cache the bytes at the reader level. That way, if an app uses multiple CLPs, it can initialize multiple such Collectors.
I think it's ok to rely on the top Query to not call us for deleted docs, and therefore pass liveDocs=null. If a Query wants to iterate on deleted docs, we should count facets for them too.
OK good.
Maybe you should take the IntArrayAllocator from the outside? That class can be initialized by the app once to e.g. use maxArrays=10 (e.g. if it expects max 10 queries in parallel), and then the int[] are reused whenever possible. The way the patch is now, if you reuse that Collector, you can only reuse one array.
Ahh I'll do that.
Separately I was wondering if we should sometimes do aggregation backed by an int[] hashmap, and have it "upgrade" to a non-sparse array only once the number collected got too large. Not sure it's THAT important since it would only serve to keep fast queries fast but would make slow queries a bit slower...
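Something like this rough sketch (not in the patch; the threshold and names are made up, and a real impl would use a primitive int/int map rather than a boxed HashMap):

import java.util.HashMap;
import java.util.Map;

// Sketch: counts start in a sparse map and upgrade to a dense int[] once too
// many distinct ordinals have been seen.
class UpgradingCounts {
  private static final int UPGRADE_THRESHOLD = 4096; // made-up cutoff
  private final int maxOrd;
  private Map<Integer, Integer> sparse = new HashMap<Integer, Integer>();
  private int[] dense;                                // null until we upgrade

  UpgradingCounts(int maxOrd) {
    this.maxOrd = maxOrd;
  }

  void increment(int ord) {
    if (dense != null) {
      dense[ord]++;
      return;
    }
    Integer cur = sparse.get(ord);
    sparse.put(ord, cur == null ? 1 : cur + 1);
    if (sparse.size() > UPGRADE_THRESHOLD) {          // too many distinct ords: switch to dense
      dense = new int[maxOrd];
      for (Map.Entry<Integer, Integer> e : sparse.entrySet()) {
        dense[e.getKey()] = e.getValue();
      }
      sparse = null;
    }
  }
}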
In setNextReader you sync on the cache only in case someone executes a search w/ an ExecutorService? That's another point where caching at the Codec/AtomicReader level would be better, right?
Also for multiple threads running at once ... but it's all a hack anyway ...
Why is acceptDocsOutOfOrder false? Is it because of how the cache works? Because facet counting is not limited to in-order only. For the non-caching one that's true, because we can only advance on the fulltree posting. But if the posting is entirely in RAM, we can random access it?
Oh good point – the DV/cache collectors can accept out of order. I'll fix.
I wonder if we can write a good single Collector, and optimize the caching stuff through the Reader, or DV. Collectors in Lucene are usually not reusable? At least, I haven't seen such pattern. The current FacetsCollector isn't reusable (b/c of the bitset and potential scores array). So I'm worried users might be confused and won't benefit the most from that Collector, b/c they won't reuse it .. On the other hand, saying that we have a FacetsIndexReader (composite) which per configuration initializes the right FacetAtomicReader would be more consumable by apps.
I think we should have two new collectors here? One keeps using payloads but operates per segment and aggregates on the fly (if, on making it generic again, we still see gains).
The other stores the byte[] in DV. But somehow we have to make "send the byte[] to DV not payloads at index time" easy ... I'm not sure how :)
About the results, just to clarify – in both runs the 'QPS base' refers to current facet counting and 'QPS comp' refers to the two new collectors respectively?
Right: base = current trunk, comp = the two new collectors.
I'm surprised that the int[][][] didn't perform much better, since you don't need to do the decoding for every document, for every query. But then, perhaps it's because the RAM size is so large, and we pay a lot swapping in/out from CPU cache ...
This also surprised me, but I suspect it's the per-doc pointer dereferencing that's costing us. I saw the same problem with DirectPostingsFormat ... This also ties up tons of extra RAM (pointer = 4 or 8 bytes; int[] object overhead maybe 8 bytes?). I bet if we made a single int[], and did our own addressing (eg another int[] that maps docID to its address) then that would be faster than byte[] via cache/DV.
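Ie something along these lines (sketch only; the layout and names are made up):

// Sketch: all docs' ordinals concatenated into one flat int[], plus an addressing
// array, instead of an int[] per doc. Counting is then pure array indexing with
// no per-doc object dereference.
class FlatOrdinals {
  final int[] ordStart; // ordStart[doc]..ordStart[doc+1] delimit this doc's slice of allOrds
  final int[] allOrds;  // every doc's ordinals, concatenated

  FlatOrdinals(int[] ordStart, int[] allOrds) {
    this.ordStart = ordStart;
    this.allOrds = allOrds;
  }

  void countDoc(int doc, int[] counts) {
    for (int i = ordStart[doc], end = ordStart[doc + 1]; i < end; i++) {
      counts[allOrds[i]]++;
    }
  }
}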
Also, note that you wrote specialized code for decoding the payload, vs. using an API to do that (e.g. PackedInts / IntDecoder). I wonder how that would compare to the base collection, i.e. would we still see the big difference between int[][][] and the byte[] caching.
Yeah good question. I'll separately test the specialized decode to see how much it's helping....
Mike, if we do the Reader/DV caching approach, that could benefit post-collection performance too, right? Is it possible that you hack the current FacetsCollector to do the aggregation over CachedBytes and then compare the difference?
Right! DV vs payloads is decoupled from during- vs post-collection aggregation.
I'll open a separate issue to allow byte[] DV backing for facets....
Because your first results show that during-collection are not that much faster than post-collection, I am just wondering if it'll be the same when we cache the bytes outside the collector entirely. If so, I think it should push us to do this caching outside, because we've already identified cases where post-collection is needed (e.g. sampling) too.
Definitely.
Overall though, great work Mike ! We must get this code in. It's clear that it can potentially gain a lot for some scenarios ...
Thanks! I want to see that graph jump :)
Shai Erera (@shaie) (migrated from JIRA)
I would like to see one ordinals-store, I don't think that we should allow either payload or DV. If DV lets us write byte[], and we could read it off-disk or RAM, we should make the cut to DV.
But note that DV means upgrading existing indexes. How do you move from a payload to DV? Is it something that can be done in addIndexes? If facets could determine where the data is written, per-segment, the indexes would be migrated on-the-fly, as segments are merged.
But if there's a clean way to do a one-time index upgrade to DV, then let's just write it once, and then DVs are migratable, so that's another +1 for DV.
If you want to simulate DVs, you'll need to implement a few classes. First, instead of CategoryDocBuilder, you can construct your own Document, while adding DVFields. Just make sure that when you resolve a CP to its ord, you also resolve all its parents and add all of them to the DV - to compare today(payload) to today(DV) (today == writing all parents).
Then, I think that you should also write your CategoryListIterator, to iterate on the DV.
Those are the base classes for sure, maybe you'll need a few others to get the CLI into the chain.
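Just to illustrate the encoding side (a sketch only, not the real indexing chain; resolving a CP and its parents to ordinals via the taxonomy writer is left out):

// Sketch: given a doc's category ordinal plus all its parent ordinals, encode them
// as dgap+vInt into a byte[] that could be stored in a binary DV field instead of
// a payload. Matches the today(payload) == today(DV) comparison above.
static byte[] encodeOrdinals(int[] ords) {
  java.util.Arrays.sort(ords);                   // dgap assumes ascending order
  java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();
  int prev = 0;
  for (int ord : ords) {
    int delta = ord - prev;                      // dgap: gap from the previous ordinal
    prev = ord;
    while ((delta & ~0x7F) != 0) {               // vInt: 7 data bits per byte, high bit = continuation
      out.write((delta & 0x7F) | 0x80);
      delta >>>= 7;
    }
    out.write(delta);
  }
  return out.toByteArray();
}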
I hope that I related to all the comments, but I might have missed a question :).
Shai Erera (@shaie) (migrated from JIRA)
Another point about DV - that's actually a design thing. One important hook is IntEncoder/Decoder. It determines how the fulltree is encoded/decoded. For example, you used one method (VInt+DGap), but there are other encoders. In one application, every document added almost entirely unique facets, so the ordinals returned had gaps of 1-2. Therefore we have the FourOnes and EightOnes encoders.
Point is, this abstract layer should remain. I know that you're in the exploration phase, but keep that in mind. In fact, if we're able to make the cut to DV as an internal change, we could also benefit from the existing test suite, to make sure everything's working.
Michael McCandless (@mikemccand) (migrated from JIRA)
I would like to see one ordinals-store, I don't think that we should allow either payload or DV. If DV lets us write byte[], and we could read it off-disk or RAM, we should make the cut to DV.
+1, though we should test the on-disk DV vs current payloads to be sure.
But note that DV means upgrading existing indexes.
Hmm it would be nice to somehow migrate on the fly ... not sure how.
But if there's a clean way to do a one-time index upgrade to DV, then let's just write it once, and then DVs are migratable, so that's another +1 for DV.
If we do the migrate-on-the-fly then users can use IndexUpgrader to migrate entire index.
Point is, this abstract layer should remain. I know that you're in the exploration phase, but keep that in mind. In fact, if we're able to make the cut to DV as an internal change, we could also benefit from the existing test suite, to make sure everything's working.
+1, the abstractions are nice and generic.
I'll test to see how much these abstractions are hurting the hotspots ... we can always make/pick specialized collectors (like the patch) if necessary, and keep generic collectors for the fully general cases ...
Michael McCandless (@mikemccand) (migrated from JIRA)
I created #5667 to cutover to DV.
Michael McCandless (@mikemccand) (migrated from JIRA)
Also, you can obtain the right IntDecoder from the CLP for decoding the ordinals. That would remove the hard dependency on VInt+gap, and allow e.g. to use a PackedInts decoder.
I tried this, changing the CountingFacetsCollector to the attached patch (to use CategoryListIterator), but alas those abstractions are apparently costing us in this hotspot (unless I screwed something up in the patch? Eg, that null I pass is kinda spooky!):
Task QPS base StdDev QPS comp StdDev Pct diff
HighTerm 0.86 (4.7%) 0.56 (0.4%) -34.4% ( -37% - -30%)
MedTerm 5.85 (1.0%) 5.04 (0.5%) -13.9% ( -15% - -12%)
LowTerm 11.82 (0.6%) 11.02 (0.5%) -6.8% ( -7% - -5%)
base is the original CountingFacetsCollector and comp is the new one using the CategoryListIterator API.
I think we should try to invoke specialized collectors when possible?
Michael McCandless (@mikemccand) (migrated from JIRA)
Maybe you should take the IntArrayAllocator from the outside?
This actually makes me sort of nervous, because if the app passes 10 to IntArrayAllocator, it means we hold onto 10 int[] sized to the number of ords, right?
Why try to recycle the int[]'s? Why not let GC handle it...?
Shai Erera (@shaie) (migrated from JIRA)
Why try to recycle the int[]'s? Why not let GC handle it...?
It was Gilad who mentioned "beliefs" and "religions" .. that code has been around since Java 1.4. Not sure that at the time Java was very good at allocating and disposing arrays ... Also, it's not like this is unique to facets. IIRC, IndexWriter also holds onto some char[] or byte[] arrays? At least, a few days ago someone asked me how come IW never releases some 100 MB of char[] - since he set RAM buffer size to 128 MB, it made sense to me ...
Perhaps leave it for now, and separately (new issue? :)) we can test if allocating a new array is costly? If it turns out that this is actually important, we can have a cleanup thread to reclaim the unused ones?
Michael McCandless (@mikemccand) (migrated from JIRA)
Also, it's not like this is unique to facets. IIRC, IndexWriter also holds onto some char[] or byte[] arrays? At least, a few days ago someone asked me how come IW never releases some 100 MB of char[] - since he set RAM buffer size to 128 MB, it made sense to me ...
Actually we stopped recycling with DWPT ... now we let GC do its job. But, also, when IW did this, it was internal (no public API was affected) ... I don't like that the app can/should pass in IntArrayAllocator to the public APIs.
Perhaps leave it for now, and separately (new issue? ) we can test if allocating a new array is costly? If it turns out that this is actually important, we can have a cleanup thread to reclaim the unused ones?
OK I'll open a new issue. Rather than adding a cleanup thread to the current impl, I think we should remove Int/FloatArrayAllocator and just do new int[]/float[]? And only add it back if we can prove there's a performance gain? I think we should let Java/GC do its job ...
Shai Erera (@shaie) (migrated from JIRA)
Ok, let's continue the discussion on #5680. The Allocator is also used to pass the array between different objects, but perhaps there are other ways too.
Shai Erera (@shaie) (migrated from JIRA)
Patch introduces CountingFacetsCollector, very similar to Mike's version, only "productized".
Made FacetsCollector abstract with a utility create() method which returns either CountingFacetsCollector or StandardFacetsCollector (previously, FC), given the parameters.
All tests were migrated to use FC.create and all pass (utilizing the new collector). Still, I wrote a dedicated test for the new Collector too.
Preliminary results that we have, show nice improvements w/ this Collector. Mike, can you paste them here?
There are some nocommits, which I will resolve before committing. But before that, I'd like to compare this Collector to ones that use different abstractions from the code, e.g. IntDecoder (vs hard-wiring to dgap+vint), CategoryListIterator etc.
Also, I want to compare this Collector to one that, in collect(), marks a bitset and does all the work in getFacetResults.
Michael McCandless (@mikemccand) (migrated from JIRA)
Patch looks great: +1
And this is a healthy speedup, on the Wikipedia 1M / 25 ords per doc test:
Task QPS base StdDev QPS comp StdDev Pct diff
PKLookup 239.18 (1.5%) 238.87 (1.1%) -0.1% ( -2% - 2%)
LowTerm 98.99 (3.1%) 135.95 (1.8%) 37.3% ( 31% - 43%)
HighTerm 20.95 (1.2%) 29.08 (2.4%) 38.8% ( 34% - 42%)
MedTerm 34.55 (1.5%) 48.31 (2.0%) 39.8% ( 35% - 43%)
Shai Erera (@shaie) (migrated from JIRA)
Handled some nocommits. Now there's no translation from OrdinalValue to FRNImpl in getFacetResults (the latter is used directly in the queue). I wonder if this buys us anything.
Michael McCandless (@mikemccand) (migrated from JIRA)
It's faster!
Task QPS base StdDev QPS comp StdDev Pct diff
PKLookup 239.75 (1.2%) 237.59 (1.0%) -0.9% ( -3% - 1%)
HighTerm 21.21 (1.5%) 29.80 (2.6%) 40.5% ( 35% - 45%)
MedTerm 34.90 (1.9%) 50.24 (1.9%) 44.0% ( 39% - 48%)
LowTerm 99.85 (3.7%) 152.40 (1.1%) 52.6% ( 46% - 59%)
Shai Erera (@shaie) (migrated from JIRA)
Patch adds two Collectors:
DecoderCountingFacetsCollector, which uses the IntDecoder abstraction (but the rest is like CountingFacetsCollector)
PostCollectionCountingFacetsCollector, which moves the work from collect() to getFacetResults(). In collect(), it keeps a per-DocValues.Source bits (FixedBitSet) of the matching docs.
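Roughly, the post-collection one does this (a simplified sketch, not the actual class; countOrdinals below is a placeholder for the per-doc decode+count step):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.util.FixedBitSet;

// Sketch: collect() only marks a bit per matching doc (per segment); all the
// counting happens afterwards by walking the set bits.
class PostCollectionSketch extends Collector {
  private final List<FixedBitSet> matchingDocs = new ArrayList<FixedBitSet>();
  private final List<AtomicReaderContext> contexts = new ArrayList<AtomicReaderContext>();
  private FixedBitSet current;

  @Override
  public void setScorer(Scorer scorer) {}           // scores are not needed for counting

  @Override
  public boolean acceptsDocsOutOfOrder() { return true; }

  @Override
  public void setNextReader(AtomicReaderContext context) {
    current = new FixedBitSet(context.reader().maxDoc());
    matchingDocs.add(current);
    contexts.add(context);
  }

  @Override
  public void collect(int doc) {
    current.set(doc);                               // just remember the hit
  }

  int[] countAll(int numOrds) throws IOException {
    int[] counts = new int[numOrds];
    for (int i = 0; i < matchingDocs.size(); i++) {
      FixedBitSet bits = matchingDocs.get(i);
      int length = bits.length();
      int doc = 0;
      while (doc < length && (doc = bits.nextSetBit(doc)) != -1) {
        countOrdinals(contexts.get(i), doc, counts); // decode this doc's ords and count them
        ++doc;
      }
    }
    return counts;
  }

  // placeholder: read the doc's encoded ordinals (payload/DV) and increment counts
  private void countOrdinals(AtomicReaderContext ctx, int doc, int[] counts) throws IOException {}
}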
I wonder how these two compare to CountingFacetsCollector. I modified FacetsCollector.create() to return any of the 3, so just make sure to comment out the irrelevant ones in the benchmark.
Michael McCandless (@mikemccand) (migrated from JIRA)
Base = DecoderCountingFacetsCollector; comp=CountingFacetsCollector:
Task QPS base StdDev QPS comp StdDev Pct diff
HighTerm 25.67 (1.6%) 30.45 (1.9%) 18.6% ( 14% - 22%)
LowTerm 145.87 (1.0%) 154.38 (0.8%) 5.8% ( 4% - 7%)
MedTerm 44.45 (1.4%) 51.01 (1.5%) 14.8% ( 11% - 17%)
PKLookup 240.08 (0.9%) 239.94 (1.0%) -0.1% ( -1% - 1%)
So it seems like the IntDecoder abstractions hurt ...
Base = DecoderCountingFacetsCollector; comp=PostCollectionCountingFacetsCollector:
Task QPS base StdDev QPS comp StdDev Pct diff
HighTerm 30.46 (0.8%) 30.16 (2.1%) -1.0% ( -3% - 2%)
LowTerm 142.89 (0.5%) 153.94 (0.8%) 7.7% ( 6% - 9%)
MedTerm 50.46 (0.8%) 50.65 (1.8%) 0.4% ( -2% - 2%)
PKLookup 238.65 (1.1%) 238.55 (0.9%) -0.0% ( -2% - 2%)
This is very interesting! And good news for sampling?
Shai Erera (@shaie) (migrated from JIRA)
ok so the Decoder abstraction hurts ... that's a bummer. While dgap+vint specialization is simple, specializing e.g. a packed-ints (or whatever other block encoding algorithm we'll come up with on LUCENE-4609) will make the code uglier :).
It looks like PostCollection doesn't hurt much? Can you compare it to Counting directly? I'm confused by the results ... they seem to improve on the Decoder collector, but I'm not sure how they will compare to Counting. If the differences are minuscule (in either direction), then it could mean good news for sampling, because then we will be able to fold sampling into this specialized Collector. But it would also mean that we can fold in complements (TotalFacetCounts).
So it looks like using any abstraction will hurt us. I didn't even try Aggregator, because it needs to either use the decoder, or use a bulk API (i.e. the Collector would decode into an IntsRef, not using IntDecoder, and then delegate to Aggregator) – which seems pointless to me, as counting + default decoding is the common scenario that we want to target.
Based on the Counting vs PostCollection results, we should decide whether to always do post-collection in Counting, or not. Folding in Sampling and Complements should be done separately, because they are not so easy to bring in w/ the current state of the API.
Shai Erera (@shaie) (migrated from JIRA)
Hmm, it occurred to me that maybe your second comparison was between PostCollection and Counting? If so, then while it's indeed interesting, it's puzzling. PostCollection allocates a FixedBitSet for every segment and in the end obtains a DISI from each FBS. As far as I know, DISIs over bitsets are not so cheap, especially when nextDoc() is called, because they need to find the next set bit ... if it's indeed faster, we must get to the bottom of it. It could mean other Collectors could benefit from such a post-collection technique ...
While on that, is the best way to iterate on a bitset's set bits via DISI? I'm looking at OpenBitSetDISI.nextDoc() and it looks much more expensive than FixedBitSet.nextSetBit(). I modified PostCollection to do:
while (doc < length && (doc = bits.nextSetBit(doc)) != -1) {
  // .. the previous code
  ++doc;
}
And all tests pass with this change too. I wonder if that's faster than DISI.
BTW, while making this change I noticed that I have a slight inefficiency in all 3 Collectors. If the document has no facets, I should have returned, but I forgot the return statement, e.g.:
if (buf.length == 0) {
  // this document has no facets
  return; // THAT LINE WAS MISSING!
}
The code is still correct, just doing some redundant extra instructions. I'll upload an updated patch, with both changes shortly.
Shai Erera (@shaie) (migrated from JIRA)
Patch fixes the missing return statement in all 3 collectors, as well as moves from DISI to nextSetBit.
Mike, is it possible to compare Counting and PostCollection to trunk, instead of to each other?
Michael McCandless (@mikemccand) (migrated from JIRA)
Can you compare it to Counting directly?
Ugh, sorry, that is in fact what I ran but I put the wrong base/comp above it. The test was actually base = PostCollectionCountingFacetsCollector, comp = CountingFacetsCollector.
Michael McCandless (@mikemccand) (migrated from JIRA)
StandardFacetsCollector (base) vs DecoderCountingFacetsCollector (comp):
Task QPS base StdDev QPS comp StdDev Pct diff
HighTerm 21.44 (1.4%) 25.71 (1.3%) 19.9% ( 16% - 22%)
LowTerm 99.73 (3.2%) 145.71 (1.2%) 46.1% ( 40% - 52%)
MedTerm 35.13 (1.6%) 44.46 (1.1%) 26.6% ( 23% - 29%)
PKLookup 241.15 (1.0%) 238.90 (1.0%) -0.9% ( -2% - 1%)
StandardFacetsCollector (base) vs PostCollectionCountingFacetsCollector (comp):
Task QPS base StdDev QPS comp StdDev Pct diff
HighTerm 21.26 (0.9%) 31.36 (1.4%) 47.5% ( 44% - 50%)
LowTerm 99.84 (3.2%) 159.17 (0.7%) 59.4% ( 53% - 65%)
MedTerm 34.91 (1.3%) 52.65 (1.2%) 50.8% ( 47% - 54%)
PKLookup 238.08 (1.3%) 238.26 (1.2%) 0.1% ( -2% - 2%)
StandardFacetsCollector (base) vs CountingFacetsCollector (comp):
Task QPS base StdDev QPS comp StdDev Pct diff
HighTerm 21.35 (1.3%) 30.26 (2.9%) 41.7% ( 37% - 46%)
LowTerm 100.45 (4.0%) 153.26 (1.1%) 52.6% ( 45% - 60%)
MedTerm 35.02 (1.9%) 50.77 (2.0%) 45.0% ( 40% - 49%)
PKLookup 237.88 (2.4%) 239.34 (0.9%) 0.6% ( -2% - 4%)
Michael McCandless (@mikemccand) (migrated from JIRA)
I re-ran CountingFacetsCollector (base) vs PostCollectionCountingFacetsCollector (comp):
Task QPS base StdDev QPS comp StdDev Pct diff
HighTerm 30.15 (1.4%) 30.97 (1.1%) 2.7% ( 0% - 5%)
LowTerm 153.06 (0.4%) 158.26 (0.7%) 3.4% ( 2% - 4%)
MedTerm 50.69 (0.9%) 52.29 (0.9%) 3.2% ( 1% - 5%)
PKLookup 238.04 (1.3%) 236.79 (1.8%) -0.5% ( -3% - 2%)
I think the cutover away from DISI made it faster ... and it's surprising this (allocate bit set, set the bits, revisit the set bits in the end) is faster than count-as-you-go.
Shai Erera (@shaie) (migrated from JIRA)
I'm surprised too. Throwing out a wild idea: maybe post-collection buys us locality of reference in terms of the counts[] (and maybe even the DocValues.Source)?
It almost feels counter-intuitive, right? CountingFC's operations are a subset of PostCollectionCFC. The latter adds many bitwise operations, ifs, loops and what not. So what do we do? Stick w/ post-collection? :)
Michael McCandless (@mikemccand) (migrated from JIRA)
I ran the same test, but w/ the full set of query categories:
Task QPS base StdDev QPS comp StdDev Pct diff
AndHighLow 111.98 (1.0%) 110.10 (1.0%) -1.7% ( -3% - 0%)
HighSpanNear 128.42 (1.4%) 126.32 (1.1%) -1.6% ( -4% - 0%)
LowSpanNear 128.68 (1.4%) 126.59 (1.0%) -1.6% ( -3% - 0%)
MedSpanNear 128.18 (1.3%) 126.29 (1.1%) -1.5% ( -3% - 0%)
Respell 55.79 (3.9%) 55.35 (4.8%) -0.8% ( -9% - 8%)
PKLookup 206.89 (1.1%) 208.08 (1.5%) 0.6% ( -2% - 3%)
Fuzzy2 36.21 (1.3%) 36.49 (2.3%) 0.8% ( -2% - 4%)
MedPhrase 56.42 (1.4%) 56.94 (1.3%) 0.9% ( -1% - 3%)
Wildcard 64.26 (3.8%) 64.88 (2.0%) 1.0% ( -4% - 7%)
AndHighMed 51.80 (0.7%) 52.44 (1.2%) 1.2% ( 0% - 3%)
IntNRQ 18.49 (4.8%) 18.78 (5.5%) 1.6% ( -8% - 12%)
LowTerm 41.15 (0.6%) 41.82 (0.9%) 1.6% ( 0% - 3%)
Prefix3 46.94 (4.3%) 47.92 (3.4%) 2.1% ( -5% - 10%)
MedTerm 18.47 (0.8%) 18.92 (1.3%) 2.4% ( 0% - 4%)
HighPhrase 15.16 (6.2%) 15.77 (4.3%) 4.0% ( -6% - 15%)
HighTerm 6.76 (1.2%) 7.07 (1.2%) 4.5% ( 2% - 7%)
LowSloppyPhrase 17.14 (3.8%) 17.96 (2.3%) 4.8% ( -1% - 11%)
Fuzzy1 27.29 (0.8%) 28.62 (1.4%) 4.9% ( 2% - 7%)
MedSloppyPhrase 17.64 (2.4%) 18.90 (1.0%) 7.2% ( 3% - 10%)
AndHighHigh 11.11 (0.5%) 11.97 (0.9%) 7.7% ( 6% - 9%)
HighSloppyPhrase 0.83 (10.5%) 0.91 (5.9%) 10.1% ( -5% - 29%)
LowPhrase 15.83 (3.2%) 17.45 (0.2%) 10.2% ( 6% - 14%)
OrHighHigh 3.22 (0.7%) 3.80 (1.5%) 18.1% ( 15% - 20%)
OrHighLow 5.68 (0.3%) 6.73 (1.5%) 18.4% ( 16% - 20%)
OrHighMed 5.61 (0.5%) 6.66 (1.6%) 18.7% ( 16% - 20%)
Somehow post-collection is a big gain for the Or queries ... I wonder if somehow we are not getting the out-of-order scorer (BooleanScorer) w/ CountingCollector ... but looking at both collectors, they both return true from acceptsDocsOutOfOrder ...
Net/net it seems like we should stick with post collection? The possible downside is memory use of the temporary bit set I guess ...
Michael McCandless (@mikemccand) (migrated from JIRA)
I confirmed that the Or queries are using BooleanScorer in both base and comp, so those gains are "real".
Michael McCandless (@mikemccand) (migrated from JIRA)
Results if I rebuild the index with NO_PARENTS (just to make sure the locality gains are not due to frequently visiting the parent ords in the count array):
Task QPS base StdDev QPS comp StdDev Pct diff
Respell 55.59 (3.9%) 54.45 (3.4%) -2.0% ( -8% - 5%)
IntNRQ 18.34 (7.1%) 18.04 (6.4%) -1.7% ( -14% - 12%)
AndHighLow 86.87 (0.6%) 86.26 (1.9%) -0.7% ( -3% - 1%)
MedSpanNear 97.31 (0.9%) 96.63 (1.8%) -0.7% ( -3% - 1%)
Prefix3 46.40 (5.6%) 46.11 (4.6%) -0.6% ( -10% - 10%)
LowSpanNear 97.76 (0.9%) 97.28 (1.8%) -0.5% ( -3% - 2%)
Fuzzy2 31.88 (1.6%) 31.77 (2.7%) -0.3% ( -4% - 3%)
Wildcard 62.53 (2.9%) 62.34 (2.5%) -0.3% ( -5% - 5%)
PKLookup 210.69 (1.5%) 210.37 (1.8%) -0.1% ( -3% - 3%)
HighSpanNear 97.44 (1.4%) 97.35 (1.7%) -0.1% ( -3% - 3%)
MedPhrase 49.87 (2.4%) 50.18 (2.5%) 0.6% ( -4% - 5%)
HighPhrase 14.32 (8.8%) 14.42 (8.8%) 0.7% ( -15% - 20%)
LowTerm 37.64 (0.5%) 37.90 (1.3%) 0.7% ( -1% - 2%)
AndHighMed 45.23 (0.6%) 45.74 (1.1%) 1.1% ( 0% - 2%)
MedTerm 22.53 (1.0%) 23.00 (1.3%) 2.1% ( 0% - 4%)
LowSloppyPhrase 16.27 (2.5%) 16.65 (5.7%) 2.3% ( -5% - 10%)
Fuzzy1 24.86 (1.7%) 25.87 (1.4%) 4.1% ( 0% - 7%)
HighTerm 7.67 (1.6%) 8.00 (2.4%) 4.3% ( 0% - 8%)
MedSloppyPhrase 16.67 (1.2%) 17.58 (3.1%) 5.5% ( 1% - 9%)
HighSloppyPhrase 0.81 (6.6%) 0.86 (12.8%) 6.9% ( -11% - 28%)
AndHighHigh 11.38 (0.8%) 12.18 (1.2%) 7.1% ( 5% - 9%)
LowPhrase 14.69 (4.7%) 15.82 (5.7%) 7.6% ( -2% - 18%)
OrHighHigh 3.60 (2.3%) 4.32 (3.3%) 20.0% ( 14% - 26%)
OrHighMed 6.20 (1.9%) 7.51 (3.0%) 21.1% ( 15% - 26%)
OrHighLow 6.25 (2.0%) 7.60 (2.4%) 21.7% ( 17% - 26%)
So net/net post is still better! Separately it looks like NO_PARENTS is maybe ~10% faster for the high-cost queries, but slower for the low cost queries ... which is expected because iterating over 2.2 M ords in the end is a fixed non-trivial cost ...
Shai Erera (@shaie) (migrated from JIRA)
Good. So I'll consolidate Post and Counting into one, and also add handling for the NO_PARENTS case. Unfortunately, we cannot compare trunk vs patch for the NO_PARENTS case, unless we write a lot of redundant code (e.g. a NoParentsAccumulator). We'll have to make do with the absolute QPS numbers I guess, which show about a 12% improvement.
Shai Erera (@shaie) (migrated from JIRA)
Patch finalizes CountingFacetsCollector to handle the specialized case of facet counting, doing the counting post-collection. Also, it can handle OrdinalPolicy.NO_PARENTS, allowing to index only leaf ordinals and counting up the parents after the leaves' counts have been resolved.
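The roll-up itself is essentially this (a simplified sketch using a plain parents[] array from the taxonomy; the committed code may differ):

// Sketch: with NO_PARENTS only leaf ordinals are counted during collection, so we
// roll the counts up afterwards. In the taxonomy a parent always has a smaller
// ordinal than its children, so walking ordinals from high to low sums each
// subtree before its count is pushed further up. Assumes (per NO_PARENTS) that no
// document contributes two leaves under the same parent, and that the root is ord 0.
static void rollUpParents(int[] counts, int[] parents) {
  for (int ord = counts.length - 1; ord > 0; ord--) {
    if (counts[ord] != 0) {
      counts[parents[ord]] += counts[ord];
    }
  }
}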
Added a CHANGES entry, and updated some javadocs.
Would be good if we can give this version a final comparison against trunk. For the ALL_PARENTS case, we can compare the pct diff, while for NO_PARENTS we can only compare absolute QPS for now.
Michael McCandless (@mikemccand) (migrated from JIRA)
ALL_PARENTS StandardFacetsCollector (base) vs CountingFacetsCollector (comp):
Task QPS base StdDev QPS comp StdDev Pct diff
Respell 55.89 (3.2%) 55.13 (3.9%) -1.4% ( -8% - 5%)
PKLookup 207.52 (1.6%) 206.95 (1.4%) -0.3% ( -3% - 2%)
Wildcard 62.22 (3.2%) 62.94 (2.7%) 1.2% ( -4% - 7%)
IntNRQ 17.88 (5.2%) 18.16 (5.7%) 1.6% ( -8% - 13%)
Prefix3 45.56 (4.9%) 46.48 (4.1%) 2.0% ( -6% - 11%)
HighSloppyPhrase 0.80 (9.7%) 0.84 (8.5%) 4.9% ( -12% - 25%)
HighPhrase 13.52 (7.7%) 15.09 (8.1%) 11.6% ( -3% - 29%)
LowSloppyPhrase 15.02 (3.9%) 17.15 (4.0%) 14.1% ( 5% - 22%)
LowPhrase 14.14 (4.3%) 16.77 (4.9%) 18.6% ( 8% - 29%)
MedSloppyPhrase 14.81 (2.6%) 18.33 (2.7%) 23.7% ( 17% - 29%)
Fuzzy2 27.57 (2.6%) 34.95 (3.1%) 26.8% ( 20% - 33%)
AndHighHigh 9.39 (1.6%) 11.92 (1.4%) 27.0% ( 23% - 30%)
MedTerm 14.63 (2.2%) 18.89 (1.7%) 29.1% ( 24% - 33%)
HighTerm 5.28 (1.8%) 7.02 (2.4%) 33.0% ( 28% - 37%)
Fuzzy1 20.79 (2.1%) 27.71 (2.8%) 33.3% ( 27% - 39%)
OrHighLow 4.82 (1.8%) 6.70 (2.6%) 39.1% ( 34% - 44%)
OrHighMed 4.74 (1.8%) 6.61 (3.0%) 39.4% ( 34% - 44%)
OrHighHigh 2.68 (1.8%) 3.77 (2.9%) 40.9% ( 35% - 46%)
MedPhrase 39.21 (3.6%) 55.35 (3.6%) 41.2% ( 32% - 50%)
AndHighMed 36.29 (3.5%) 51.92 (2.0%) 43.1% ( 36% - 50%)
LowTerm 27.96 (3.2%) 41.47 (2.2%) 48.3% ( 41% - 55%)
AndHighLow 64.36 (5.4%) 107.94 (5.7%) 67.7% ( 53% - 83%)
MedSpanNear 70.17 (6.1%) 123.23 (7.4%) 75.6% ( 58% - 94%)
LowSpanNear 70.35 (6.0%) 123.59 (7.1%) 75.7% ( 58% - 94%)
HighSpanNear 70.35 (6.1%) 123.69 (7.8%) 75.8% ( 58% - 95%)
These are nice gains!
Michael McCandless (@mikemccand) (migrated from JIRA)
NO_PARENTS CountingFacetsCollector vs itself (ie all differences are noise). Use the absolute QPS to compare to the "QPS comp" column above, eg MedTerm was 18.89 QPS above with ALL_PARENTS and with NO_PARENTS MedTerm is 22.67-22.80 QPS:
Task QPS base StdDev QPS comp StdDev Pct diff
AndHighLow 85.20 (5.0%) 83.74 (5.7%) -1.7% ( -11% - 9%)
LowSpanNear 95.25 (5.5%) 93.67 (6.8%) -1.7% ( -13% - 11%)
HighSpanNear 95.19 (5.4%) 93.80 (6.7%) -1.5% ( -12% - 11%)
MedSpanNear 94.97 (5.4%) 93.59 (6.8%) -1.5% ( -12% - 11%)
AndHighMed 45.68 (2.8%) 45.29 (2.9%) -0.9% ( -6% - 4%)
OrHighLow 7.62 (2.2%) 7.55 (2.2%) -0.8% ( -5% - 3%)
OrHighHigh 4.33 (2.2%) 4.29 (2.2%) -0.8% ( -5% - 3%)
LowTerm 38.17 (2.0%) 37.90 (2.2%) -0.7% ( -4% - 3%)
OrHighMed 7.54 (2.2%) 7.49 (2.1%) -0.7% ( -4% - 3%)
Prefix3 45.95 (4.3%) 45.68 (4.4%) -0.6% ( -8% - 8%)
MedTerm 22.80 (2.2%) 22.67 (2.1%) -0.6% ( -4% - 3%)
Fuzzy1 26.16 (1.9%) 26.04 (2.0%) -0.4% ( -4% - 3%)
IntNRQ 17.94 (6.1%) 17.86 (6.2%) -0.4% ( -11% - 12%)
AndHighHigh 12.33 (1.2%) 12.29 (1.3%) -0.4% ( -2% - 2%)
Fuzzy2 32.00 (2.8%) 31.89 (3.0%) -0.3% ( -5% - 5%)
MedPhrase 49.48 (3.9%) 49.32 (4.4%) -0.3% ( -8% - 8%)
HighTerm 8.02 (2.1%) 8.00 (2.0%) -0.2% ( -4% - 3%)
PKLookup 211.76 (1.4%) 211.32 (1.8%) -0.2% ( -3% - 3%)
Wildcard 62.37 (2.3%) 62.28 (2.3%) -0.1% ( -4% - 4%)
MedSloppyPhrase 17.49 (2.5%) 17.52 (2.7%) 0.2% ( -4% - 5%)
Respell 55.68 (5.0%) 55.85 (3.3%) 0.3% ( -7% - 9%)
LowSloppyPhrase 16.29 (4.7%) 16.43 (5.2%) 0.9% ( -8% - 11%)
LowPhrase 15.68 (5.3%) 15.81 (5.4%) 0.9% ( -9% - 12%)
HighPhrase 14.22 (8.7%) 14.45 (8.9%) 1.6% ( -14% - 21%)
HighSloppyPhrase 0.83 (9.3%) 0.85 (11.9%) 2.1% ( -17% - 25%)
Michael McCandless (@mikemccand) (migrated from JIRA)
Also, total _*_dv.* file size is 445 MB for ALL_PARENTS and 351 MB for NO_PARENTS.
Michael McCandless (@mikemccand) (migrated from JIRA)
base = ALL_PARENTS, comp = NO_PARENTS:
Task QPS base StdDev QPS comp StdDev Pct diff
MedSpanNear 125.77 (2.0%) 79.31 (0.8%) -36.9% ( -38% - -34%)
LowSpanNear 124.86 (2.7%) 79.23 (0.5%) -36.5% ( -38% - -34%)
HighSpanNear 124.23 (2.3%) 79.44 (0.8%) -36.1% ( -38% - -33%)
AndHighLow 107.24 (1.4%) 72.70 (0.7%) -32.2% ( -33% - -30%)
MedPhrase 55.98 (0.6%) 44.89 (1.4%) -19.8% ( -21% - -17%)
AndHighMed 52.06 (0.7%) 43.20 (0.0%) -17.0% ( -17% - -16%)
Fuzzy2 35.71 (0.6%) 30.42 (1.6%) -14.8% ( -16% - -12%)
LowPhrase 17.27 (0.3%) 15.21 (3.2%) -11.9% ( -15% - -8%)
HighPhrase 15.20 (6.2%) 13.50 (4.7%) -11.2% ( -20% - 0%)
LowTerm 41.68 (0.4%) 37.49 (0.4%) -10.1% ( -10% - -9%)
LowSloppyPhrase 17.31 (2.9%) 15.75 (0.9%) -9.0% ( -12% - -5%)
Fuzzy1 28.11 (0.3%) 25.63 (0.0%) -8.8% ( -9% - -8%)
MedSloppyPhrase 18.42 (1.5%) 17.25 (0.1%) -6.3% ( -7% - -4%)
Respell 56.32 (0.3%) 54.41 (2.2%) -3.4% ( -5% - 0%)
HighSloppyPhrase 0.83 (6.8%) 0.81 (1.0%) -2.3% ( -9% - 5%)
Wildcard 63.43 (1.9%) 61.96 (0.3%) -2.3% ( -4% - 0%)
Prefix3 45.60 (0.5%) 45.70 (0.7%) 0.2% ( -1% - 1%)
IntNRQ 17.54 (0.6%) 17.60 (1.4%) 0.3% ( -1% - 2%)
PKLookup 205.89 (0.5%) 210.73 (0.7%) 2.4% ( 1% - 3%)
AndHighHigh 11.89 (0.2%) 12.48 (0.3%) 5.0% ( 4% - 5%)
HighTerm 7.00 (0.2%) 8.09 (0.1%) 15.6% ( 15% - 16%)
OrHighHigh 3.77 (0.6%) 4.36 (0.3%) 15.6% ( 14% - 16%)
OrHighLow 6.65 (0.1%) 7.69 (1.5%) 15.6% ( 14% - 17%)
OrHighMed 6.61 (0.4%) 7.66 (0.2%) 15.8% ( 15% - 16%)
MedTerm 18.86 (0.4%) 22.13 (0.4%) 17.3% ( 16% - 18%)
I think because this test has 2.5M ords ... the cost of "rolling up" in the end is non-trivial ...
Shai Erera (@shaie) (migrated from JIRA)
Thanks for running this. I think that given these results, making NO_PARENTS the default policy is not that good. I think it's not a good default anyway, because it forces the user to stop and think whether the documents he'll index share parents or not. This looks like an advanced setting to me, i.e. if you want to get "expert" and really know your content, then you can choose to index like so. Plus, given those statistics, I'd say that you have to test before you go to production with it (i.e. it looks like it may be expensive as the number of ordinals grows...).
Mike found a bug in how I count up the parents in the NO_PARENTS case, so I fixed it (and added a test). I'll run tests a couple of times and commit this.
Michael McCandless (@mikemccand) (migrated from JIRA)
The performance depends heavily on how many ords your taxo index has ... my last test was ~2.5M ords, but when I built an index leaving out the two dimensions (categories, username) with the most ords, leaving 4703 unique ords, the numbers are much better:
Task QPS base StdDev QPS comp StdDev Pct diff
Prefix3 161.48 (6.1%) 161.99 (7.4%) 0.3% ( -12% - 14%)
PKLookup 235.50 (2.4%) 236.41 (2.1%) 0.4% ( -4% - 5%)
Respell 85.41 (4.4%) 85.92 (4.2%) 0.6% ( -7% - 9%)
AndHighLow 1196.56 (2.1%) 1204.67 (3.4%) 0.7% ( -4% - 6%)
IntNRQ 104.88 (6.7%) 105.77 (9.0%) 0.9% ( -13% - 17%)
Wildcard 215.17 (2.2%) 217.13 (2.6%) 0.9% ( -3% - 5%)
HighSloppyPhrase 3.24 (8.2%) 3.27 (9.2%) 1.0% ( -15% - 19%)
LowSpanNear 42.80 (3.0%) 43.68 (2.8%) 2.1% ( -3% - 8%)
Fuzzy2 84.83 (3.6%) 86.70 (2.8%) 2.2% ( -4% - 8%)
HighSpanNear 11.42 (1.9%) 11.70 (2.3%) 2.4% ( -1% - 6%)
LowPhrase 71.69 (6.8%) 73.91 (6.2%) 3.1% ( -9% - 17%)
Fuzzy1 75.53 (3.4%) 78.81 (2.7%) 4.3% ( -1% - 10%)
HighPhrase 42.58 (11.4%) 44.61 (11.5%) 4.8% ( -16% - 31%)
LowSloppyPhrase 80.22 (2.3%) 84.49 (3.1%) 5.3% ( 0% - 10%)
MedSpanNear 85.37 (1.9%) 91.16 (1.8%) 6.8% ( 3% - 10%)
MedSloppyPhrase 86.55 (2.7%) 92.84 (3.2%) 7.3% ( 1% - 13%)
MedPhrase 145.23 (5.6%) 156.11 (6.1%) 7.5% ( -3% - 20%)
AndHighMed 321.74 (1.2%) 346.20 (1.5%) 7.6% ( 4% - 10%)
AndHighHigh 84.28 (1.6%) 96.80 (1.7%) 14.9% ( 11% - 18%)
OrHighHigh 35.03 (2.9%) 42.53 (4.6%) 21.4% ( 13% - 29%)
OrHighMed 51.75 (3.0%) 63.90 (4.6%) 23.5% ( 15% - 32%)
OrHighLow 50.41 (3.0%) 62.51 (4.7%) 24.0% ( 15% - 32%)
HighTerm 58.55 (3.0%) 74.59 (4.2%) 27.4% ( 19% - 35%)
LowTerm 355.14 (1.6%) 480.44 (2.3%) 35.3% ( 30% - 39%)
MedTerm 206.44 (2.0%) 286.54 (3.1%) 38.8% ( 33% - 44%)
I also separately fixed a silly bug in luceneutil which was causing the Span queries to get 0 hits.
Commit Tag Bot (migrated from JIRA)
[trunk commit] Shai Erera http://svn.apache.org/viewvc?view=revision&revision=1436435
LUCENE-4600: add CountingFacetsCollector
Commit Tag Bot (migrated from JIRA)
[branch_4x commit] Shai Erera http://svn.apache.org/viewvc?view=revision&revision=1436446
LUCENE-4600: add CountingFacetsCollector
Shai Erera (@shaie) (migrated from JIRA)
Committed to trunk and 4x. Let's see if it makes nightly happy! :)
Uwe Schindler (@uschindler) (migrated from JIRA)
Closed after release.
Today the facet module simply gathers all hits (as a bitset, optionally with a float[] to hold scores as well, if you will aggregate them) during collection, and then at the end when you call getFacetsResults(), it makes a 2nd pass over all those hits doing the actual aggregation.
We should investigate just aggregating as we collect instead, so we don't have to tie up transient RAM (fairly small for the bit set but possibly big for the float[]).
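Ie, roughly this shape instead (a minimal sketch only; decodeOrdinals is a placeholder for reading a doc's category ordinals in the current segment):

import java.io.IOException;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

// Sketch: increment the ordinal counts directly in collect(), instead of buffering
// a bitset (+ optional float[]) and making a second pass at the end.
class CountAsYouGoSketch extends Collector {
  private final int[] counts;
  private AtomicReaderContext currentContext;

  CountAsYouGoSketch(int numOrds) {
    counts = new int[numOrds];
  }

  @Override
  public void setScorer(Scorer scorer) {}           // counting ignores scores

  @Override
  public boolean acceptsDocsOutOfOrder() { return true; }

  @Override
  public void setNextReader(AtomicReaderContext context) {
    currentContext = context;
  }

  @Override
  public void collect(int doc) throws IOException {
    for (int ord : decodeOrdinals(currentContext, doc)) {  // aggregate immediately
      counts[ord]++;
    }
  }

  // placeholder: decode this doc's category ordinals in the current segment
  private int[] decodeOrdinals(AtomicReaderContext ctx, int doc) throws IOException {
    return new int[0];
  }
}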
Migrated from LUCENE-4600 by Michael McCandless (@mikemccand), resolved Jan 21 2013
Attachments: LUCENE-4600.patch (versions: 7), LUCENE-4600-cli.patch
Linked issues: #5684