BooleanScorer should sometimes be used for MUST clauses [LUCENE-4396] - Githubissues

apache / lucene

Apache Lucene open-source search software

https://lucene.apache.org/

Apache License 2.0

2.64k stars 1.03k forks source link

BooleanScorer should sometimes be used for MUST clauses [LUCENE-4396] #5462

Open asfimport opened 12 years ago

asfimport commented 12 years ago

Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.

If there is one or more MUST clauses we always use BooleanScorer2.

But I suspect that unless the MUST clauses have very low hit count compared to the other clauses, that BooleanScorer would perform better than BooleanScorer2. BooleanScorer still has some vestiges from when it used to handle MUST so it shouldn't be hard to bring back this capability ... I think the challenging part might be the heuristics on when to use which (likely we would have to use firstDocID as proxy for total hit count).

Likely we should also have BooleanScorer sometimes use .advance() on the subs in this case, eg if suddenly the MUST clause skips 1000000 docs then you want to .advance() all the SHOULD clauses.

I won't have near term time to work on this so feel free to take it if you are inspired!

Migrated from LUCENE-4396 by Michael McCandless (@mikemccand), updated Aug 18 2014 Attachments: all.perf, And.tasks (versions: 3), AndOr.tasks (versions: 2), LUCENE-4396.patch (versions: 16), LUCENE-4396-simple.patch (versions: 4), luceneutil-score-equal.patch (versions: 2), merge.perf, merge.png, merge-simple.perf, merge-simple.png, perf.png, SIZE.perf, stat.cpp (versions: 2), tasks.cpp (versions: 2)

asfimport commented 10 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

There's a GSoC proposal for this: http://www.google-melange.com/gsoc/proposal/review/org/google/gsoc2014/dhuang/5629499534213120

I'm happy to be a mentor for this project.

asfimport commented 10 years ago

Da Huang (migrated from JIRA)

I have just set the GSoC proposal's visibility as Public, and the public URL is this: http://www.google-melange.com/gsoc/proposal/public/google/gsoc2014/dhuang/5629499534213120

asfimport commented 10 years ago

Da Huang (migrated from JIRA)

I'm revising and polishing my proposal these days, and I have discovered an interesting thing. That is: if BooleanScorer supports required scorers in the way I have proposed, docIDs would be in acsending order in the bucket table. I think this can make BooleanScorer be a Not-Top Scorer, as .advance() .docID() .nextDoc() etc. can be implemented. However, I'm not sure how it would affect the performance when it acts as a Not-Top Scorer. This is because when .nextDoc() or .advance() is called, BooleanScorer may calculate a 2K window whose data may not be all useful.

I hope I have made my idea clear.

asfimport commented 10 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Da, that's a great discovery.

So, in the case where at least one MUST clause is present, BS will in fact collect in-order, and then BS could be embedded in other queries that want a sub-scorer.

This may force us to more strongly separate the notion of "force doc-at-a-time scoring" (#3758), since today the "sneaky" way to do this is return false from your Collector.acceptsDocsOutOfOrder.

I think you should be careful in your proposal to keep this issue well-scoped. I.e., the overall goal is to let BS handle MUST clauses in certain causes (heuristic needs to decide this), and then a nice-to-have is to enable BS too also be a sub-scorer in some cases.

asfimport commented 10 years ago

Da Huang (migrated from JIRA)

Hi, Mike.

I have just finished revising my proposal. I'm not so sure about the decription on this page "unless the MUST clauses have very low hit count compared to the other clauses, that BooleanScorer would perform better than BooleanScorer2.".

In my opinion, even when MUST clauses have very low hit count compared to the other clauses, BooleanScorer is likely to perform better than BooleanScorer2, because the calling on .advance() when dealing with SHOULD clauses can skip documents as many as BooleanScorer2 does.

Relevant ideas is described in the session "Improve the Rule for Choosing Scorer". As it's not very consistent with the description on this page, I'm not sure whether my idea makes sense.

asfimport commented 10 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Thanks Da, the new iteration on the proposal looks awesome; I left a comment there!

You're right: if we do use .advance on the sub-scorers, then even in the case where a conjunction clause has very low cost, BS should still be competitive.

asfimport commented 10 years ago

Da Huang (migrated from JIRA)

Sorry for my late reply. I have been thinking about the new code/design on the trunk these days.

The new code breaks out BulkScorer from Scorer, and it is necessary to create a new BooleanScorer (a Scorer), just as you said. I'm afraid that we do have to take Scorer instead as subScorer in the new BooleanScorer. And yes: BooleanBulkScorer should not be embeded as its docIDs are out of order. My idea is to keep BooleanBulkScorer just supporting no-MUST-clause case, and let the new BooleanScorer to deal with the case where there is at least one MUST clause. I think this is one of the best ways to be compatible with the current design.

Besides, I'm afraid that the name of BulkScorer may be confusing. The new BooleanScorer is also implemented by scoring a range of documents at once, but it actually can act as Sub-Scorer.

asfimport commented 10 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Using BooleanScorer (a Scorer) when there is one or more MUST makes sense I think, but we need to test perf. It could be letting BooleanBulkScorer also handle MUST gives a good performance gain, in which case we could let both handle it ...

Besides, I'm afraid that the name of BulkScorer may be confusing.

That's a good point ... for a while on that issue we had the name TopScorer ... maybe we need to revisit that :)

asfimport commented 10 years ago

Da Huang (migrated from JIRA)

I'm afraid that if BooleanBulkScorer also handle MUST, it couldn't make use of .advance(), as its subScorers are BulkScorer which could not call .advance().

asfimport commented 10 years ago

Da Huang (migrated from JIRA)

A new iteration on the proposal has just been submitted. The new iteration has added a part "Supplementary Notes" to describe how to fit my design to the new design on the current lucene trunk, such as renaming BooleanScorer to BooleanBulkScorer, creating a new BooleanScorer extended from Scorer.

asfimport commented 10 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I'm afraid that if BooleanBulkScorer also handle MUST, it couldn't make use of .advance(), as its subScorers are BulkScorer which could not call .advance().

That's true ... but then in some cases I think using advance will hurt, e.g. MUST of a very common term with SHOULD of rare term(s). And if the bulk scorer really is the top scorer, advance isn't needed. Tricky :)

The supplemental section in the proposal looks good! I put a small comment on the proposal...

asfimport commented 10 years ago

Da Huang (migrated from JIRA)

Oh, that's very true. According to the current design, using advance will hurt in such case.

However, I think such case may be able to be solved by building a skipping list on MUST-all-hit bulks and skipping MUST-all-hit bulks when scanning SHOULD, but I haven't made this idea very clear in my mind. So making BooleanBulkScorer is still necessary now.

asfimport commented 10 years ago

Da Huang (migrated from JIRA)

BooleanScorer can support MUST clause (ie. requiredScorers) now. The patch is based on commit 9e87821edeb3e24ca8dedaecf856f6510d61d0d3 on github.

asfimport commented 10 years ago

Da Huang (migrated from JIRA)

Currently, I pass "List<Scorer> requiredScorers" to BooleanScorer, and merge them as ConjunctionScorer.

For consistency, I should probably change the argument from "List<Scorer> requiredScorers" to "List<BulkScorer> requiredScorers", but, as a result, getScorer method should be added to BulkScorer. Besides, I removed the static statement on BooleanScorerCollector and BucketTable, because I have to refer the member "requiredNrMatchers" of BooleanScorer. But, I'm so sure whether removing the static statement is a proper option.

asfimport commented 10 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

This looks great! Tests pass for me.

Changing from static so you can access requiredNrMatchers seems fine; we could also pass that into the class & save it. Either way ...

The comment for REQUIRED_MASK needs fixing.

Maybe add a comment where we return null if requiredSubScorer is null explaining why?

Hmm that pre-existing comment "// Check if we can and should return a BooleanScorer" is wrong ... I'll fix.

Small styling issue: we don't put whitespace after ( and before ), e.g.:

    if ( this.requiredNrMatchers > 0 ) {

should be this instead:

    if (this.requiredNrMatchers > 0) {

Maybe change:

     if (current.coord >= minNrShouldMatch + requiredNrMatchers) {

to:

     if (current.coord - requiredNrMatchers >= minNrShouldMatch) {

And add a comment saying "minNrShouldMatch only applies to SHOULD clauses", or something? Just to make the math more obvious :)

For consistency, I should probably change the argument from "List<Scorer> requiredScorers" to "List<BulkScorer> requiredScorers", but, as a result, getScorer method should be added to BulkScorer.

Hmm in general a BulkScorer need not implement a Scorer under the hood (DefaultBulkScorer does, because it wraps, but e.g. BooleanScorer doesn't).

Or alternatively if you pass List<BulkScorer> you could handle all the conjunctions yourself in BooleanScorer, e.g. make different collectors for them that add up a "mustClauseCountMatches" counter (instead of setting the 1 bit mask), and then check if that counts is >= requiredScorers.size() before counting it as a hit...

asfimport commented 10 years ago

Da Huang (migrated from JIRA)

These suggestions are very helpful. Thank you.:)

Adding up a "mustClauseCountMatches" counter would be low-efficient, as it can not make use of list skippings. How about implementing getScorer returning null for BulkScorer, while returning the scorer for DefaultBulkScorer. I'm not very sure whether passing List<BulkScorer> instead of List<Scorer> is really necessary. So I think this issue should probably be just set aside for now.

asfimport commented 10 years ago

Da Huang (migrated from JIRA)

I create a new class named BooleanNovelScorer in this iteration. This scorer is based on the techinque of BooleanScorer, but can make use of the skipping list while collecting documents. Moreover, it is a subclass of Scorer which can act as a non-top scorer. However, the performance is low now, because I have not implemented its .advance() in a efficent way.

asfimport commented 10 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Thanks Da, this looks neat!

Hmm, the patch didn't cleanly apply, but I was able to work through it. I think your dev area is not up to date with trunk?

Small code style things: can you try to add { .. } around the true/else body of if statements, even if they are only one line? And also no whitespace around the condition. E.g. instead of:

      if ( required.size() > 0 )
        return new BooleanNovelScorer(this, disableCoord, minNrShouldMatch, required, optional, prohibited, maxCoord);

do this:

      if (required.size() > 0) {
        return new BooleanNovelScorer(this, disableCoord, minNrShouldMatch, required, optional, prohibited, maxCoord);
      }

So it looks like BooleanNovelScorer is able to be a Scorer because the linked-list of visited buckets in one window are guaranteed to be in docID order, because we first visit the requiredConjunctionScorer's docs in that window.

Have you tested performance when the .advance method here isn't called? Ie, just boolean queries w/ one MUST and one or more SHOULD? I think the important question here is whether/in what cases the BooleanNovelScorer approach beats BooleanScorer2 performance?

I realized #5937 is related here, i.e. we should also sometimes use BooleanScorer for the minShouldMatch>1 case.

asfimport commented 10 years ago

Da Huang (migrated from JIRA)

Thanks for your suggestions, Mike. And sorry for my late reply.

Hmm, the patch didn't cleanly apply, but I was able to work through it. I think your dev area is not up to date with trunk?

I haven't merged my branch to the newest trunk version, because my network account at school for April has been run out and I couldn't pull the code from github untill 1 May. Sorry for that.

Small code style things

I'm very sorry for the code style. That's my fault. Very sorry for that.

So it looks like BooleanNovelScorer is able to be a Scorer because the linked-list of visited buckets in one window are guaranteed to be in docID order, because we first visit the requiredConjunctionScorer's docs in that window.

Yes, you're right.

Have you tested performance when the .advance method here isn't called? Ie, just boolean queries w/ one MUST and one or more SHOULD?

No, I haven't. Do you mean the .advance method of subScorers in BooleanNovelScorer? If so, I will do that. If you mean the .advance method of BooleanNovelScorer itself, I think it would be confusing, because BooleanNovelScorer now is used when there's at least one MUST clause, no matter whether it acts as a top scorer or not. Therefore, .advance() of BooleanNovelScorer must be called when BooleanNovelScorer acts as a non-top scorer.

I think the important question here is whether/in what cases the BooleanNovelScorer approach beats BooleanScorer2 performance?

Yes, you're right. But BooleanNovelScorer has not been totally finished, and the performance itself remans to be improved especially its .advance method.

I realized #5937 is related here, i.e. we should also sometimes use BooleanScorer for the minShouldMatch>1 case.

Yes, I also notice that. :) I think this issue should be dealed with together.

asfimport commented 10 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I haven't merged my branch to the newest trunk version, because my network account at school for April has been run out and I couldn't pull the code from github untill 1 May. Sorry for that.

Good grief, that's awful you have to "budget" your allowed network access month by month ... don't worry about it, it's easy to apply the patch. Alternatively, just add a pointer here to your github fork and we can clone from there to review?

I'm very sorry for the code style. That's my fault. Very sorry for that.

No need to apologize; it's just code style ... everyone has their own, but we have a standard one in Lucene so we don't spend all our time fighting over whose style is best :)

If you mean the .advance method of BooleanNovelScorer itself, I think it would be confusing, because BooleanNovelScorer now is used when there's at least one MUST clause, no matter whether it acts as a top scorer or not. Therefore, .advance() of BooleanNovelScorer must be called when BooleanNovelScorer acts as a non-top scorer.

Yeah I meant BNS.advance, when it's a top scorer. Ie, can BNS beat BS2 in this case. It seems like you could test this case now since as a topScorer nobody would call the unfinished BNS.advance method. This way, if BNS can beat BS2 in this case we know it's worth pursuing.

I think this issue should be dealed with together.

+1, but only if time this summer permits (i.e. top priority is this issue, allowing BS to accept MUST clauses).

asfimport commented 10 years ago

Da Huang (migrated from JIRA)

The patch is based on the github mirror commit c1e423e45e6fa9f846ab2c382c0100fd515b4cb1.

The following things are done in this patch:

Fix the bug on last patch. The bug is due to not setting prev and next to null before add an element to a linked list.
Refine the code style.
Make a small improvement on .advance(). The performance is a little better than the last patch, but still worse than the trunk, when testing on luceneutil.

P.S. The bug on last patch can not be detected by ant-test, but can be found by running query like "+a b" on luceneutil. I'm getting to add a junit test case which can detect the bug, but it may take me some days.

asfimport commented 10 years ago

Da Huang (migrated from JIRA)

luceneutil tasks file to test queries like "+a b c d e ..."

The performance shows as follows. || TaskQPS || baseline || StdDevQPS || my_modified_version || StdDev || Pct diff || | HighAndManyLowOr | 8.50 | (3.3%) | 1.72 | (0.3%) | -79.8% ( -80% - -78%) | | PKLookup | 239.75 | (0.9%) | 239.99 | (0.9%) | 0.1% ( -1% - 1%) | | LowAndManyHighOr | 7.11 | (1.4%) | 7.76 | (1.4%) | 9.1% ( 6% - 12%) | | LowAndManyLowOr | 33.83 | (0.7%) | 41.03 | (2.7%) | 21.3% ( 17% - 24%) | | HighAndManyHighOr | 0.12 | (0.7%) | 0.29 | (7.8%) | 148.0% ( 138% - 157%) |

asfimport commented 10 years ago

Da Huang (migrated from JIRA)

A patch for luceneutil, which allows scores is different within a tolerance range.

asfimport commented 10 years ago

Da Huang (migrated from JIRA)

Add TestBooleanNovelScorer.java to detect the bug on the second patch.

asfimport commented 10 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I like this tasks file!

But, maybe we could test on fewer terms, for the Low/HighAndManyLow/High tasks? I think it's more common to have a handful (3-5 maybe) of terms. But maybe keep your current category and rename it to Tons instead of Many?

Thank you for adding the test case; it's always disturbing when luceneutil finds a bug that "ant test" doesn't! Maybe we can improve the test so that it exercises BS and NBS? E.g., toggle the "require docs in order" via a custom collector? We could commit this test today to trunk/4x right?

A patch for luceneutil, which allows scores is different within a tolerance range.

Hmm do we know why the scores changed? Are we comparing BS2 to NovelBS? (I think BS and BS2 already have different scores today?).

So, with these changes, BS (a BulkScorer) can handle required clauses (but you commented this out in your patch in order to test NBS I guess?), and NBS (a Scorer) can handle required too.

Do you have any perf results of BS w/ required clauses (as a BulkScorer) vs BS2 (what trunk does today)?

asfimport commented 10 years ago

Da Huang (migrated from JIRA)

Thanks for your suggestions!

maybe we could test on fewer terms, for the Low/HighAndManyLow/High tasks? I think it's more common to have a handful (3-5 maybe) of terms.

When terms are few, BooleanNovelScorer performs slower than BS (about -10%). However, I have to generate tasks with fewer terms and rerun the tasks to reconfirm the specific perf. difference.

But maybe keep your current category and rename it to Tons instead of Many?

OK, I will do so.

Maybe we can improve the test so that it exercises BS and NBS? E.g., toggle the "require docs in order" via a custom collector?

Yes, I think that's a good idea.

Hmm do we know why the scores changed?

Yes, it's because the calculating orders are different. BS adds up scores of all SHOULD clauses, and then add their sum to the final score. BNS adds score of each SHOULD clause to final score one by one.

Are we comparing BS2 to NovelBS?

Yes.

I think BS and BS2 already have different scores today?

Yes. Actually, the score calculating order of BS is the same as BNS.

but you commented this out in your patch in order to test NBS I guess?

yes, I did that in order to test BNS. Otherwise, luceneutil would throw exception.

Do you have any perf results of BS w/ required clauses (as a BulkScorer) vs BS2 (what trunk does today)?

Hmm, I haven't carried out such experiment yet. Checking the perf. results of BS vs BS2 is a good idea. I will do that. :)

asfimport commented 10 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

When terms are few, BooleanNovelScorer performs slower than BS (about -10%). However, I have to generate tasks with fewer terms and rerun the tasks to reconfirm the specific perf. difference.

OK, interesting. It may be BooleanNovelScorer doesn't work out, and we "just" focus on BooleanScorer handling MUST.

but you commented this out in your patch in order to test NBS I guess?

yes, I did that in order to test BNS. Otherwise, luceneutil would throw exception.

OK, maybe add a comment just saying something "temporarily commented out so NovelBS is invoked instead of BS"? And what exception did luceneutil throw...?

Hmm, I haven't carried out such experiment yet. Checking the perf. results of BS vs BS2

is a good idea. I will do that.

Thank you!

asfimport commented 10 years ago

Da Huang (migrated from JIRA)

Thanks for your reply.

OK, maybe add a comment just saying something "temporarily commented out so NovelBS is invoked instead of BS"?

I will comment that.

And what exception did luceneutil throw...?

It just says that "hit %s has wrong field/score value %s vs %s", and the perf. test abort. And the score value diff. is about 0.000001 .

asfimport commented 10 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

It just says that "hit %s has wrong field/score value %s vs %s", and the perf. test abort. And the score value diff. is about 0.000001 .

Ahh, OK "just" the score diff between BS and BS2.

I'm nervous about the luceneutil change, just because I don't want to "encourage" complacency on scores being different in general.

asfimport commented 10 years ago

Da Huang (migrated from JIRA)

I'm nervous about the luceneutil change, just because I don't want to "encourage" complacency on scores being different in general.

I agree. but it seems that the original perf. tasks file has too few items on each case to discover scores' difference, when the scorer's calculating orders are different. Actually, if I decrease the items in my tasks file on each case to 3, the scores are the same with the trunk.

asfimport commented 10 years ago

Robert Muir (@rmuir) (migrated from JIRA)

About the scoring differences: I think it might be enough to fix the TODO in ConjunctionScorer.score() ?

asfimport commented 10 years ago

Da Huang (migrated from JIRA)

Thanks for your advice, Robert. Do you mean just changing

float sum = 0.0f;

to

double sum = 0.0f;

?

However, I'm not sure doing this will really be enough for scoring differences, as the differences are due to different calculating order.

asfimport commented 10 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I think its enough in practice: its not expensive and it reduces the problem significantly. This is already the "hack" in use so that BooleanScorer and BooleanScorer2 have comparable scores: #5394

asfimport commented 10 years ago

Da Huang (migrated from JIRA)

Oh, thanks. I think it‘s worth a try.

asfimport commented 10 years ago

Da Huang (migrated from JIRA)

The patch is based on lucene github mirror commit cfb408ff6788e6fea8215098a785d72fb4e95c5b.

The following things have been done:

Rename TestBooleanNovelScorer to TestBooleanUnevenly, and this test suit test both BNS and BS when hit documents' distribution is unevenly.
Following Robert's advice, I sum scores into a double and cast to float in ConjunctionScorer. However, it seems to take little effect. Scores difference problem still remain.
Add a comment to scores difference within tolerance on luceneutil.
Make a new tasks file, which can test "AndSomeOR" cases.
Run luceneutil for "BNS vs BS2" and "BS vs BS2". The result is showed as follows.

P.S. BS has the same problem with score difference as BNS. Althrough there's no BS2 now as the architecture has changed, here I still call it BS2 for convenience.

BNS vs BS2

                    TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff
        HighAndTonsLowOr       10.95      (3.5%)        1.52      (0.3%)  -86.1% ( -86% -  -85%)
        HighAndSomeLowOr       29.98      (6.7%)       11.84      (2.9%)  -60.5% ( -65% -  -54%)
         LowAndSomeLowOr      756.81      (1.4%)      503.21      (2.8%)  -33.5% ( -37% -  -29%)
        LowAndSomeHighOr       54.25      (2.1%)       53.26      (2.1%)   -1.8% (  -5% -    2%)
                PKLookup      241.74      (2.8%)      241.96      (2.3%)    0.1% (  -4% -    5%)
         LowAndTonsLowOr       40.23      (1.2%)       43.19      (7.2%)    7.4% (   0% -   15%)
        LowAndTonsHighOr        2.63      (2.1%)        2.99      (2.3%)   13.8% (   9% -   18%)
       HighAndSomeHighOr        4.99      (1.8%)        5.86      (4.7%)   17.4% (  10% -   24%)
       HighAndTonsHighOr        0.09      (1.5%)        0.22      (8.1%)  145.4% ( 133% -  157%)

BS vs BS2

                    TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff
        HighAndTonsLowOr       16.54      (2.4%)        3.70      (0.2%)  -77.6% ( -78% -  -76%)
        HighAndSomeLowOr       11.95      (8.5%)        4.29      (0.8%)  -64.1% ( -67% -  -59%)
         LowAndSomeLowOr      839.11      (1.9%)      540.83      (2.5%)  -35.5% ( -39% -  -31%)
        LowAndSomeHighOr      149.50      (2.6%)      136.71      (3.4%)   -8.6% ( -14% -   -2%)
       HighAndSomeHighOr        3.72      (1.7%)        3.51      (1.7%)   -5.6% (  -8% -   -2%)
                PKLookup      240.32      (2.8%)      238.87      (2.8%)   -0.6% (  -6% -    5%)
        LowAndTonsHighOr        4.96      (2.3%)        5.35      (3.8%)    7.8% (   1% -   14%)
         LowAndTonsLowOr       35.28      (1.2%)       39.00      (5.2%)   10.6% (   4% -   17%)
       HighAndTonsHighOr        0.16      (1.1%)        0.36      (4.0%)  122.6% ( 116% -  129%)

asfimport commented 10 years ago

Da Huang (migrated from JIRA)

A patch based on lucene github mirror commit cf10341825ff6bd1662dd48c51926bc51d751ce5.

I use a bitset to skip required docs when scaning optional and prohibited docs. The perf. comparison is at the bottom.

Besides, I build a new tasks file the test the perf. and I discover that BNS optimize the "+a -b -c -d ..." case a lot, when "b c d ..." hits many docs.

BNS (without bitset) vs. BS2
                    TaskQPS baseline      StdDevQPS my_version      StdDev                Pct diff
       HighAndTonsLowNot        4.29      (2.9%)        1.08      (0.6%)  -74.8% ( -76% -  -73%)
        HighAndTonsLowOr        4.87      (6.4%)        1.24      (1.0%)  -74.4% ( -76% -  -71%)
       HighAndSomeLowNot        9.03      (5.2%)        4.11      (4.1%)  -54.4% ( -60% -  -47%)
        HighAndSomeLowOr       16.21      (9.6%)        7.75      (4.1%)  -52.2% ( -60% -  -42%)
         LowAndSomeLowOr      303.28      (2.4%)      183.14      (6.6%)  -39.6% ( -47% -  -31%)
        LowAndSomeLowNot      257.24      (1.8%)      157.07      (6.5%)  -38.9% ( -46% -  -31%)
        LowAndSomeHighOr       36.78      (1.9%)       33.74      (3.0%)   -8.3% ( -12% -   -3%)
        LowAndTonsLowNot       21.28      (2.0%)       19.69      (6.9%)   -7.5% ( -16% -    1%)
       LowAndSomeHighNot       34.40      (1.6%)       33.69      (3.2%)   -2.1% (  -6% -    2%)
                PKLookup      100.63      (4.8%)      103.46      (4.7%)    2.8% (  -6% -   12%)
        LowAndTonsHighOr        1.26      (1.6%)        1.41      (1.7%)   11.8% (   8% -   15%)
         LowAndTonsLowOr       13.66      (0.9%)       15.50      (6.0%)   13.5% (   6% -   20%)
      HighAndSomeHighNot        2.65      (1.4%)        3.12      (6.5%)   17.6% (   9% -   25%)
       HighAndSomeHighOr        2.21      (2.4%)        2.62      (5.8%)   18.6% (  10% -   27%)
       HighAndTonsHighOr        0.07      (0.8%)        0.19     (10.5%)  160.3% ( 147% -  172%)
       LowAndTonsHighNot        2.86      (1.6%)       10.24     (18.1%)  257.7% ( 234% -  281%)
      HighAndTonsHighNot        0.05      (0.8%)        0.40     (28.2%)  641.8% ( 607% -  676%)

BS vs. BS2
                    TaskQPS baseline      StdDevQPS my_version      StdDev                Pct diff
        HighAndTonsLowOr        4.02      (6.8%)        0.87      (0.5%)  -78.2% ( -80% -  -76%)
       HighAndTonsLowNot        4.95      (3.4%)        1.29      (0.9%)  -73.9% ( -75% -  -72%)
        HighAndSomeLowOr       14.45      (9.5%)        6.68      (3.7%)  -53.8% ( -61% -  -44%)
       HighAndSomeLowNot       14.78      (5.1%)        7.48      (3.9%)  -49.4% ( -55% -  -42%)
         LowAndSomeLowOr      316.55      (2.2%)      170.14      (5.6%)  -46.3% ( -52% -  -39%)
        LowAndSomeLowNot      283.47      (1.7%)      157.35      (6.0%)  -44.5% ( -51% -  -37%)
        LowAndSomeHighOr       39.39      (2.0%)       35.07      (3.1%)  -11.0% ( -15% -   -6%)
       LowAndSomeHighNot       53.96      (2.0%)       48.57      (3.8%)  -10.0% ( -15% -   -4%)
        LowAndTonsLowNot       17.97      (1.5%)       17.04      (6.0%)   -5.2% ( -12% -    2%)
                PKLookup       97.57      (2.7%)      100.21      (5.2%)    2.7% (  -5% -   10%)
        LowAndTonsHighOr        3.59      (1.7%)        3.74      (2.4%)    4.1% (   0% -    8%)
         LowAndTonsLowOr       14.71      (1.3%)       15.63      (5.7%)    6.3% (   0% -   13%)
      HighAndSomeHighNot        1.84      (1.3%)        2.05      (5.6%)   11.2% (   4% -   18%)
       HighAndSomeHighOr        1.93      (2.1%)        2.16      (5.6%)   11.9% (   4% -   20%)
       HighAndTonsHighOr        0.05      (1.0%)        0.13     (14.1%)  144.8% ( 128% -  161%)
       LowAndTonsHighNot        1.63      (1.9%)        4.95      (7.2%)  204.0% ( 191% -  217%)
      HighAndTonsHighNot        0.06      (1.0%)        0.34     (18.2%)  459.6% ( 435% -  483%)

BNS (with bitset) vs. BS2
                    TaskQPS baseline      StdDevQPS my_version      StdDev                Pct diff
        HighAndSomeLowOr        7.45     (12.0%)        3.49      (6.6%)  -53.1% ( -64% -  -39%)
       HighAndSomeLowNot       10.45      (8.0%)        5.25      (6.8%)  -49.7% ( -59% -  -37%)
         LowAndSomeLowOr      310.53      (2.3%)      168.56      (5.8%)  -45.7% ( -52% -  -38%)
        LowAndSomeLowNot      292.05      (2.3%)      165.88      (5.7%)  -43.2% ( -50% -  -36%)
       HighAndTonsLowNot        5.94      (3.5%)        4.33      (6.8%)  -27.0% ( -36% -  -17%)
        HighAndTonsLowOr        5.92      (4.4%)        4.39      (6.0%)  -25.9% ( -34% -  -16%)
       LowAndSomeHighNot       53.79      (2.4%)       47.71      (2.8%)  -11.3% ( -16% -   -6%)
        LowAndSomeHighOr       31.03      (2.6%)       28.20      (2.4%)   -9.1% ( -13% -   -4%)
         LowAndTonsLowOr       18.58      (1.1%)       17.60      (6.2%)   -5.3% ( -12% -    2%)
      HighAndSomeHighNot        1.49      (1.8%)        1.44      (8.9%)   -3.5% ( -13% -    7%)
                PKLookup       96.96      (3.4%)      100.03      (5.1%)    3.2% (  -5% -   12%)
        LowAndTonsHighOr        2.06      (2.2%)        2.18      (2.3%)    5.9% (   1% -   10%)
        LowAndTonsLowNot       13.63      (1.3%)       14.57      (6.3%)    6.9% (   0% -   14%)
       HighAndSomeHighOr        2.03      (2.4%)        2.33      (8.1%)   14.5% (   3% -   25%)
       HighAndTonsHighOr        0.07      (0.8%)        0.17     (13.6%)  158.2% ( 142% -  174%)
       LowAndTonsHighNot        1.40      (2.2%)        6.21     (11.3%)  344.2% ( 323% -  365%)
      HighAndTonsHighNot        0.07      (1.1%)        0.46     (24.2%)  572.1% ( 540% -  604%)

asfimport commented 10 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Thanks Da!

When you say "BNS (without bitset) vs. BS2" that means baseline=BS2 and my_version=BNS (without bitset)? I just want to make sure I have the direction right!

With the added bitset, couldn't you not use a linked list anymore? Ie, just use prev/nextSetBit. I wonder if the bitset (instead of the linked list) could also help BooleanScorer? Maybe test this change separately (e.g. just modify BS we have today on trunk) to see if it helps or hurts... if it does help, it seems like BNS could be used (or BS could be a Scorer not a BulkScorer) even when there are no MUST clauses? Ie, the bitset lets us easily keep the order. Then we can merge BS/BNS into one?

Could you attach all new tasks as a single file in general? Note that when you set up a luceneutil test, you can add a task filter using addTaskPattern, so you run just a subset of the tasks for that one test.

Strange that the scores are still different between BS/BS2 and BNS/BS2 when using double.

If there's only 1 required clause sent to BS/BNS can't we use its scorer instead?

Have you explored having BS interact directly with all the MUST clauses, rather than using ConjunctionScorer?

Because we have wildly divergent results (sometimes one is much faster, other times it's much slower) we will somehow need to add logic to pick the right scorer for each query. But we can defer this until we're "doneish" iterating the changes to each scorer... it can come later on.

asfimport commented 10 years ago

Da Huang (migrated from JIRA)

Thanks for your suggestions, Mike!

When you say "BNS (without bitset) vs. BS2" that means baseline=BS2 and my_version=BNS (without bitset)?

Yes, this is just what I mean.

With the added bitset, couldn't you not use a linked list anymore? Ie, just use prev/nextSetBit. I wonder if the bitset (instead of the linked list) could also help BooleanScorer? Maybe test this change separately (e.g. just modify BS we have today on trunk) to see if it helps or hurts... if it does help, it seems like BNS could be used (or BS could be a Scorer not a BulkScorer) even when there are no MUST clauses? Ie, the bitset lets us easily keep the order. Then we can merge BS/BNS into one?

Oh, that's a good idea! I will try that. However, linked list can be helpful when required docs is extremly sparse.

Could you attach all new tasks as a single file in general? Note that when you set up a luceneutil test, you can add a task filter using addTaskPattern, so you run just a subset of the tasks for that one test.

Do you mean merging And.tasks and AndOr.tasks ? If so, there's no need to do that, because And.tasks contains all tasks in AndOr.tasks, although tasks' names are changed. All the way, thanks for the advice on using addTaskPattern. I haven't noticed that.

Strange that the scores are still different between BS/BS2 and BNS/BS2 when using double.

I don't think it strange. Because the difference is due to the score calculating order. Supposed that a doc hits "+a b c", SCORE_BS = (float)((float)(double)score_a + (float)score_b) + (float)score_c, while SCORE_BS2 = (float)(double)score_a + ((float)score_b + (float)score_c). Here, (float) means that we can only get the score by .score() whose return type is float. The modification on this patch can only make score_a has a temp double value.

If there's only 1 required clause sent to BS/BNS can't we use its scorer instead? Have you explored having BS interact directly with all the MUST clauses, rather than using ConjunctionScorer?

Hmm. I don't think that would be helpful. The reason is just the same as above.

Because we have wildly divergent results (sometimes one is much faster, other times it's much slower) we will somehow need to add logic to pick the right scorer for each query. But we can defer this until we're "doneish" iterating the changes to each scorer... it can come later on.

Yes, I agree.

asfimport commented 10 years ago

Da Huang (migrated from JIRA)

About scores diff. on BS/BS2 (the same as BNS/BS2)

Now, there's scores diff. on BS/BS2, when excuting query like "+a b c d ...".

I have been told that the reason is indicate by the TODO on ReqOptSumScorer.score() which says that

// TODO: sum into a double and cast to float if we ever send required clauses to BS1

However, I don't think so, as the score bias is due to different score calculating orders.

Supposed that a doc hits the query "+a b c d", the score calculated by BS is

BS.score(doc) = ((a.score() + b.score()) + c.score()) + d.score()

while the score calculated by BS2 is

BS2.score(doc) = a.score() + (float)(b.score() + c.score() + d.score())

Notice that, in BS2, we can only get the float value of (b.score() + c.score() + d.score()) by reqScorer.score().

Furthermore, I have noticed that actually we can control the BS's score calulating order, so that

BS.score(doc) = a.score() + ((b.score() + c.score()) + d.score())

However, for BS2, we do not know the calculating order of (b.score() + c.score() + d.score()), as the order is determined by scorer's position in a heap. I still think this matters little.

I will rearrange the calculating order of BS.score() at next patch, to see whether it works.

asfimport commented 10 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Thanks for the detailed explanation on the score differences Da. I think you are right that order-of-operations (with float casts in between) explains the score differences.

Do you mean merging And.tasks and AndOr.tasks ? If so, there's no need to do that, because And.tasks contains all tasks in AndOr.tasks, although tasks' names are changed.

Ahh, OK nevermind.

However, linked list can be helpful when required docs is extremly sparse.

True, but maybe in such cases (low freqs for the clauses) we should just use BS2. I think BS/BNS do better for high-freq clauses?

If there's only 1 required clause sent to BS/BNS can't we use its scorer instead? Have you explored having BS interact directly with all the MUST clauses, rather than using ConjunctionScorer?

Hmm. I don't think that would be helpful. The reason is just the same as above.

I think we may get better performance when the MUST clauses are high freq, if we just use BooleanScorer to enumerate all the matching docs for each MUST instead of going through ConjunctionScorer?

asfimport commented 10 years ago

Da Huang (migrated from JIRA)

True, but maybe in such cases (low freqs for the clauses) we should just use BS2. I think BS/BNS do better for high-freq clauses?

I'm sorry that I could not be sure whether it's ture now, as I haven't made a closer analysis on the perf results. The perf of BS/BNS depends on many factors, such as freq of each clause and the number of SHOULD(and MUST_NOT) clauses.

I think we may get better performance when the MUST clauses are high freq, if we just use BooleanScorer to enumerate all the matching docs for each MUST instead of going through ConjunctionScorer?

I afraid that enumerating all the matching docs would not get better perf. In fact, BS2 and ConjunctionScorer collect docs by the method called "document-at-a-time"(DAAT), while BS/BNS is something like a combination of DAAT and "term-at-a-time"(TAAT). For conjunctive clauses, it's more efficient to use DAAT than TAAT, as DAAT scans fewer docs than TAAT.

asfimport commented 10 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

In theory, yes, DAAT that BS2 does will visit fewer docs than BS/BNS, but in practice I think the cost of the PQ and .advance when the number of matching docs is high, might be higher.

asfimport commented 10 years ago

Da Huang (migrated from JIRA)

Hmm. I mean ConjunctionScorer does not use PQ, and it can be faster to use it rather than enumerating all the matching docs for each MUST. As for .advance, I'm not sure whether its cost can exceed .next much enough, so that using .advance will be slower than using .next in this case.

asfimport commented 10 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Hmm. I mean ConjunctionScorer does not use PQ

Ahhhh right. Only Disjunctions.... OK.

As for .advance, I'm not sure whether its cost can exceed .next much enough, so that using .advance will be slower than using .next in this case.

OK, let's not explore this ...

asfimport commented 10 years ago

Da Huang (migrated from JIRA)

This is a patch based on git mirror commit 8f9b823db1d6fba2cc7ec61b0596970f3c8bbe85. The following things are done in this patch.

Solve the problem of score diff. between pure DAAT(ie. BS2, as BS2 does not exist now, I think it may be better to call it pure DAAT) and BS completely.
Add a new Scorer named BooleanScorerInOrder which uses only bitset but not linked list to collect docs. I create this new Scorer but not change the old BS, because I think BS may be more useful in some cases. For now, BSIO does not support the cases where there is no any MUST clause, because the procedure for such cases is totally different from cases with MUST clause.

The perf. of BSIO is as follows.

                    TaskQPS baseline      StdDevQPS my_version      StdDev                Pct diff
         LowAndSomeLowOr      259.82      (2.3%)      102.70      (2.8%)  -60.5% ( -64% -  -56%)
        LowAndSomeLowNot      184.38      (2.8%)       80.26      (2.3%)  -56.5% ( -59% -  -52%)
       HighAndSomeLowNot       10.44      (7.2%)        4.70      (4.3%)  -55.0% ( -61% -  -46%)
        HighAndSomeLowOr       18.11      (8.0%)        8.83      (4.0%)  -51.2% ( -58% -  -42%)
       HighAndTonsLowNot        3.03      (5.4%)        1.62      (4.7%)  -46.8% ( -53% -  -38%)
        LowAndTonsLowNot       14.59      (1.2%)        8.86      (2.0%)  -39.3% ( -41% -  -36%)
         LowAndTonsLowOr       14.11      (1.1%)        8.74      (3.0%)  -38.1% ( -41% -  -34%)
        HighAndTonsLowOr        5.52      (4.3%)        3.85      (5.2%)  -30.2% ( -38% -  -21%)
        LowAndSomeHighOr       24.97      (3.5%)       21.10      (3.2%)  -15.5% ( -21% -   -9%)
       LowAndSomeHighNot       25.51      (3.3%)       23.22      (3.4%)   -9.0% ( -15% -   -2%)
        LowAndTonsHighOr        1.66      (2.6%)        1.64      (2.8%)   -1.1% (  -6% -    4%) 
                PKLookup       95.22      (5.5%)       96.64      (6.1%)    1.5% (  -9% -   13%)
      HighAndSomeHighNot        2.37      (2.0%)        2.55      (6.9%)    7.4% (  -1% -   16%)
       HighAndSomeHighOr        2.25      (2.7%)        2.43      (6.0%)    7.8% (   0% -   16%)
       LowAndTonsHighNot        2.72      (2.3%)        5.94      (5.8%)  118.4% ( 107% -  129%)
       HighAndTonsHighOr        0.05      (0.8%)        0.12     (17.0%)  162.4% ( 143% -  181%)
      HighAndTonsHighNot        0.08      (1.3%)        0.48     (23.4%)  507.0% ( 476% -  538%)

asfimport commented 10 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Solve the problem of score diff. between pure DAAT(ie. BS2, as BS2 does not exist now, I think it may be better to call it pure DAAT) and BS completely.

That's great news! Looks like you separated required & optional scores in the non-DAAT impls and then carefully cast to float at the right times? Also, you can remove that TODO in ConjunctionScorer on switching sum to double?

Add a new Scorer named BooleanScorerInOrder which uses only bitset but not linked list to collect docs.

So BooleanScorerIO is just like BooleanNovelScorer, except it uses a bitset instead of linked list to track the set buckets? Between BNS and BSIO which one is faster?

Why does BSIO/NS see massive gains on the tasks that have so many NOT clauses? I think in trunk/4.x today, we are not scoring the NOT clauses, right? While these gains are sizable, I think it's not a common use case...

I think you've explored a number of options here and now we need to see if we can make this committable, e.g. figure out how to have BooleanQuery pick the right scorer for the situation? Somehow we need logic that looks at how many / cost of the sub-clauses and picks the right scorer?

asfimport commented 10 years ago

Da Huang (migrated from JIRA)

Looks like you separated required & optional scores in the non-DAAT impls and then carefully cast to float at the right times?

Yes, you get what I mean.

you can remove that TODO in ConjunctionScorer on switching sum to double?

OK, I will do that on next patch.

So BooleanScorerIO is just like BooleanNovelScorer, except it uses a bitset instead of linked list to track the set buckets? Between BNS and BSIO which one is faster?

Yes. exactly. According to perf. tests, it seems that BNS do better for those tasks faster than the trunk, while do better for those tasks slower than the trunk.

Why does BSIO/NS see massive gains on the tasks that have so many NOT clauses? I think in trunk/4.x today, we are not scoring the NOT clauses, right? While these gains are sizable, I think it's not a common use case...

The reason is that when we search for "+a -b -c -d", lucene actually do "+a -(b c d)" and the cost of getting disjunction of (b c d) is huge. Indeed, such case may not be a common case.

I think you've explored a number of options here and now we need to see if we can make this committable, e.g. figure out how to have BooleanQuery pick the right scorer for the situation? Somehow we need logic that looks at how many / cost of the sub-clauses and picks the right scorer?

Yeah, you're right.

Besides, a new idea has come up to me. For BNS, we actually does not make use of the hash feature of BucketTable. Thus, I think we should not take BucketTable as a hash table (ie. do not place doc to the absolute place buckets[doc & MASK]). Firstly, we get 2K required docs to BucketTable. Then, we do TAAT on these 2K docs.

asfimport commented 10 years ago

Da Huang (migrated from JIRA)

This is a patch based on the git mirror commit 7f66461aea7bc2cb6f31a993cba77734e5e0f9d9.

In this patch, I take the bucketTable as an array but not a hash table. It seems that its perf. is better than former patches' on most cases.

As you know, after putting required docs into bucketTable, I have to scan both the table and optional docs. Here, I have tried skipping to scan the bucketTable to improve the perf. The results is as follows.

No skip
                    TaskQPS baseline      StdDevQPS my_version      StdDev                Pct diff
       HighAndTonsLowNot        6.56      (3.1%)        2.59      (1.0%)  -60.5% ( -62% -  -58%)
        HighAndTonsLowOr        6.43      (3.3%)        2.58      (0.8%)  -59.9% ( -61% -  -57%)
        HighAndSomeLowOr        8.49      (8.5%)        4.05      (1.8%)  -52.3% ( -57% -  -45%)
       HighAndSomeLowNot        6.17      (8.6%)        3.16      (2.1%)  -48.8% ( -54% -  -41%)
         LowAndSomeLowOr      250.58      (2.0%)      194.86      (1.6%)  -22.2% ( -25% -  -18%)
        LowAndSomeLowNot      178.66      (1.6%)      147.67      (2.2%)  -17.3% ( -20% -  -13%)
        LowAndSomeHighOr       40.71      (2.8%)       41.50      (1.8%)    2.0% (  -2% -    6%)
                PKLookup       97.59      (3.0%)       99.52      (4.6%)    2.0% (  -5% -    9%)
       LowAndSomeHighNot       20.76      (3.0%)       21.54      (2.3%)    3.7% (  -1% -    9%)
      HighAndSomeHighNot        2.22      (1.7%)        2.67      (4.4%)   20.3% (  13% -   26%)
       LowAndTonsHighNot        3.81      (2.3%)        4.60      (2.1%)   20.8% (  15% -   25%)
        LowAndTonsHighOr        2.87      (2.3%)        3.48      (2.6%)   21.2% (  15% -   26%)
       HighAndSomeHighOr        1.74      (2.1%)        2.16      (3.5%)   24.0% (  18% -   30%)
         LowAndTonsLowOr       18.66      (1.3%)       23.68      (1.9%)   26.9% (  23% -   30%)
        LowAndTonsLowNot       16.01      (1.4%)       22.16      (2.8%)   38.4% (  33% -   43%)
       HighAndTonsHighOr        0.04      (0.9%)        0.11      (9.8%)  158.2% ( 146% -  170%)
      HighAndTonsHighNot        0.06      (1.1%)        0.15     (13.5%)  166.2% ( 149% -  182%)

---------------------------------------------------
Binary search skip
                    TaskQPS baseline      StdDevQPS my_version      StdDev                Pct diff
       HighAndTonsLowNot        6.22      (3.8%)        2.45      (0.9%)  -60.6% ( -62% -  -58%)
        HighAndSomeLowOr        8.29     (11.2%)        4.40      (3.0%)  -46.9% ( -54% -  -36%)
       HighAndSomeLowNot       12.34      (7.1%)        6.65      (2.6%)  -46.1% ( -52% -  -39%)
         LowAndSomeLowOr      232.38      (2.9%)      165.05      (1.8%)  -29.0% ( -32% -  -24%)
        HighAndTonsLowOr        5.17      (6.2%)        3.75      (3.0%)  -27.4% ( -34% -  -19%)
        LowAndSomeLowNot      227.71      (2.6%)      171.13      (3.2%)  -24.8% ( -29% -  -19%)
       HighAndSomeHighOr        1.35      (3.9%)        1.14      (3.5%)  -16.1% ( -22% -   -9%)
        LowAndSomeHighOr       50.17      (3.6%)       48.84      (3.7%)   -2.7% (  -9% -    4%)
       LowAndSomeHighNot       52.71      (3.0%)       51.55      (3.8%)   -2.2% (  -8% -    4%)
                PKLookup       90.17      (3.5%)       91.38      (3.3%)    1.3% (  -5% -    8%)
      HighAndSomeHighNot        1.69      (2.9%)        2.00      (6.3%)   18.5% (   8% -   28%)
         LowAndTonsLowOr       15.61      (1.9%)       18.59      (2.8%)   19.0% (  14% -   24%)
        LowAndTonsHighOr        1.82      (2.7%)        2.20      (4.6%)   20.7% (  13% -   28%)
        LowAndTonsLowNot       15.51      (1.7%)       20.14      (3.8%)   29.8% (  23% -   35%)
       LowAndTonsHighNot        1.01      (2.9%)        1.34      (6.5%)   31.7% (  21% -   42%)
       HighAndTonsHighOr        0.07      (0.9%)        0.12      (6.9%)   77.7% (  69% -   86%)
      HighAndTonsHighNot        0.07      (1.4%)        0.19     (11.9%)  162.4% ( 146% -  178%)

---------------------------------------------------
8 steps skip
                    TaskQPS baseline      StdDevQPS my_version      StdDev                Pct diff
       HighAndTonsLowNot        5.45      (3.3%)        1.69      (1.3%)  -69.0% ( -71% -  -66%)
        HighAndSomeLowOr        5.46     (11.0%)        2.76      (4.4%)  -49.5% ( -58% -  -38%)
       HighAndSomeLowNot       17.94      (5.7%)       10.40      (3.8%)  -42.1% ( -48% -  -34%)
         LowAndSomeLowOr      306.62      (1.7%)      231.45      (1.5%)  -24.5% ( -27% -  -21%)
        LowAndSomeLowNot      286.30      (1.7%)      218.13      (2.0%)  -23.8% ( -27% -  -20%)
        HighAndTonsLowOr        6.34      (3.5%)        5.31      (4.5%)  -16.3% ( -23% -   -8%)
        LowAndSomeHighOr       33.53      (2.1%)       33.85      (2.2%)    1.0% (  -3% -    5%)
                PKLookup       97.39      (1.9%)       98.40      (3.9%)    1.0% (  -4% -    6%)
       LowAndSomeHighNot       42.16      (2.0%)       42.73      (2.1%)    1.4% (  -2% -    5%)
       HighAndSomeHighOr        2.43      (2.4%)        2.76      (4.8%)   13.4% (   6% -   21%)
      HighAndSomeHighNot        2.74      (1.4%)        3.17      (4.6%)   15.7% (   9% -   21%)
        LowAndTonsHighOr        3.45      (1.8%)        4.21      (3.2%)   22.0% (  16% -   27%)
       LowAndTonsHighNot        2.37      (1.8%)        2.95      (3.0%)   24.6% (  19% -   30%)
         LowAndTonsLowOr       17.21      (1.1%)       22.50      (2.6%)   30.7% (  26% -   34%)
        LowAndTonsLowNot       13.60      (1.4%)       19.97      (2.4%)   46.8% (  42% -   51%)
       HighAndTonsHighOr        0.08      (0.5%)        0.19      (9.9%)  140.3% ( 129% -  151%)
      HighAndTonsHighNot        0.06      (1.7%)        0.15     (12.0%)  163.9% ( 147% -  180%)

---------------------------------------------------
16 steps skip
                    TaskQPS baseline      StdDevQPS my_version      StdDev                Pct diff
       HighAndTonsLowNot        6.69      (2.0%)        2.71      (0.8%)  -59.5% ( -61% -  -57%)
        HighAndTonsLowOr        1.69     (10.1%)        0.89      (2.1%)  -47.1% ( -53% -  -38%)
        HighAndSomeLowOr        7.28     (11.5%)        3.96      (1.9%)  -45.6% ( -52% -  -36%)
       HighAndSomeLowNot       14.38      (5.2%)        8.09      (1.5%)  -43.7% ( -47% -  -39%)
         LowAndSomeLowOr      295.60      (2.3%)      223.80      (2.0%)  -24.3% ( -27% -  -20%)
        LowAndSomeLowNot      171.52      (1.7%)      140.82      (1.5%)  -17.9% ( -20% -  -14%)
        LowAndSomeHighOr       40.12      (2.1%)       41.32      (3.2%)    3.0% (  -2% -    8%)
                PKLookup       96.15      (2.4%)       99.15      (6.0%)    3.1% (  -5% -   11%)
       LowAndSomeHighNot       31.53      (2.3%)       32.64      (2.9%)    3.5% (  -1% -    8%)
      HighAndSomeHighNot        2.67      (1.3%)        3.04      (3.4%)   13.9% (   9% -   18%)
       HighAndSomeHighOr        2.11      (2.1%)        2.58      (3.3%)   22.5% (  16% -   28%)
        LowAndTonsHighOr        2.17      (1.8%)        2.67      (3.2%)   23.1% (  17% -   28%)
       LowAndTonsHighNot        2.53      (1.6%)        3.16      (3.3%)   25.2% (  20% -   30%)
        LowAndTonsLowNot       14.68      (0.9%)       20.97      (3.6%)   42.8% (  38% -   47%)
         LowAndTonsLowOr       14.04      (1.1%)       20.09      (2.6%)   43.0% (  38% -   47%)
       HighAndTonsHighOr        0.06      (0.7%)        0.15      (9.4%)  152.0% ( 141% -  163%)
      HighAndTonsHighNot        0.05      (0.8%)        0.14     (12.1%)  167.3% ( 153% -  181%)

---------------------------------------------------
32 steps skip
                    TaskQPS baseline      StdDevQPS my_version      StdDev                Pct diff
       HighAndTonsLowNot        6.50      (2.6%)        3.24      (1.1%)  -50.1% ( -52% -  -47%)
       HighAndSomeLowNot        9.51      (6.4%)        4.87      (3.2%)  -48.8% ( -54% -  -41%)
        HighAndSomeLowOr       14.87     (11.6%)        8.81      (3.6%)  -40.8% ( -50% -  -28%)
         LowAndSomeLowOr      311.27      (2.6%)      241.43      (1.6%)  -22.4% ( -25% -  -18%)
        LowAndSomeLowNot      231.96      (2.4%)      181.95      (2.0%)  -21.6% ( -25% -  -17%)
        HighAndTonsLowOr        5.60      (5.7%)        4.45      (3.7%)  -20.5% ( -28% -  -11%)
       LowAndSomeHighNot       62.10      (2.6%)       60.59      (2.5%)   -2.4% (  -7% -    2%)
        LowAndSomeHighOr       49.36      (3.0%)       48.87      (3.1%)   -1.0% (  -6% -    5%)
                PKLookup       96.38      (2.0%)       95.91      (2.5%)   -0.5% (  -4% -    4%)
      HighAndSomeHighNot        2.08      (1.6%)        2.34      (5.2%)   12.7% (   5% -   19%)
       HighAndSomeHighOr        2.30      (2.6%)        2.63      (5.7%)   14.2% (   5% -   23%)
        LowAndTonsHighOr        1.88      (2.5%)        2.35      (4.2%)   25.5% (  18% -   33%)
       LowAndTonsHighNot        1.10      (2.5%)        1.45      (5.0%)   31.1% (  23% -   39%)
         LowAndTonsLowOr       14.38      (1.0%)       20.24      (3.2%)   40.8% (  36% -   45%)
        LowAndTonsLowNot       12.98      (1.0%)       18.82      (2.9%)   45.0% (  40% -   49%)
       HighAndTonsHighOr        0.08      (0.8%)        0.18     (12.3%)  138.0% ( 123% -  152%)
      HighAndTonsHighNot        0.08      (1.1%)        0.21     (12.5%)  157.6% ( 142% -  172%)

asfimport commented 10 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

This is neat! So now it's just a simple array holding all buckets seen, which we can only do if there's at least one MUST clause.

Maybe try testing different values of SIZE?

When you fold in the MUST_NOT clauses you could just compact the array as you go, instead of having a separate valid bool?

I think we should start moving this towards something committable? I.e., of all the approaches you've explored, let's take the most promising and fold them into a new scorer, and then work on the logic/heuristics for when this scorer should and shouldn't apply?

asfimport commented 10 years ago

Da Huang (migrated from JIRA)

Thanks for you suggestions, mike.

Maybe try testing different values of SIZE?

Hmm, that's a good idea.

When you fold in the MUST_NOT clauses you could just compact the array as you go, instead of having a separate valid bool?

Oh, that's is a great idae! I will do that on next patch.

I think we should start moving this towards something committable? I.e., of all the approaches you've explored, let's take the most promising and fold them into a new scorer, and then work on the logic/heuristics for when this scorer should and shouldn't apply?

Yeah, I agree. I am working on that.

Next