apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.69k stars 1.04k forks source link

pass liveDocs Bits down in scorercontext, instead of Weights pulling from the reader [LUCENE-3474] #4548

Closed asfimport closed 13 years ago

asfimport commented 13 years ago

Spinoff from #2610, this would allow filters to work in a more flexible way (besides just cleaning up)


Migrated from LUCENE-3474 by Robert Muir (@rmuir), resolved Oct 01 2011 Attachments: LUCENE-3474.patch (versions: 4)

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

updated patch with javadocs for the acceptDocs, and i had neglected the MatchAllDocsScorer.

all tests pass

asfimport commented 13 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Patch looks great! Nice to separate this from #2610; after this patch, #2610 will be tiny!

asfimport commented 13 years ago

Chris Male (migrated from JIRA)

Super, lets go ahead and commit this.

asfimport commented 13 years ago

Simon Willnauer (@s1monw) (migrated from JIRA)

I kind of liked the ScorerContext#topScorer(boolean) builder like pattern which is way less verbose than using ctors. Can we add those methods back and force a copy on setAcceptsDocs(Bits) that way we don't need to copy all settings on an incoming context.

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I really hated that thing: Its strange and unintuitive: half the code to the class is explaining the "pattern" that the class uses and how to use it.

Not good!

asfimport commented 13 years ago

Simon Willnauer (@s1monw) (migrated from JIRA)

Its strange and unintuitive: half the code to the class is explaining the "pattern" that the class uses and how to use it.

very straight forward IMO. Well since it used to be that way we should keep it. Just changing stuff like this for "taste" reasons is not acceptable for me though. We already figured out that we have the "builder" vs. "no-builder" camp have here in Lucene and I don't want to fight this every time something like that comes up.

I really hated that thing

thats fine I hate the ctor verbosity so we are deadlocking here. Again :(

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Just changing stuff like this for "taste" reasons is not acceptable for me though.

Really? we can't write patches that change code in trunk any more?

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

By the way, I didnt change this for taste reasons.

I want the Bitset to be required to build a ScorerContext (Note both ctors take it!)

So this pretty much blew the existing "pattern" out of the water.

asfimport commented 13 years ago

Simon Willnauer (@s1monw) (migrated from JIRA)

I want the Bitset to be required to build a ScorerContext (Note both ctors take it!)

rename def() to create(Bits), done.

asfimport commented 13 years ago

Simon Willnauer (@s1monw) (migrated from JIRA)

So this pretty much blew the existing "pattern" out of the water.

boolean boolean params are just another start of a big mess here. the named builder like methods here make it very explicit what you are doing here. If we gonna add more which is likely we gonna end up with more boolean params people need to get in the right order. The chance to introduce an error here is way less than with a ctor.

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Simon i have never seen code with this pattern before.

The chance to introduce error with that crazy-builder-like-thing is tremendous, because its unnatural. (In fact i think i spotted some things doing this patch, for other issues)

What is wrong with normal java objects?

asfimport commented 13 years ago

Simon Willnauer (@s1monw) (migrated from JIRA)

I don't wanna fight this again, its too demanding for me. Go for it I don't think my opinion counts here obviously. I don't want to be in your way really. Sorry for raising an objection on the patch.

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Go for it I don't think my opinion counts here obviously

Why? because I disagree with you? Thats pretty natural man, normal for people to disagree on opinions.

its not like anyone has committed any shit here, so quit overreacting.

asfimport commented 13 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Mr. Muir: I also disagree with you. If this gets committed I will revert it - just as you did with Yonik in the past - if it blows me out of Lucene PMC/Lucene Committers, who cares. Sorry, this pattern is very simple and often used.

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Thats fine Uwe, we can hold search performance hostage over this broken builder pattern

This is gonna be great.

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Sorry, this pattern is very simple and often used.

Where? For example where in the JDK uses this .def() etc?

asfimport commented 13 years ago

Chris Male (migrated from JIRA)

Does Simon's suggestion of replacing def() with create(Bits) solve the mandatory Bits issue? Can we just fix the broken parts of the builder instead of outright replacing it?

asfimport commented 13 years ago

Simon Willnauer (@s1monw) (migrated from JIRA)

Why? because I disagree with you? Thats pretty natural man, normal for people to disagree on opinions.

because I don't see that you are going away from your opinion whatever I say. You made clear you don't want to have any builder pattern in lucene so what is the point of discussing then. I need to put up my own patch which uses the builder to make my objections being in the patch obviously which is not what I am used to. Usually we try to find a compromise and by iterating, right? Each time this pattern comes up there is no way that you move a tiny bit from your opinion just because you don't like it. Well I don't like things people suggest from a code syte perspective but it makes sense very often, so I change it. I don't see why this needs to go for rounds and rounds of fighting here. We did this for a reason when ScorerContext has introduced, it served as a little DSL on top of it enforcing immutability. if you want to have Bit mandatory you should just do something like ScorereContext.create(delDocs).topLevel(true).outOfOrder(false) which makes very clear what you want rather than new ScoreContext(delDocs, true, false) and we gonna have more boolean params here in the future.

asfimport commented 13 years ago

Simon Willnauer (@s1monw) (migrated from JIRA)

Where? For example where in the JDK uses this .def() etc?

those are similar patterns:

http://download.oracle.com/javase/1.5.0/docs/api/java/lang/ProcessBuilder.html http://download.oracle.com/javase/6/docs/api/java/sql/PreparedStatement.html http://download.oracle.com/javase/6/docs/api/javax/sql/DataSource.html

we use def() as a shortcut for default() (keyword though) we should rather use create(Bits) IMP

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

because I don't see that you are going away from your opinion whatever I say.

Thats right, I think i'm allowed to have my own opinion?

All I did was upload a patch: I'm in no rush to commit here, we can just leave the issue open until everyone is happy.

asfimport commented 13 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

we can hold search performance hostage over this broken builder pattern

You really think that additional method calls and Eden objects on highest level (which are called when creating scorers, not when scorers are consumed) will slowdown your search? Hey, program your stuff in C/C++ and use CLucene in future.

Ah by the way, recode all toString() methods anywhere in Lucene and rip StringBuilder!

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

You really think that additional method calls and Eden objects on highest level (which are called when creating scorers, not when scorers are consumed) will slowdown your search?

No, I don't.

I mean that we can hold our filter execution performance hostage over this internal API, since you have clearly voiced you will revert my commit if i commit the patch.

asfimport commented 13 years ago

Chris Male (migrated from JIRA)

I mean that we can hold our filter execution performance hostage over this internal API, since you have clearly voiced you will revert my commit if i commit the patch.

Are you saying you wouldn't support an updated version of your patch that went back to builder style?

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

those are similar patterns:

http://download\.oracle\.com/javase/1\.5\.0/docs/api/java/lang/ProcessBuilder\.html http://download\.oracle\.com/javase/6/docs/api/java/sql/PreparedStatement\.html http://download\.oracle\.com/javase/6/docs/api/javax/sql/DataSource\.html

Actually none of those are similar at all.

None of those create a new object on each setter, they are normal Builders.

This ScorerContext is something different entirely.

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Are you saying you wouldn't support an updated version of your patch that went back to builder style?

That's correct.

asfimport commented 13 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Are you saying you wouldn't support an updated version of your patch that went back to builder style?

That's correct.

I simply don't suport your patch - deadlock.

asfimport commented 13 years ago

Chris Male (migrated from JIRA)

Is there anything people are prepared to compromise on here?

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I'd be happy with flags instead of the booleans.

asfimport commented 13 years ago

Chris Male (migrated from JIRA)

I'm not one for rushing discussions but we're getting close with #2610 which this is a large chunk of. Anyway we can come to an agreement here?

asfimport commented 13 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

It would be nice if we could discuss the builder pattern (and other heated disagreements) without quickly degrading into deadlock. We need to be able to work through our disagreements so we can get back "real" improvements to Lucene and Solr. In this case #2610 is an enormous performance gain.

Net/net I don't like the use of the builder pattern for ScorerContext. It seems like overkill: we only have 3 settings here. Even by the proponents of the builder pattern this is overkill?

I think chained setters are less readable (see LUCENE-2308).

I do agree 2 booleans in a row is asking for sneaky trouble; we can add .setX instead (these fields need not be final)? Or the int flags to the ctor is a great solution too (I think we should do this for FieldType as well).

Net/net plain old boring java object would work fine here.

And, in general, I don't think we should let the builder pattern seep into Lucene, for the simple reason that it's obviously controversial. This is no different from any other controversial change in open-source...

Also, one can always make a shim layer on top of Lucene that exposes builder APis for everything? QueryParser, *Query, *Field/Document, IR, IW, merge policies/schedulers, etc., all could be cutover to builder APIs "up above" right? If we can safely apply the builder pattern "on top", ie it need not pollute Lucene's core, why not do that? We should only make core changes that are not controversial or must be done in core.

asfimport commented 13 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I think int flags would work well? We only have 2 flags here, plus required acceptDocs.

The vast majority of cases use the default flags:

  new ScorerContext(acceptDocs);

And then for the few places that change the flags it'd be something like this:

  new ScorerContext(acceptDocs, ScorerFlags.TOP_LEVEL);

We keep final-ness for the fields.

This seems like a great solution?

Failing that.... another option would be to just stop using an object here at all, and go back to passing explicit flags down to Weight.scorer.

In fact one benefit of this is we get stronger typing, ie, we force at compilation time all Scorer impls to be fixed to handle the new setting (vs today where a Scorer can easily silently be missed, thus adding latent bug).

This means on any addition to the scorer API (eg I've long wanted for caller to declare up front whether they need scores computed vs "only matching", ie MTQWF and CSQ would pass false), we break the API. But I think that's actually fine, even in 3.x: making your own Scorer is very expert.

asfimport commented 13 years ago

Chris Male (migrated from JIRA)

I don't like the idea of shying away from a change just because it's controversial. Sometimes its necessary to shake things up with new ideas.

While I do agree that it'd be better if we could get past these arguments and make the real changes, you seem to be asking for those who advocate a builder like API here to compromise and for those who don't want such an API, to not? Thats a tough pill to swallow.

Also, this API feels to me to be a lot more internal, so whether or not builders could be built on top of more outward facing concepts like QueryParsers, Field/Document etc, seems a different issue?

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Lucene went a long time without builder APIs, if you like builders, you can do them in your own code (there is no need for us to enforce such antipatterns)

This API for ScorerContext is an internal API. the intended consumer is a lucene developer (e.g. guys like me). Its not for average joe... especially in 4.x when you can more easily tweak the scoring API, I think very very few users will write custom queries.

We don't need to design APIs to baby lucene committers about this stuff, especially about two booleans, but like I said I wouldn't get too upset about flags (even though personally i think its overkill too).

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Also, given the attitude presented here towards me uploading a patch (threatening to revert my commits no matter what etc), you can be damned sure I am going to be a total asshole about builder APIs.

By taking such a ridiculous stance on this internal API you have sealed the fate of builders across the codebase.

Instead of looking at the actual use case and thinking 'does this need a builder' you assume I'm completely against them: actually I don't have such a blanket opinion: if you look at some classes I have added to lucene you will see that I have even added builders myself... where it makes sense!

But just because a class has two booleans doesnt mean that it needs a Builder: thats my problem, when you use a pattern as a hammer across the board like this then it becomes an antipattern, because its the wrong solution.

In some cases Builder makes sense: I think just not here... but this doesn't matter now. I'm gonna scream about builders now even when they really are a good fit.

Nice work.

asfimport commented 13 years ago

Chris Male (migrated from JIRA)

I actually agree with a lot of what you're saying here Robert. Using the right approach at the right time is key and here you've definitely made a good argument for why we should use the constructor + boolean approach.

I hope we can continue to have discussions on a case by case basis, about what approaches are best?

asfimport commented 13 years ago

Steven Rowe (@sarowe) (migrated from JIRA)

I'm gonna scream about builders now even when they really are a good fit. Nice work.

Robert, you're blaming Uwe for your own future bad behavior when he threatens to use your tactics against you? Sweet.

asfimport commented 13 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I don't like the idea of shying away from a change just because it's controversial. Sometimes its necessary to shake things up with new ideas.

+1 I'm all for pushing new ideas that make good, hard improvements to Lucene (like LUCENE-1536).

you seem to be asking for those who advocate a builder like API here to compromise and for those who don't want such an API, to not? Thats a tough pill to swallow.

I am in fact asking for that, because largely Lucene does not adopt the builder pattern today, the builder pattern is relatively new and trendy, vs Lucene's codebase, and now we see it seeping in, in various places/patches/etc. Not only is it new, it's also controversial amongst the committers.

I think it's also reasonable because you can naturally layer the builder API on top of a simple java APIs, but not really vice/versa. One could create a very nice shim "Lucene builder APIs" library. It need not be in our core APIs.

This way apps that love the builder pattern can use the builder shim; those that don't like them can use the plain java APIs.

As other "popular" patterns try to seep into Lucene, I think we should also take a cautious stance: we should not apply a pattern just because it exists and has become popular; we should see strong technical benefits to Lucene before doing so.

So, stepping back, "adopting the builder pattern in Lucene's APIs" is the overall change I'm talking about, and I think that is a big change.

Also, this API feels to me to be a lot more internal, so whether or not builders could be built on top of more outward facing concepts like QueryParsers, Field/Document etc, seems a different issue?

Right this is an internal API, but for example if you build custom queries/scorers you can still use a builder shim on top of Lucene's core ScorerContext. It could be part of this builder shim library too.

Really "Lucene should adopt the builder pattern" to me is a big change, and it's a codebase-wide, global decision. It's actually very similar to passionate disagreements on whitespace, and this is why we (thankfully) have a hard standard on what our whitespace looks like. Otherwise we'd be having huge arguments about whether parens should be surrounded by whitespace on every patch.

So net/net I don't think Lucene should adopt the builder pattern, for internal or external APIs. Just build a shim library on top.

asfimport commented 13 years ago

Chris Male (migrated from JIRA)

+1 Mike. I agree.

asfimport commented 13 years ago

Simon Willnauer (@s1monw) (migrated from JIRA)

you seem to be asking for those who advocate a builder like API here to compromise and for those who don't want such an API, to not? Thats a tough pill to swallow.

I can swallow tough pills so +1 - I think this pill is going to change a lot for me on how I see this project and how I feel to contribute.

simon

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Failing that.... another option would be to just stop using an object here at all, and go back to passing explicit flags down to Weight.scorer.

This is interesting for discussion too: because a compile-time break is "better" than a runtime break I think in cases of changes to query/weight/scorer?

Like we can look at a compile-time-break as a feature: if we add a new thing (e.g. imagine adding this bitset after the fact), its crap for us to say 'but we didnt break backwards compatibility' when in fact you really do need to change your code!

asfimport commented 13 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

So net/net I don't think Lucene should adopt the builder pattern, for internal or external APIs. Just build a shim library on top.

Let's start with telescopic ctors again, I am a fan of them! And non-final fields are the best I can think of! And don't forget freeze(), we should now freeze all development instead and make our opinions guarded by hotspot bugs.

I will no longer discuss here, I will do something else, more productive, beyond Lucene.

asfimport commented 13 years ago

Chris Male (migrated from JIRA)

This is interesting for discussion too: because a compile-time break is "better" than a runtime break I think in cases of changes to query/weight/scorer?

I agree. The APIs of Query/Weight/Scorer feel to me to be so important that we should be very wary when making changes, but when we do want to make changes (and we should) then being explicit that something has changed and that people need to look carefully at the new API, seems beneficial.

At the same time the benefits of the Contexts was that we could more easily make API changes. But perhaps with these classes some reluctance is beneficial?

What would the signatures look like if we dumped ScorerContext?

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

At the same time the benefits of the Contexts was that we could more easily make API changes. But perhaps with these classes some reluctance is beneficial?

But I think this is a downside here, if we add some new flag (e.g. acceptedDocs) its important for you to change your code. The context gives you a false warm-fuzzy feeling: pretend we already had ScorerContext and we committed this issue: then your custom query continues to work fine, until you use a filter and its silently wrong!

What would the signatures look like if we dumped ScorerContext?

Like Lucene 3.x: IR, boolean, boolean, Bits

asfimport commented 13 years ago

Chris Male (migrated from JIRA)

But I think this is a downside here, if we add some new flag (e.g. acceptedDocs) its important for you to change your code. The context gives you a false warm-fuzzy feeling: pretend we already had ScorerContext and we committed this issue: then your custom query continues to work fine, until you use a filter and its silently wrong!

Agreed.

Like Lucene 3.x: IR, boolean, boolean, Bits

Do we have any default value issues or anything with those booleans?

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

No, I think the scorercontext was geared at us being able to make changes without breaking the API: After Mike's comment the more I think about it I think this was a bad idea... its being used here like Solr's NamedList hammer...

I think we should keep type safety on the Q/W/S apis to avoid traps... adding Bits like this to a ScorerContext is a great example where we 'break backwards compatibility' in a sneaky way... far better to have a compile break.

So, +1 to go nuclear on ScorerContext completely.

asfimport commented 13 years ago

Chris Male (migrated from JIRA)

Can you want to put together a patch to that regard? I'm definitely +1 for the idea at the moment.

asfimport commented 13 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

But I think this is a downside here, if we add some new flag (e.g. acceptedDocs) its important for you to change your code. The context gives you a false warm-fuzzy feeling: pretend we already had ScorerContext and we committed this issue: then your custom query continues to work fine, until you use a filter and its silently wrong!

Actually I think this is a serious problem with what we have today?

It's really awful/dangerous if on upgrade, silently, filtering stops working against your custom Query.

It's also awful if we internally mess up and miss a Query that should have been fixed to handle .acceptedDocs, which could easily happen today.

OK I think we should just go back to passing the arguments directly, noting that this API is internal, so that custom queries out there will get a hard compile-time break, not silently get the wrong results, when there's an important change here. It's too dangerous to use a context object.

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

sure, I'd be happy too... was gonna wait a little bit to see if anyone grossly objected.

I still don't like the fact this dodges the builder discussion completely, but again my real opinion for the record:

So really my opinion on Builders is no different than my opinion on ArrayList: you use them when its the appropriate solution and when it makes sense... I think thats when you are actually building stuff.

asfimport commented 13 years ago

Chris Male (migrated from JIRA)

I think its a good idea to go ahead and put a patch together so we can discuss it directly.

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

patch with the type-safe API and nuking ScorerContext