apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.6k stars 1.01k forks source link

Constants#LUCENE_MAIN_VERSION can have broken values [LUCENE-5850] #6912

Closed asfimport closed 10 years ago

asfimport commented 10 years ago

Constants#LUCENE_MAIN_VERSION is set to the Lucene Main version and should not contain minor versions. Well this is at least what I thought and to my knowledge what the comments say too. Yet in for instance 4.3.1 and 4.5.1 we broke this such that the version from SegmentsInfo can not be parsed with Version#parseLeniently. IMO we should really add an assertion that this constant doesn't throw an error and / or make the smoketester catch this. to me this is actually a index BWC break. Note that 4.8.1 doesn't have this problem...


Migrated from LUCENE-5850 by Simon Willnauer (@s1monw), resolved Aug 16 2014 Attachments: LUCENE-5850_bomb.patch, LUCENE-5850_hashcode.patch, LUCENE-5850_smoketester.patch, LUCENE-5850.patch (versions: 9), LUCENE-5860_hashcode.patch Linked issues:

asfimport commented 10 years ago

Simon Willnauer (@s1monw) (migrated from JIRA)

for reference here are the affected versions:

maybe there are more While 4.8.1 looks good

asfimport commented 10 years ago

Robert Muir (@rmuir) (migrated from JIRA)

It is intentional that this isnt changed:

We should never change index format with minor versions, so it should always be x.y or x.y.0.z for alpha/beta versions!
asfimport commented 10 years ago

Robert Muir (@rmuir) (migrated from JIRA)

The real bug here is not what simon describes... (although I think its confusing, and its extra confusing there are "two versions").

It is not that 4.8.1 is good, actually 4.8.1 is the buggy one. Segments with 4.8.1 say they were written with "4.8", which is broken.

Furthermore, code like Lucene45DocValuesProducer doing stuff like the below is also broken:

Version.parseLeniently(state.segmentInfo.getVersion());

The main problem is caused by the overengineering of this shit: two different version values, one of which is driven by a system property, and other confusion. Because of this its also not tested. I realize this system property shit is supposed to be there to support "strange" things like custom builds and maven snapshots, but its gotta die. Sorry, its broken for actual lucene releases!

asfimport commented 10 years ago

Simon Willnauer (@s1monw) (migrated from JIRA)

I agree with robert though. I should have named this issue differently. It confused the hell out of me since I would have expected this to be always 4.x or 4.x.y but for 4.8.1 this value is actually 4.8 and apparently it should have been 4.8.1? But then the comment is broken too. We should really just use Version.java and have constants there for all the release?

asfimport commented 10 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Yeah, this is horrible. 4.8.1 is not broken, because I took care of that before release.

Actually, if we parse this to Version constants, the Lucene Main Version constant in Constants.java should be of type Version. And Version should be able to serialize as String according to spec.

I think in the affected versions, the RM used search replace in an inappropriate way...

The Constants test case should validate the version string with a regex. I think I already added that, so it should not happen later.

The sys prop hell is only partly related but is there to validate that the common-build version fits the constant. At runtime it never parses constants. Lucene never reads the sysprop at runtime. – Uwe Schindler H.-H.-Meier-Allee 63, 28213 Bremen http://www.thetaphi.de

asfimport commented 10 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Before doing anything else, we should commit this patch. Its a timebomb waiting to happen that these two places in the code do the wrong thing!

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1613988 from @rmuir in branch 'dev/trunk' https://svn.apache.org/r1613988

LUCENE-5850: use correct version comparator (at the moment) for versions in index segments, otherwise this logic will do the wrong thing on the next bugfix release!

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1614002 from @rmuir in branch 'dev/branches/branch_4x' https://svn.apache.org/r1614002

LUCENE-5850: use correct version comparator (at the moment) for versions in index segments, otherwise this logic will do the wrong thing on the next bugfix release!

asfimport commented 10 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Here is a start at some logic for the smoketester, after it runs the demo, it should run CheckIndex on the index. I modified CheckIndex to output the segment's version, and I think we can change this python script to verify its equal to the supplied version the smoketester is run with???

asfimport commented 10 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Patch, with changes to smokeTestRelease to find the version in checkindex.log and see if it matches what was specified to its command-line. I strip trailing .0's like StringHelper's versionComparator. I ran it with "ant nightly-smoke" and it looks like it's gotten past this check successfully on 4.x ...

asfimport commented 10 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Yes it should currently pass. But if you point it at release 4.8.1 artifacts it should fail?

http://people.apache.org/\~rmuir/staging_area/lucene_solr_4_8_1_r1594670/

asfimport commented 10 years ago

Chris M. Hostetter (@hossman) (migrated from JIRA)

The main problem is caused by the overengineering of this shit: two different version values, one of which is driven by a system property, and other confusion. Because of this its also not tested. I realize this system property shit is supposed to be there to support "strange" things like custom builds and maven snapshots, but its gotta die. Sorry, its broken for actual lucene releases!

Agreed: we should absolutely prioritize the stability and correctness of the official releases over any sort of custom builds and/or maven snapshots – but one thing to keep in mind is that a lot of the reason for these "defaults" was not to make it easier for people to do custom builds or maven snapshots, but to help ensure that if someone does a custom build or uses a maven snapshot there is no mistaking it from an official build.

ie: way, way back in the day people would build off trunk, with custom patches and their builds might be named "lucene-1.4.jar" (before 1.4 was ever official) and then weeks/months later (after 1.4 came out) they (or their coworkers) reported bugs or asked questions that made no sense and there would be no end of confusion.

asfimport commented 10 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Well, in 4.8.1 CheckIndex doesn't print out SI's version, so smokeTestRelease.py gets upset:

Traceback (most recent call last):
  File "dev-tools/scripts/smokeTestRelease.py", line 1360, in <module>
  File "dev-tools/scripts/smokeTestRelease.py", line 1304, in main
  File "dev-tools/scripts/smokeTestRelease.py", line 1341, in smokeTest
  File "dev-tools/scripts/smokeTestRelease.py", line 637, in unpackAndVerify
  File "dev-tools/scripts/smokeTestRelease.py", line 764, in verifyUnpacked
  File "dev-tools/scripts/smokeTestRelease.py", line 948, in testDemo
RuntimeError: unable to locate version=NNN output from CheckIndex; see checkindex.log
asfimport commented 10 years ago

Robert Muir (@rmuir) (migrated from JIRA)

+1 to commit this to the smoke tester. I have the feeling that "total overhaul of versioning to work correctly" will be painful and controversial, but we can at least do this for now.

asfimport commented 10 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Agreed: we should absolutely prioritize the stability and correctness of the official releases over any sort of custom builds and/or maven snapshots – but one thing to keep in mind is that a lot of the reason for these "defaults" was not to make it easier for people to do custom builds or maven snapshots, but to help ensure that if someone does a custom build or uses a maven snapshot there is no mistaking it from an official build.

I don't really care about this. The stuff we record in the index needs to be correct for various back compat to work. If we can't simplify the versioning, then we need to simplify our back compat guarantee. But something has to give.

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1614136 from @mikemccand in branch 'dev/branches/branch_4x' https://svn.apache.org/r1614136

LUCENE-5850: fix smoke tester to verify the index's segment versions in fact match the version being tested

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1614149 from @mikemccand in branch 'dev/trunk' https://svn.apache.org/r1614149

LUCENE-5850: fix smoke tester to verify the index's segment versions in fact match the version being tested

asfimport commented 10 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Hi,

thanks for all the work done already. I am on vacation at the moment, so sorry for not responding in time :-) I was talking to Robert and Simon today via Google Hangouts (my internet was slow during the day). This issue contains several issues, which should be handled separately.

In fact we have actually 2 problems:

The documentation of the Version class is perfectly fine, maybe we should add a not, that this class and parser is only for parsing versions that are user-input from config files. Also the Version class is only for passing a hint to the API, which behaviour you want to have. The Version class is not for file formats and should never serialized or used to switch inside codecs depending on file versions. We can fix two things:

I am happy that Mike added a Smoke tester check, because the test in our code base was not good enough. But we should fix the version test correctly. Now I refer to Robert's complaint about the system property: The system property in the test runner was added to prevent the bug we have. This one passes the -Dversion=... constant down to tests, so we can validate LUCENE_MAIN_VERSION. Unfortunately the "version" sysprop also contains bullshit after the main version, so we just do a "startsWith" check in the test. And this is the bug. If LUCENE_MAIN_VERSION=="4.8" and we pass -Dversion=4.8.1-foobar the test passes, because the sysprop starts with the MAIN_VERSION.

I think the perfect fix can be done since I fixed common-build to have the main version and the appendix separately. "version" is constrcuted in common-build from a prefix and suffix:

  <property name="dev.version.base" value="5.0"/>
  <property name="dev.version.suffix" value="SNAPSHOT"/>
  <property name="dev.version" value="${dev.version.base}-${dev.version.suffix}"/>
  <property name="version" value="${dev.version}"/>

In fact we should change this and pass "dev.version.base" to the test runner and do an "equals" check in the testcase. By that we guarantee that common-build's version is identical to the LUCENE_MAIN_VERSION.

asfimport commented 10 years ago

Robert Muir (@rmuir) (migrated from JIRA)

But we cannot get the minor version for each segment. So we have to decide if we write the bugfix version number to index file (x.y.z.a.b.c... as many as you like). If we do this, we need the validation of LUCENE_MAIN_VERSION (smoker and -> see below).

I think we should: because its the version of lucene that wrote the segment. It is possible we may need to differentiate 4.8.1 from 4.8 for some hackedy-hack in the future because of say, a bug unrelated to the format (perhaps in indexwriter or something). So if we write the correct version we always have the possibility to add workarounds for such things.

asfimport commented 10 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I think we should

Then we have to change this code: After the problems with the minor versions appearing in LUCENE_MAIN_VERSION, this test was added - which enforces what the Javadocs about LUCENE_MAIN_VERSION tells:

  public void testLuceneMainVersionConstant() {
    assertTrue("LUCENE_MAIN_VERSION does not follow pattern: 'x.y' (stable release) or 'x.y.0.z' (alpha/beta version)" + getVersionDetails(),
        Constants.LUCENE_MAIN_VERSION.matches("\\d+\\.\\d+(|\\.0\\.\\d+)"));
    assertTrue("LUCENE_VERSION does not start with LUCENE_MAIN_VERSION (without alpha/beta marker)" + getVersionDetails(),
        Constants.LUCENE_VERSION.startsWith(Constants.mainVersionWithoutAlphaBeta()));
  }

If we now use the full version number in the segmentinfos, we have to make LUCENE_MAIN_VERSION be identical to common-build's dev.version.base. I would then change this test to enforce identical values and remove the regexp and replace by a had check on the common-build sysprop in the test (see above). I can provide a patch doing this.

asfimport commented 10 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Please do, afterwards we can then work on merging version string logic and Version.java

We need only one version here. Currently the shit is overengineered and does not really work, even committers are screwing it up. To hell with maven builds, custom builds, etc, we have to fix this.

asfimport commented 10 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

We need only one version here. Currently the shit is overengineered and does not really work, even committers are screwing it up. To hell with maven builds, custom builds, etc, we have to fix this

I'm too high level on this to fully understand atm, so excuse my laziness, but can't we still basically support custom builds even with only one version? Can't they just override this one version?

Even the ways things are now, we are working around issues with our custom build and replacing the version. I've had to comment out a few of those version check tests. I've talked about it with Robert before and it did all just seem to be kind of silly. Glad it's all getting overhauled.

Just trying to understand why custom build would be too difficult anyway. I mean, perhaps you have to make a small hack to the ant file or something, but that's fine.

asfimport commented 10 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Just trying to understand why custom build would be too difficult anyway. I mean, perhaps you have to make a small hack to the ant file or something, but that's fine.

Its currently attempted to be supported and thats why versioning is a mess (there are multiple versions) and nobody understands what is going on. We have to just simplify it: its open source. If you want to do something custom, my god you might have to change the source :)

asfimport commented 10 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

There is nothing complicated anymore. As I said, I can fix the test. We already have splitted the common-build.xml version numbers in the base version and the appendix. The testcase just has to check if the base version is identical to the LUCENE_MAIN_VERSION constant. We are on the safe side then. The user can change the appendix as he likes (jenkins does this at the moment). The problem is just our test configuraton, I will provide a patch once I am back at home, just give me one or 2 days :-) It is all supported, just the testcase is bullshit - and I will fix this.

asfimport commented 10 years ago

Robert Muir (@rmuir) (migrated from JIRA)

There is nothing complicated anymore.

I disagree, I see:

To me this is COMPLICATED!

asfimport commented 10 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

To me this is COMPLICATED!

I was talking about the test. Please let me fix it tomorrow, I just have no time at the moment. spec.version is already obslete its just a relict. The stuff with suffix and base was introduced to make handling the version easier, it was just not used to unfuck the test.

asfimport commented 10 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

dev-version and version are also duplicates. I had this on the agenda to fix.

asfimport commented 10 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Hi,

Please note: The problem mentioned in the original issue description ("Constants#LUCENE_MAIN_VERSION can have broken values") was not a problem anymore because of #6600 (since 4.8). The original documentation said that LUCENE_MAIN_VERSION should only be in format x.y but because of the missing test introduced in #6600, we had some outlyers in the past. 4.8.1 was therefore correct with LUCENE_MAIN_VERSION="4.8".

In this issue we now decided that we should record the full (numeric) version number (without appendinxes) in index segments. Smoke tester already checks this now, but the tests were not confirming (still enforcing x.y format from LUCENE-5537).

I worked on a patch that cleans up the version handling in Constants.java/Version.java and common-build.xml. It also contains improved tests:

asfimport commented 10 years ago

Robert Muir (@rmuir) (migrated from JIRA)

To support ALPHA/BETA versions (like Robert introduced in 4.0-ALPHA), you can set a boolean to true in Constants.java. By that it automatically appends ".0.1" at the end of the MAIN_VERSION.

But thats not what we did. Alpha, Beta, and Final all had 3 distinct version numbers.

I changed Version.parseLeniently() to allow minor bugfix versions given (so users can write "4.9.1" into their config files, but its parsed to LUCENE_4_9), which would be the corresponding match version constant

Why not just merge with Version.java to eliminate yet-another-version? We still have too many, and this is still asking for bugs, just different types of bugs. We can just add the previous release constants to the enum.

asfimport commented 10 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

But thats not what we did. Alpha, Beta, and Final all had 3 distinct version numbers.

I know, this was just a quick hack to support maybe the same in Lucene 5.0. The idea of the ".0.1" was just to prevent people from using indexes created in beta versions with the released version. So the ".0.1" was perfectly fine to enforce this. But it did not help once 4.1 was out. ndexes created with 4.0.ALPHA did not fail if read with 4.1 (only with 4.0)! In my opinion, we should mark BETA indexes in a different way with LUCENE 5.0. This is why I said this should be reviewed. We should have a marker "ALPHA/BETA" in the index file and fail to read it, This is unrelated to this patch - I would be happy to remove the whole ALPHA/BETA checks. To emulate your behaviour, we could remove

Why not just merge with Version.java to eliminate yet-another-version? We still have too many, and this is still asking for bugs, just different types of bugs. We can just add the previous release constants to the enum.

I disagree. Version.java was and is never intended to be used to specifiy real Lucene versions. Version is there to enable backwards compatibility hacks when passing in as constant to enable backwards compatibility hacks. We also remove older constants!

Elasticsearch and Shai Erea were simply doing the completely wrong thing. Version constants are there to enable backwards layers. parseLeniently was there to allow users to specify. Why does Elasticsearch not simply print the fucking String from the segment metadata? Sorry, what is described in the issue and the fix in ES 1.3.1 is bullshit^3!

The parseLeniently hack added here was just for convenience. I should remove it completely - it was just here to help Elasticsearch! Sorry - will go away in next patch.

In Lucene we just have 2 versions: Constants for reporting metadata and save it in index files. The Version enum is for enabling backwards hacks in Solr's and ES's config files. The parser was added to help Lucene users to pass those constants in config files.

The big question to @s1monw: Why does ES not report the real version number as string in the welcome message of its root rest endpoint???

asfimport commented 10 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I disagree. Version.java was and is never intended to be used to specifiy real Lucene versions. Version is there to enable backwards compatibility hacks when passing in as constant to enable backwards compatibility hacks. We also remove older constants!

Elasticsearch and Shai Erea were simply doing the completely wrong thing...

Of course, but the version "system" is so complicated that nobody here can even seem to agree about what should be written in the index.

This is a big fucking problem.

I'm telling you one thing: we are going to simplify this versioning, or we are going to simplify (aka reduce) our back compat guarantee. I'll be the first to opt out of providing back compat: its a feature that people who care about can contribute.

I'm ready to go this route if you cannot simply agree that this is all a confusing mess.

asfimport commented 10 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Can we move the ranting here to a more funny fight on the mailing list?

asfimport commented 10 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I think its extremely relevant to the issue, just look at the discussion above.

Various confusion about whether "4.8" or "4.8.1" should be written into the index for a bugfix release, buggy usage of version API #9 instead of version API #8, etc.

Its a real problem that I think we should address.

asfimport commented 10 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Robert, I fully agree! - this is why we opened and resolved issue #6600. We just had a few broken releases with respect to this.

We should now clean up our version handling, do we agree on this? This is what I tried to do in this release.

asfimport commented 10 years ago

Robert Muir (@rmuir) (migrated from JIRA)

We do agree on that. We also need to simplify it though. Having multiple ways to compare versions (e.g. Version.compare vs StringHelper.versionComparator) etc is really bad.

I think its equally bad to "downgrade" 4.8.1 to 4.8 because we have two apis (versus having one unified enum api). Its still buggy if you do the wrong thing, so if previous mistakes are repeated again, like the ones that motivated this issue, it will still be equally buggy, just buggy in a different way.

asfimport commented 10 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

OK, so in my opinion we should resolve this issue, because it is a duplicate of #6600. The problem @s1monw complained about was already fixed there. We should create a new issues about:

asfimport commented 10 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I don't think the real problem (confusion) is fixed yet. Seems to me that making a new issue will just make it harder to follow the discussion.

asfimport commented 10 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Then we should change issue title and description!

Constants#LUCENE_MAIN_VERSION is set to the Lucene Main version and should not contain minor versions. Well this is at least what I thought and to my knowledge what the comments say too. Yet in for instance 4.3.1 and 4.5.1 we broke this such that the version from SegmentsInfo can not be parsed with Version#parseLeniently. IMO we should really add an assertion that this constant doesn't throw an error and / or make the smoketester catch this. to me this is actually a index BWC break. Note that 4.8.1 doesn't have this problem...

This was fixed in #6600 because it enforced LUCENE_MAIN_VERSION to be "x.y" or "x.y.0.z" via regex. The reason why @s1monw opened the issue was, because he was hurt by the broken releases before the added check.

The new issue is about the above 2 things:

asfimport commented 10 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Right, to be clear, here we have calmed the symptoms (the back compat time-bomb bugs in SegmentReader and codec). But to me this isn't enough, i am afraid of what would happen if we released such code in a bugfix release. We need to cure the disease, and prevent such bugs from being easy to do.

Additionally we should probably change the release instructions to generate back compat index for every release bugfix or not, index format change or not, and retroactively generate them for all 4.x releases. This would provide additional protection.

asfimport commented 10 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I have seen, you are going to remove Version.java completely in #6921, so I will not touch Version in this issue.

I will rewrite the patch, to clean up common-build.xml and use the full bugfix version when writing index files.

asfimport commented 10 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I am happy also if Version is the "one thing" with the comparator and used for the index... (with all constants for all releases).

And not mandatory for analyzers.

asfimport commented 10 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

New patch:

I think this is ready as a first step. Cleaning up "Version" class (removal) can be done in the #6921.

asfimport commented 10 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Improved the common-build.xml checks. Also cleraly state where the RM has to take action!

asfimport commented 10 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Small updates after chat with Robert: Improve documentation

There is a small change now for release managers: As we now write the full version number into index files (including bugfix), after branching a release we have to update common-build.xml and Constants.java to have the bugfix number already. This is needed to have LUCENE_MAIN_VERSION already contain the bugfix number. This is checked by the common-build file. So you cannot release "4.10.0" without editing common-build.xml to be valid. The only difference between releases and other builds is the version suffix "SNAPSHOT".

I will update the release manager instructions after this is done.

For Robert: _THIS PATCH DOES NOT HANDLE Version.java MERGE. IT JUST UPDATES TO NEW LUCENE_MAIN_VERSIONSEMANTICS (full bugfix) and improves documentation, so problems like ES and Shai's error cannot happen anymore.

asfimport commented 10 years ago

Shai Erera (@shaie) (migrated from JIRA)

Thanks for fixing this bug Rob ... "sorry" is really all I can say. I hope I didn't cause all this shit... Maybe we should also document SegmentInfos.getVersion() to use StringHelper if you want to parse it?

asfimport commented 10 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Why should you feel sorry? I dont think its your fault.

I blame the overly complicated versioning, thats why i was arguing to simplify it.

If you have N version strings and M comparators and each comparator only works correctly on a subset of the version strings (on the others, it gives exceptions or wrong answers), then its just asking for bugs.

asfimport commented 10 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Patch with Shai's suggestion.

asfimport commented 10 years ago

Shai Erera (@shaie) (migrated from JIRA)

Thanks Uwe for adding the docs, they are crystal clear.

asfimport commented 10 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

How should we proceed with this.

I would suggest to commit this for now, so we can then work on extending Version.java? This patch mainly restores Lucene 4.x and trunk into a releaseable state. The current one, would not work for minor version releases, because the tests would prevent this from working (the current code in trunk/4.x does not allow minor versions in LUCENE_MAIN_VERSION).

So should I commit this to 4.x and trunk

BTW: Lucene 4.9.1 would be releaseable, as the tests are enforcing LUCENE_MAIN_VERSION to be "4.9", not "4.9.1", so the problem Simon mentioned is not there. The additional smoketester check was not committed to 4.9 branch, so this would not break 4.9.1 release.

asfimport commented 10 years ago

Robert Muir (@rmuir) (migrated from JIRA)

4.9.1 is not releasable. This issue needs to be fixed first to make it so (and several things backported, like fixes to SegmentReader).