apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.6k stars 1.01k forks source link

Code cleanup from all sorts of (trivial) warnings [LUCENE-2285] #3361

Closed asfimport closed 14 years ago

asfimport commented 14 years ago

I would like to do some code cleanup and remove all sorts of trivial warnings, like unnecessary casts, problems w/ javadocs, unused variables, redundant null checks, unnecessary semicolon etc. These are all very trivial and should not pose any problem.

I'll create another issue for getting rid of deprecated code usage, like LuceneTestCase and all sorts of deprecated constructors. That's also trivial because it only affects Lucene code, but it's a different type of change.

Another issue I'd like to create is about introducing more generics in the code, where it's missing today - not changing existing API. There are many places in the code like that.

So, with you permission, I'll start with the trivial ones first, and then move on to the others.


Migrated from LUCENE-2285 by Shai Erera (@shaie), resolved Feb 27 2010 Attachments: LUCENE-2285.patch (versions: 3), LUCENE-2285-remaining.patch, LUCENE-2285-remaining+generated.patch Linked issues:

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

Can someone please clarify these for me:

Description Class Line
Unsupported @SuppressWarnings("SerializableHasSerializationMethods") TestCustomScoreQuery.java 87
Unsupported @SuppressWarnings("SerializableHasSerializationMethods") TestCustomScoreQuery.java 123
Unsupported @SuppressWarnings("UseOfSystemOutOrSystemErr") TestFieldScoreQuery.java 42
Unsupported @SuppressWarnings("UseOfSystemOutOrSystemErr") TestOrdValues.java 37

Are these meant to be there and eclipse just doesn't recognize them for some reason, or are these a mistake?

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I'll create another issue for getting rid of deprecated code usage, like LuceneTestCase and all sorts of deprecated constructors.

At the moment this is not simple, as there are LocalizedTestCase which is not easy to transform to JUnit4. In my opinion, the deprecation of the TestCase base class should be reverted. And also most people here (inlcuding me) are not willing to add these stupid @Test everywhere when writing tests, so we only transform testcases that would speedup with @BeforeClass, @AfterClass (creating a read-only index only one time per class, e.g. see NRQ tests).

About the @SuppressWarnings: These are some from other compilers, a Java compiler is required to ignore all unknown ones, so they dont hurt. But for cleanness, we only want to have Sun javac compile annotations and of course @Override (not for interfaces, Java 1.5!) + @Deprecated.

Another issue I'd like to create is about introducing more generics in the code, where it's missing today - not changing existing API

Which places do you mean? The public apis are 100% generics, maybe some internal parts. Some additional requirement: Please avoid autoboxing!

If you cleanup the code use in all cases the Eclipse code style from the Wiki (contributor page). We have an updated one with generics support!!!

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

.. not willing to add these stupid @Test everywhere

I don't share the same feeling ... I think it's a strong capability - write a method which doesn't need to start w/ testXYZ just to be run by JUnit (though I do both for clarity). I think moving to JUnit 4 only simplifies things, as it allows testing classes w/o the need to extend TestCase. But I'm not going to argue about it here, I'd like to keep this issue contained, and short. So I won't touch the LuceneTestCase deprecation, as it's still controversial judging by what you say.

I'll remove those SuppressWarnings then?

About generics, there are the internal parts of the code, like using List, ArrayList etc. Scanning quickly through the list, it looks like most of the Lucene related warnings are about referencing them ... so it should be also easy to fix.

I'll take a look at the code style settings (http://wiki.apache.org/lucene-java/HowToContribute?action=AttachFile&do=view&target=Eclipse-Lucene-Codestyle.xml?), but I'm talking about compiler warnings.

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I don't share the same feeling ... I think it's a strong capability - write a method which doesn't need to start w/ testXYZ just to be run by JUnit (though I do both for clarity). I think moving to JUnit 4 only simplifies things, as it allows testing classes w/o the need to extend TestCase. But I'm not going to argue about it here, I'd like to keep this issue contained, and short. So I won't touch the LuceneTestCase deprecation, as it's still controversial judging by what you say.

This was discussed lots of times in JIRA and frenode IRC (#lucene). The important thing is: We want all testcases in lucene to be extended from one class (if Junit4 it's LuceneTestcaseJ4), because we have some additional checks that should be run before/after each test: FieldCache checkups, merge scheduler tests, reproducing random test seeds,...

I'll remove those SuppressWarnings then?

Yes, I remove them only whenever i see them.

At the moment, such code cleanup and if they also affect whitespace cleanup, have the problem that they make merging with the new flex branch (flexible indexing) harder, so the most important thing is to not simply reformat all the code. When adding generics, please use the above eclipse codesyste, as we don't want to have whitespace after commas inside generics.

About generics, there are the internal parts of the code, like using List, ArrayList etc. Scanning quickly through the list, it looks like most of the Lucene related warnings are about referencing them ... so it should be also easy to fix.

Last time I opened Lucene inside Eclipse only produced one type of warning: "Use of raw type" - but this warning only affected instanceof checks. This is stupid and a bug in Eclipse. Instanceof checks can never be generic, as the tests is useless at runtime. So a check for "instance of ArrayList" without generics is perfectly fine. Adding generics make you feel that there is some compiler check or runtime checks, both of them are not done. Instanceof is runtime-only and can never check any generics. So I am against those checks! But if you find some unneeded casts and missing generics in general, +1.

The instanceof warnings can somehow be switched off in Eclipse, dont know how. For coding i dont use IDEs (only for automatic refactoring).

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I forgot: Lucene should compile with javac without warnings - and this is working.

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

How about if I undeprecate LuceneTestCase for now, and if there will be a decision to remove it, then we'll remove it? It will eliminate lots (!!!) of deprecation warnings.

About the generics - any reason why not have a space after commas inside generics?

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

About the generics - any reason why not have a space after commas inside generics?

See the mailing list. And it is done like that everywhere in current luecen's source tree. And you have no problem, just use the codestyle xml file for the project and you are done.

http://www.lucidimagination.com/search/document/62fe00098351dbe3/whitespace_inside_generics_parameters

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

See the mailing list.

What does it mean? What should I look for? Is it related to undeprecate LuceneTestCase? If so, can you give me the short answer - yes/no?

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

What does it mean?

Sorry, I chnaged my comment above.

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

Ok I'm fine w/ the formatting. What about un-deprecating LuceneTestCase for now?

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Ok I'm fine w/ the formatting. What about un-deprecating LuceneTestCase for now?

+1, I wanted to do this already, but forgot about it.

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

Uwe, what should I do w/ Version.LUCENE_CURRENT? It is deprecated, however lots of tests are using it. Do they need to reference LUCENE_31? What will happen in future versions? Every release we'll change all the tests? I remember a discussion about this, just trying to figure out how to change the tests now.

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

There is an issue open about that, ignore it please, Simon Willnauer will repair that. You have to use LuceneTestCase.TEST_VERSION_CURRENT, as all tests extends one of LuceneTestCase/-J4, it is simple to refactor using "sed".

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

By the way, there are lots of tests, that are explicitely testing deprecated APIs, so warnings are fine. We know about compile warnings there, but do not change them to work differnt (how should we?) - we can only add @SuppressWarning("deprecated") to them. I would suggest, that you ignore tests and only look at the main class files.

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

Thanks for the concerns Uwe, I've noticed the tests that test deprecated code and I know better than to try to fix them ... uploading the patch now

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

Quite a large patch. I've started off with 3832 compiler warnings based on my eclipse settings and we're now down to 510. All tests pass, including core, contrib and tag. I've also fixed a bunch of javadocs warnings, and "ant javadocs" now passes cleanly. I did not do any formatting to the code, in order to preserve the patch as clear and focused as possible, even though it's a very large one ...

It touches a lot of files. So the sooner someone can help me commit it the better (before these files change).

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I will look into it, some of the changes are problematic because they appear in generated classes (like QueryParser), so i will leave that out. Also @SuppressWarnings("unused") is not a javac annotation. Thanks for fixing the javadoc warnings and cleaning up some import statements. As far as I see, you did duplicate work and replaced all LUCENE_CURRENT constants in tests, so I may close the other bug report when committing this, too.

This may take some time :-)

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Hi Shai,

I applied the patch to my checkout, so it will not get out-of date. As mentioned before, I have to review each change, as on my first diagonal look-around I found a removed cast in TestCharArraySet/Map that is important to call the right method, without the cast the test would pass, but the affected method is never called. I am also not want to remove some casts in NumericRange and other parts, where the casts were added for more clearness in code. Especially at some places without the cast it is not clear what javac will do, so the cast is for more "security" even if not needed.

So please excuse by complaints, but two people looking over such a large patch is really needed.

Thanks for the work! Uwe

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

some of the changes are problematic because they appear in generated classes (like QueryParser),

So? All I did was remove unnecessary semicolons and casts ... next time those files will be generated, the warnings will return. But at least until then we can live w/ few less warnings ... which will allow my perfectionist eyes to rest a little :).

replaced all LUCENE_CURRENT constants in tests

Yes, I figured I'm already touching these files, let's do it all at once. Reduced another \~300 warnings.

About the removed casts - eclipse really warns you on unnecessary casts. I have never found a case where it was wrong. The removed cast from TestCharArraySet is justified because you want to test the contains(Object) method, which is exactly what happens. In fact, when I look at the code, I think there is a wrong cast:

    assertFalse(CharArraySet.EMPTY_SET.contains((Object) "foo")); // invokes the contains(Object) method
    assertFalse(CharArraySet.EMPTY_SET.contains("foo".toCharArray())); // invokes the contains(Object) method
    assertFalse(CharArraySet.EMPTY_SET.contains("foo".toCharArray(),0,3)); // invokes the contains(char[], int, int) method

If the intention was to check all 3 contains methods, then the first cast should have been to CharSequence? Anyway, the removed cast (the second, which cast to Object) is justified as it's indeed unnecessary.

Also @SuppressWarnings("unused") is not a javac annotation

Are you sure? I have another project which compiles w/ javac and it works fine. I'll check it, but I trust you :).

About adding casts for clarity of code - I guess that's a matter of styling, but the cast is truly unnecessary, and just produces a warning. I would like the code to be free of those, but that's only my opinion.

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

The removed cast from TestCharArraySet is justified because you want to test the contains(Object) method, which is exactly what happens. In fact, when I look at the code, I think there is a wrong cast:

This is not true:

Index: src/test/org/apache/lucene/analysis/TestCharArrayMap.java
===================================================================
--- src/test/org/apache/lucene/analysis/TestCharArrayMap.java   (revision 916146)
+++ src/test/org/apache/lucene/analysis/TestCharArrayMap.java   (working copy)
`@@` -76,7 +76,7 `@@`
     int n=0;
     for (Object o : cs) {
       assertTrue(cm.containsKey(o));
-      assertTrue(cm.containsKey((char[]) o));
+      assertTrue(cm.containsKey(o));
       n++;
     }
     assertEquals(hm.size(), n);
Index: src/test/org/apache/lucene/analysis/TestCharArraySet.java
===================================================================
--- src/test/org/apache/lucene/analysis/TestCharArraySet.java   (revision 916146)
+++ src/test/org/apache/lucene/analysis/TestCharArraySet.java   (working copy)
`@@` -475,7 +475,7 `@@`
       assertFalse(CharArraySet.EMPTY_SET.contains(stopword));
     }
     assertFalse(CharArraySet.EMPTY_SET.contains((Object) "foo"));
-    assertFalse(CharArraySet.EMPTY_SET.contains((Object) "foo".toCharArray()));
+    assertFalse(CharArraySet.EMPTY_SET.contains("foo".toCharArray()));
     assertFalse(CharArraySet.EMPTY_SET.contains("foo".toCharArray(),0,3));
   }

The problem here is: We have a char[] and a Object method. The check tests that also the Object method accepts char[]. This is important if you cast CharArraySet to java.util.Set and call with char[]. So removin the cast for this test is simply wrong. You can check this with Clover. And you patch even adds the same check two times - instead of forcing the right method.

And by the way: When I run "ant" and it compiles I get no warning message at all.

Are you sure? I have another project which compiles w/ javac and it works fine. I'll check it, but I trust you .

As I said before, java compiles need to simply ignore unknown SuppressWarnings (see Java Language specs). It's only Eclipse that is to over

About adding casts for clarity of code - I guess that's a matter of styling, but the cast is truly unnecessary, and just produces a warning. I would like the code to be free of those, but that's only my opinion.

Yes its a matter of styling, because of the same we don't want to have autoboxing in internal lucene code, because autoboxing has a speed impact for some of Lucene's code (like collectors). So we want to see what happens.

I want to understand that I apply a char -> int conversion, especially in our new TokenFilters you can get a problem very fast as Character-methods are very sensitive if called with char or int from the Unicode part. And I say it again, javac shows no warning, so I don't see any need to change this, just because this Eclipse prints a useless warning. But you can switch them off. I have some warnings i simply swicth off after creating a project with eclipse. Like the problem with generified instanceof checks, which are a bug in Eclipse.

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

Uwe, your examples are still wrong. CharArrayMap has 3 methods: containsKey(CharSequence), containsKey(Object) and containsKey(char[], int, int). There is no contains(char[]). Therefore when you cast to char[], the Object method is the one that's called, not that char[],int,int.

If I change the code to: assertTrue(cm.containsKey((char[]) o, 0, ((char[]) o).length )); then the right method is invoked. So I guess the tests were defected in the first place .. and like I said, eclipse doesn't lie when it says a cast is unnecessary, at least I haven't seen such a case yet.

I'll fix those two tests now, because they were defective right from the beginning. Thanks for spotting this, because you've just revealed a bug which existed in the tests :).

because of the same we don't want to have autoboxing in internal lucene code

I don't see how autoboxing is related to casting ... If a map returns an Integer, and you assign it to 'int', then whether or not you'll do the cast it will autounbox it. If you assign it to an Integer, then you won't be able to cast to 'int' (I think?) and hence the cast is redundant as well.

About Character methods, eclipse is smart enough to detect that when you call a method w/ a char type, then the right one is called, vs. if you call it with the int type. Hovering over the method call reveals immediately the method variant that's called. So I see no reason why a char would be need to cast to (char). If you want to call an int variant method, then you'll need to cast to int, and eclipse won't complain about that.

Switching off compiler warnings in eclipse is your choice ... the Lucene code is full of 'hidden' casting because that's how Java works. When you do 'int' * 1.0, it's cast to double, and people are aware of that ... in fact, they have to assign the result to a double, or they'll be forced to cast to anything else. When you work w/ integers and the method returns a long, it's cast automatically. So if there are few cases where you'd want to alert the user, put it in a comment, or like int charint = /(int)/ c;

Like I said, it's a styling issue. I'm not going to turn off my warnings in eclipse ... and I don't understand what's this 'generified instanceof checks" - can you give an example?

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

Fixes TestCharArrayMap/Test original bug.

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Like I said, it's a styling issue. I'm not going to turn off my warnings in eclipse ... and I don't understand what's this 'generified instanceof checks" - can you give an example?

http://javanotepad.blogspot.com/2007/09/instanceof-doesnt-work-with-generics.html

So if you try to do "if (list instanceof ArrayList<Type>)" it throws compiler error. But my Eclipse here complains at about 400 places that there are missing generics in these instanceof checks. Instanceof checks are always raw and can never use generics (it makes no sense). Example: Open eclipse and look into warnings: CharArraySet.equals() uses raw type. Javac does not complain, but Eclipse wants to fix with: "if (map instanceof UnmodifiableCharArrayMap<?>)" - this is bullshit and helps nobody. The Java language spec and Sun's Generics Howto clearly say, that instanceof checks should use raw types.

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

Thanks for the info Uwe. I don't have this setting turned on and I agree it's not necessary, even though it might be useful. So we are on the same page. I didn't fix any such warnings and I don't see any such warnings ...

Did you look at the updated patch I uploaded? Do you agree/understand the problem those tests had before and why the cast is indeed redundant?

Could you also let me know what parts of the patch you decided to omit? I've spent a lot of time clearing those warnings and I'd hate to see a large portion of them come back for no good reason.

asfimport commented 14 years ago

Robert Muir (@rmuir) (migrated from JIRA)

hello, i wanted to give my opinion on a few things:

I only have one real concern (I am not a generics policeman like Uwe so please continue with him): i like the removal of casts, etc: but i would prefer if we did not do this with auto-gen'd snowball code.

the only reason is i am trying to track changes over there, against over here. it is true we changed a few things in the base class (SnowballProgram) so that millions of reflections/string creations do not happen, but these are all listed at the top along with the svn rev we are using. The Among, and gen'ed code is actually now directly synced up... for now.

In truth i hate modifying this code as I hate the idea of us diverging from their codebase, but i think the performance gains (no reflection, creating many stringbuilders and strings, etc) are worth it. The problem is, I don't think our local modifications are best overall for other uses of snowball, or i would submit them to that project and sync that way. For example, they would rather create a new StringBuilder for every input word, than run the risk of the stemmer "keeping reference" to a potentially large string, for us this is not a good tradeoff.

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I think the cleanup here is great (thanks Shai!). We don't get enough patches cleaning things up, so, they are always welcome...

And the fact that Eclipse detection of an unecessary cast lead to catching the bug in TestCharSet/Map (not calling the method we thought it was calling) is also great.

The move to Junit4 appears to be controversial so I agree we should handle it separately.

I don't think difficulty merging to the flex branch really applies here... we should try hard not to allow the existence of flex branch to make life harder/slower on trunk. We (well, Mark, Uwe, Robert: thanks!) merge huge changes over to flex every so often. And I'd like to land flex soonish (hopefully) so we don't have to do that anymore...

Probably we shouldn't change generated code unless really necessary? I think it can just cause confusion down the road when someone (not us) generates and then is baffled why all these new Eclipse warnings appeared? In fact, I think, ideally, our build would always regen all gen'd code. This way if we screw up by accidentally editing only the gen'd code and not the source (which I've done, at least 2 or 3 times!!), we find out quickly... it'd keep us honest.

I think it doesn't matter one way or another if we do the TEST_VERSION_CURRENT cutover here or under the existing issue. Looks like Uwe will be committing both, anyway ;)

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Hi Shai,

I will review your patch over the weekend and commit all changes except:

The TEST_VERSION_CURRENT changes will be committed in a separate patch. After I had done this, i will post an updated patch here.

It this an option?

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

Robert - I actually thought about whether or not to change the Snowball code. In my internal project, I also make use of Snowball code directly, and improved it to fit better in the analysis process. I should actually diff my changes w/ yours, perhaps I can use yours, or contribute mine. But all I did is get rid of using deprecated methods. I didn't remove any casts (I think?), but actually added ones in order to not use the deprecated API.

But from what you say I understand why you prefer to leave the code as-is. It'll make comparison w/ the source Snowball easier. So I'm fine w/ leaving this part out. Perhaps we can just add a SuppressWarning to not see the deprecated warnings? That should be fairly easy to skip over when comparing the sources in the future.

Uwe/Michael - your reasoning about QueryParser makes sense to me. I have a Tokenizer (well few actually) generated by JFlex and I clean up warnings as well. But that's me, where I have full control over the code (and I don't mean just commit rights) - if anything gets rebuilt, it's because I've decided to do it. Therefore I'm ok w/ leaving these out. I think what I've added to QueryParser though is a general SuppressWarnings above the class declaration (in the .jj file). That shouldn't be left out right? I mean, what problems can it cause? But if you feel otherwise, I won't stop you :).

TEST_VERSION_CURRENT - I think if it's already included in that patch why not commit it here? I'm not looking for credit or anything, just to save Uwe's work. Separating it out from this patch and including it over in the other issue is just extra work ... but Uwe - it's your time that's spent, so I won't tell you how to manage it ;).

Uwe, about the 'important' casts, my preference is to include a suitable comment. I.e., if you have a code that looks like long val = (long) otherval just to avoid overflow, then that's not clear anyway. Someone can still change it to int val = otherval w/o knowing/understanding that he just broke something. These types of casts are anyhow redundant as I don't believe someone will change them. Nowadays, w/ 64-bit machines, compilers and OSs, keeping your stuff as long does not matter much (I mean as variables, not as elements in an array).

It'll be interesting to see how this patch turns out eventually. All I cared about is reducing the number of warnings. If we can keep the Snowball/QueryParser stuff under a SuppressWarnings annotations, then that will do the trick as well :).

Thanks everybody for looking into this !

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Hi Shai,

here is the patch with the above changes. I also improved some more inconsistencies in Test code. Thanks for removing all this useless null checks after "new". In contrib/QueryParser's Attribute's equals(), I removed the useless null check at all (you only exchanged the clauses), as "null instance AnyClass" always returns false, so the check is useless and for equals() which should be optimized, not needed.

I left the TEST_VERSION_CURRENT changes in! Will close the other issue (#3327) after commit.

At some places I left the casts, but only where I want to have them in all places.

After committing I will provide a diffed patch of the rest of your changes for more review (so they do not get lost). I will commit this in a day.

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

Uwe, I've applied the patch and in Analyzers I see lots of compile errors due to unrecognized Version.LUCENE_CURRENT (there is a missing import). I thought that you let TEST_VERSION_CURRENT in, no? I changed all the contrib the tests to extend from LuceneTestCase and use it. Perhaps you mistakenly left it out? Actually all errors I see are related to Version unrecognized.

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Hi Shai,

I applied the patch to a fresh checkout and get no compile errors. Are you sure that the patch applied correctly? I am working in Windows, so if you are not using a patch-apply tool like TortoiseSVN that can accept windows line endings, you have to maybe use dos2unix before?

And don't forget to update your package in Eclipse (press F5). I had this type of errors very often because Eclipse caches the sources.

All tests pass here and no compile errors, also in demo webapp and so on (using ANT).

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Maybe the patch is also outdated, i goes against: Revision: 916685, maybe you can downgrade your checkout using "svn up -r916685", patch and upgrade again.

I use TortoiseSVN's TortoiseMerge patch tool, which is more intelligent and also applies very old patches wizthout problems. It works like the new svn patch in svn 1.7x trunk: It uses the revision numbers in the patch's file headers and fetches automatically the requested version, patches it and then updates it to the version of your checkout. By that it makes use of the standard update tools of svn and patches always apply without any moved HUNK problems and so on.

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

Ok I see the problem now - because there are so many files, I didn't see while applying the patch, that there are errors (mismatch) on some files, therefore the patch wasn't applied to them, hence the compile errors. I apply the patch w/ eclipse. The problematic files are tests in the analyzers package, and I suspect it's an encoding issue. My source encoding is UTF-8, yet still when I apply the patch I see different characters on the source and patch file. Not sure where the problem is. The patch file which I downloaded is UTF-8 as well, and TestArabicAnalyzer (for example) contains the correct characters in Arabic (in both the patch file and my checkout copy). Yet still when I apply the patch eclipse doesn't read it as UTF-8 ...

Uwe, how about if we do this issue in multiple commits? I.e., commit what you've done so far (which is what we agree on), then I can update, review the remaining warnings and we continue from there? Anyway after you commit this there will be warnings, and over time more warnings will creep in, so we'll need to do some cleanup again :). Is that acceptable?

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

Strange ... something's wrong w/ eclipse and how it read the patch file? I tried to apply the patch which I created originally (and was on my computer, not downloaded from JIRA) and it fails on the same files ... any ideas?

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

Googling around I see it's a known problem in Eclipse w/ no solution yet (at least I haven't come across one). Uwe - can you proceed w/ the commit like I suggested and then we review the remaining warnings?

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Strange ... something's wrong w/ eclipse and how it read the patch file? I tried to apply the patch which I created originally (and was on my computer, not downloaded from JIRA) and it fails on the same files ... any ideas?

No, sorry. I don't use Eclipse at all, only for some refactoring.

I will commit the patch at the weekend and then update a second svn tree with your old patch applied. "svn up" then makes it able to provide a patch with the left out changes, we don't want to apply (some casts, sorry, and we won't fix them). I just say it again: it compiles without any warnings using javac, we cannot support stupiud warnings of other IDEs during our development, as Eclipse is no official requirement. So I still strongy suggest to disable some of the warnings already mentioned.

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

Ok, perhaps you misunderstood me. I suggested to commit your version of the patch, and then afterwards we can discuss the remaining warnings that are controversial. We both agree on your version. We disagree on the diff, right? So let's start w/ yours, and then we can continue arguing about the rest :).

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I understood that, I just wanted to say that some warnings will reappear with my patch, because I removed lots of generated code. Thats all I wanted to say :-)

Sorry, I don't want to nitpick - this issue is somehow about different opinions on code style and warnings - e.g. i totally aggree with renaming private vars that hide protected ones and so on. I also want to fix generics (I am the "official generics police" g). But i simply want something in the code that explains the code better and prevents future errors, even if it is a cast too much. :-) I also applied the unused variable fixes, although in test I think its better to just add a "fake" local variable and place a @SupressWarning("unchecked") before it (you cannot apply this annotation to simple statements). So your compiler complains about unused variables - but how to fix that without placing the SuppressWarnings before the method? Which is bad, as I want to only place it before one code line. In TestVirtualMethod, I fixed this by splitting the bigger test method into two smaller ones, only one with SuppressWarnings. But I still prefer to assign to a local unused var to be able to place the annotation before (maybe thats a bug in Java5, that you cannot add annotations to single statements). Maybe do some "fake" operation on the variable like an assertion to mark it "used" g.

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

Ok then we understand each other. Indeed I have a different opinion about casts. They are called unnecessary because of a reason. When a method declares it returns an int, there no reason to cast a char to an int. The compiler will do it for you. More than that, if the method will declare it returns a long in the future, the cast will generate a compile error.

Styling like that always generate lots of opinions :). We shouldn't however limit ourselves to only two opinions. The fact that you are not using Eclipse, and therefore don't see all the warnings, doesn't mean the rest of us who do use eclipse should see them. If they are justified then ok, but otherwise, saying 'javac does not complain' is just not enough for me. Eclipse is where I spend a large portion of my day in ...

So I suggest we get more opinions from others. It's not just about what you or I think ... but we can do this after the larger portion of the warnings is removed.

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

When a method declares it returns an int, there no reason to cast a char to an int. The compiler will do it for you.

I aggree with that, those casts were left in the patch, no problem at all. For me it is a problem esp. in some file handling methods that use longs, but sometimes also use ints (e.g. when reading a block of data). One example: MMapDirectory had a lot of problems with integer overflows in the past. The problems occurred because some formulas were simply not using casts at all, so the compiler did what was in the specs, but which is not always easy to see. So if you explicitely add a cast to (long) in your formula you are fine and really nobody gets hurt. An everybody understands whats happening. The Sun Code formatting guidelines explicitely says that, that you should use integer casts, if it is for more clarity. And if you dont like the warnings, then switch them off for only lucene in eclipse (you dont need to do that globally).

I dont agree with using char inside a method when calling other functions without casting to int. E.g. we have some backwards compatibility layer for Unicode 4 that uses Character.toUpperCase(int) and Character.toUpperCase(char) in parallel. And these two methods differ, so i explicitely cast to be sure, which method is called (that was especially important (for Lucene 2.9) in Java 1.4 when compiled with Java 5 - because the javac could use the wrong method without casting to char (even with -source 1.4 -target 1.4) etc. For easy merging to 2.9 (as it is still supported), I want to keep the casts. Thats all. If you like, add some @SuppressWarnings("foobar") to it if you would be happy.

More than that, if the method will declare it returns a long in the future, the cast will generate a compile error.

Changing return values of public methods will never happen. And if somebody would do this by accident, the cast helps to find the error :) - thats only a funny addition.

asfimport commented 14 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Googling around I see it's a known problem in Eclipse w/ no solution yet (at least I haven't come across one). Uwe - can you proceed w/ the commit like I suggested and then we review the remaining warnings?

this drives me nuts. here's what you can do: copy the entire patch to clipboard, then apply patch from clipboard, no encoding problems.

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

copy the entire patch to clipboard

Super Robert ![ That worked ]( That worked )!

Now that I apply the patch, I'm back to 1,400 warnings (900 up). Many of them related to generated code and Snowball, but here are few comments:

Besides these, I'm fine w/ the rest. Still a 2400 warnings reduction, and many of the ones left are either in generated code, or related to deliberate use of deprecated API.

asfimport commented 14 years ago

Robert Muir (@rmuir) (migrated from JIRA)

In my internal project, I also make use of Snowball code directly, and improved it to fit better in the analysis process. I should actually diff my changes w/ yours, perhaps I can use yours, or contribute mine.

I'd be curious to know what you did, if its possible for you to, I'd like to hear what you did (maybe on the list so it wont be lost on this issue?)

my recent mods to snowball itself are on #3270, #3277. I think previously Karl modified it so that it doesnt use reflection (except the Lovins stemmer)

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

I'd be curious to know what you did

Ok, now you've made me compare the two :). I'm happy to see we both did the same thing, only you call your buffer 'current' while I call it 'buf'. Besides that I've included a static final EMPTY_ARGS instead of all the places where 'new Object[0]' is passed. Nothing too fancy.

Another thing is that I wrote an Arabic and Hebrew stemmer, and combined them w/ the Snowball ones by introducing a stemmer class which can be either Snowball or anything else. I'll check if we're allowed to contribute the Hebrew stemmer to Lucene ...

BTW FYI - our legal department forbid us to use the Hungarian stemmer because of licensing/legal issues. Besides the stemmers that were originally provided, the Snowball project accepted additional ones like the Hungarian stemmer. However, for that one we weren't able to get a confirmation from the contributor his University indeed gave him permission to contribute the code. Don't know if it matters to anyone here (I've notified Martin Porter as well), but FYI. Our legal department does not permit us to use it (which is not surprising - they are legal ...). I don't want to derail this issue into Snowball discussion, so if you want to talk about it, I suggest we move it to the list.

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

UnicodeUtil

I reverted the whole class as it is very sensitive to binary encoding, so please leave it as it is. Tell me any @SuppressWarnings parameter that makes eclipse happy, I will add it!

TestCharacterUtils

I missed that this test is junit4, in junit3 the casts are necessary. But if you bug me longer with these casts I will give the issue to somebody else :-)

AnalyzingQueyrParser

I simply reverted everything with QueryParser in it, because it is normally generated code. :) As I said before, let us first commit this patch. But i will no longer discuss about casts :)

Thanks for reviewing the patch, its a good help to get code cleaner!

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I will commit the current patch soon and post the remaing diff of Shais original patch to the issue.

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Committed revision: 917019

I will attach the open changes as separate patch. Please reopen, if new changes.

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Here for reference the remaining changes of Shai that I rejected (mostly casts, which in my opinion should stay). Important: The UnicodeUtils for example are not tested in a way, that it is really sure, that missing casts do not change the code at all. In my opinion, this should stay like it is.

The smaller one, LUCENE-2285-remaining.patch, contains the real changes. The bigger one LUCENE-2285-remaining+generated.patch also changes in generated code (just for reference).

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

Thanks Uwe for committing this. I think that further discussion is pointless if you feel that I bug you, and you "will no longer discuss about casts" ... Kind of kills any chance of having a serious and 'open' discussion. I can live with the code as it is now ...

If someone else feels otherwise, then I don't mind to continue discuss this.

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Hi Shai,

sorry, in my opinion the discussion seems to be stuck, as both of us have a personal opinion about the casts and it seems that both of us always presenting the same arguments, so in my opinion, someone else should jump into the discussion. My point is simple that I want to have the casts for "documentation" purposes and "safety" reasons, so later changes in the code will be obvious and anybody reading the code is able to follow. This only affects casts to (long) and (int) <-> (char).

I presented my arguments: Especially for backwards compatibility, as long as the 2.9 branch is maintained (the argument about casting int -> char in code that may get merged to 2.9 branch): I tested it and broke my 2.9 build: A JDK 5 compiler (even with -target 1.4 and -source 1.4) would use the Character methods taking int params, as the JDK5 has it in their rt.jar. If you compile with a JDK 1.4 compiler, the compiler will automatically cast to char, as no int method is available. I tested this in my 2.9 branch: i removed the casts at some places, the code compiled fine (with 1.5 compiler), but when running the tests with JDK 1.4, MethodNotFound exceptions occurred. When I also compiled with the 1.4 compiler, the resulting jar was fine. So forcing the compiler to use (char) methods or (int) methods especially in those Unicode stuff is important. Maybe you understand my argument now. And also when going from 1.5 to 1.6 (so compiling 1.5 code in 1.6) may break in the same way. So I prefer to tell the compiler the method to use.

And for these reasons, I don't understand why you insist in removing those casts. Eclipse declaring them as warnings is in my opinion wrong and should maybe an info message, as everybody is free to add casts and not rely on automatic casting. The same affects autoboxing, if you dont like autoboxing you should be free to explicitely say how the values should be converted. As Lucenes Collectors are hotspots, automboxing without control may affect performance! So adding explicit boxing and explicit casts is a good idea, although YOU think they are wrong or unneeded.

As I dont use eclipse, I have no idea about the correct parameter; I would suggest to add a @SuppressWarnings("foobar") for supressing those cast messages in the affected classes. Would this be an option? You will get no warnings and I can preserve my casts.

If you have other improvements in non-generated code I would be happy to commit the patches. I also merged your patch yesterday to the flex branch, so Mike and Robert can also use it in the flexible indexing branch. So again thank you for the improvements.