apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.73k stars 1.04k forks source link

Code Cleanup: Rewrite StringBuilder.append with concatted strings [LUCENE-8847] #9890

Closed asfimport closed 5 years ago

asfimport commented 5 years ago

EDIT: Again, to clarify, please don't bother yourself with this ticket on company time, on personal time you could be working on something that makes you money or improves the product for your feature personally.

 

This entire ticket is an afterthough. A look back at the code base that most people don't have the time for.


 

Code cleanup as suggested by static analysis tools. Will be done in my spare time.

If someone reviews this, please also do not take up actual time from your work to do that. I do not wish to take away from your working hours.

 

These are simple, trivial things, that were probably overlooked or not even considered(which isn't an accusation or something negative). But also stuff that the Java compiler/JIT won't optimize on its own.

 

That's what static analysis tool are good for: picking stuff like that up.

 

I'm talking about Intellij's static code analysis. Facebook's "Infer" for Java. Google's "errorprone", etc...

These are the kinds of things that, frankly, for the people actually working on real features, are very time consuming, not even part of the feature, and have a very low chance of actually turning up a real performance issue.

So I'm opting to have a look at the results of these tools and implementing the sensible stuff and if something bigger pops up I'll make a separate ticket for those things individually.

 

Creating this ticket so I can name a branch after it.

 

The only questions I have are: since the code base is so large, do I apply each subject to all parts of it? Or only core? How do I split it up?

Do I make multiple PRs with this one ticket? Or do I make multiple tickets and give each their own PR?


Migrated from LUCENE-8847 by Koen De Groote (@KoenDG), resolved Jun 10 2019

asfimport commented 5 years ago

Varun Thacker (@vthacker) (migrated from JIRA)

Hi Kevin,

Splitting them up will always be easier for folks to review and the PRs won't get out of sync very quickly.

You could split it up anyways you feel is best while working on it honestly - per error type that the tool catches or go module by module ( and split core by packages or something )

I think you can just create PRs and they get posted on the mailing list so we'll know about it. Additionally you could always link all the PRs here for reference as well

asfimport commented 5 years ago

Koen De Groote (@KoenDG) (migrated from JIRA)

Hello Varun,

Linking the PRs here is a good idea, I will do that.

The first thing I happened to come across is usage of StringBuilder where at some point regular concatenation is used as part of an append.

 

So, something like:

StringBuilder sb = new StringBuilder();
sb.append("Hello, " + "World!");

 

Which defeats the very purpose of using a StringBuilder. And also it causes the internal construction of extra StringBuilder objects by the JVM. And the JVM is very conservative with that optimization. It does no interpretation, no recognizing that there is already a StringBuilder present and adding it to that one.

It is true that for simple concatentation, StringBuilder isn't required. But when it is already being used, then using append for everything is the best option.

 

PR for that is here: https://github.com/apache/lucene-solr/pull/707

 

It's encompasses 103 files across the entire codebase, which is kinda large.

That being said, it is exclusively about that 1 single change. Tests have run, all succeed.

 

The name's "Koen" by the way :P

asfimport commented 5 years ago

Varun Thacker (@vthacker) (migrated from JIRA)

bq.  The name's "Koen" by the way :P

I'm going to blame myself for typing without my morning coffee :) Sorry about that!

Which defeats the very purpose of using a StringBuilder.

My personal take on this though is that unless they are showing up in a profiler under real load fixing these isn't worth it. However looking at your PR the changes make the code more readable as well . I'll leave comments on the PR any changes are required

asfimport commented 5 years ago

Koen De Groote (@KoenDG) (migrated from JIRA)

No worries.

 

My personal take on this though is that unless they are showing up in a profiler under real load fixing these isn't worth it.

Indeed, real situations with real load are the ones that should be looked at first.

 

I'm fully expecting some of these PR's to receive the response of "this is not a priority, refused".

 

I'll offer them anyway though. Unless it takes people's time. Which reminds me, again, please nobody review this on their working hours, on the time they would actually spend on features for this or another project they could be getting paid for. I added that as clarification again to the original post.

asfimport commented 5 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Hi, I am fine with this patch, except for tests. They should not be "optimized" (although it's no longer a required optimization, see below).

So, something like:

StringBuilder sb = new StringBuilder(); sb.append("Hello, " + "World!");

Which defeats the very purpose of using a StringBuilder. And also it causes the internal construction of extra StringBuilder objects by the JVM. And the JVM is very conservative with that optimization. It does no interpretation, no recognizing that there is already a StringBuilder present and adding it to that one.

It is true that for simple concatentation, StringBuilder isn't required. But when it is already being used, then using append for everything is the best option.

This is not quite true anymore. Lucene + Solr master is on Java 11 already, so the class files are compiled for Java 11 and have target Java 11. This means, simple string concats no longer use a StringBuilder internally. In fact, if you concat a lot of stuff with static components like fragments and numbers, it's now better to use string concats instead of StringBuilders. StringBuilders should only be used for loops or if/then/else constructs.

Starting from Java 9, the "+" operator no longer uses StringBuilder and uses instead the new feature "indyfied string concat" (which is BTW also used by Elasticsearch in its painless scripting engine). What happens is that it collects all static parts of the concat in to a static constant pool entry not even passed to the method call, but part of the invokedynamic signature and the remaining dynamic parts are pushed to stack as is. This allows to highly optimize the string concats, which are only one single method call solely containing the dynamic parts to a prepared MethodHandle that contains all static parts.

So we should also review our string concats with StringBuilder and if they don't use if/then/else and loops, replace them by "+".

asfimport commented 5 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

More info:

asfimport commented 5 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I moved this ticket to Lucene. Issues affecting both projects should better be opened in Lucene, especially as this ticket mainly talks about Lucene's code.

asfimport commented 5 years ago

Koen De Groote (@KoenDG) (migrated from JIRA)

Perhaps I should clarify what I meant by "how to split".

 

Since it is expected that all merged PRs go into the CHANGES.txt file, it seems to me like it cannot be that the same number appears more than 1 time.

Or can it? Can I mentioned SOLR-13533 multiple times?

If not, it looks like I'll have to make multiple tickets like this?

 

EDIT: Oh, I didn't see those last 2 comments while writing this, sorry about that.

Also, wasn't aware of the change since Java9

asfimport commented 5 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

You can call it as often as possible. For this case, the changes entry should only go into Lucene with the new issue number LUCENE-8847. Solr won't mention this change as at the beginning of the changes file it refers to Lucene changes anyways.

asfimport commented 5 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

What do you mean with multiple PRs? Here is only one PR and only one issue number.

asfimport commented 5 years ago

Koen De Groote (@KoenDG) (migrated from JIRA)

There is currently only 1 PR, which is limited to the subject of the stringbuilder vs regular concat, for reasons of clarity.

It was and is my intent to make a few more where it seems realistic and useful to do so, for other subjects.

 

I should then continue to reference this ticket and not make a new one?

 

asfimport commented 5 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

If it's about something else, open a new issue. I already changed the issue subject to clarify what it's doing. The pull requests are just part of an issue, normally they won't be mentioned in the changes list.

JIRA: If it's something about Lucene and Solr, use LUCENE; if it's only about Solr, open in SOLR.

asfimport commented 5 years ago

ASF subversion and git services (migrated from JIRA)

Commit 67104dd615c82de64839de60418110211438f574 in lucene-solr's branch refs/heads/master from Koen De Groote https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=67104dd

LUCENE-8847: Code Cleanup: Rewrite StringBuilder.append with concatted strings (#707)

This specific commit affects all points in the casebase where the argument of a StringBuilder.append() call is itself a regular String concatenation. This defeats the purpose of using StringBuilder and also introduces an extra alloction. These changes should avoid that.

ant tests have run, succeeded on local machine.

Removing test files from the changes.

Another suggested rework.

asfimport commented 5 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I cherry picked the commit to branch_8x. Will do the usual maintenance checks to ensure nothing breaks. At least it applied cleanly.

asfimport commented 5 years ago

Varun Thacker (@vthacker) (migrated from JIRA)

Thanks Uwe!

asfimport commented 5 years ago

ASF subversion and git services (migrated from JIRA)

Commit 8b6a0d0964e74206a7dd0a7c321618197c53a391 in lucene-solr's branch refs/heads/branch_8x from Koen De Groote https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8b6a0d0

LUCENE-8847: Code Cleanup: Rewrite StringBuilder.append with concatted strings (#707)

This specific commit affects all points in the casebase where the argument of a StringBuilder.append() call is itself a regular String concatenation. This defeats the purpose of using StringBuilder and also introduces an extra alloction. These changes should avoid that.

ant tests have run, succeeded on local machine.

Removing test files from the changes.

Another suggested rework.

asfimport commented 5 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Thanks @KoenDG

asfimport commented 5 years ago

ASF subversion and git services (migrated from JIRA)

Commit 21b0892d3876b865546604eaf718fe0b1de23446 in lucene-solr's branch refs/heads/branch_8x from Adrien Grand https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=21b0892

LUCENE-8847: Fix typo in CHANGES.

asfimport commented 5 years ago

ASF subversion and git services (migrated from JIRA)

Commit 7c5247c60c4461a4817f4de942a7a18d892e8243 in lucene-solr's branch refs/heads/master from Adrien Grand https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7c5247c

LUCENE-8847: Fix typo in CHANGES.

asfimport commented 5 years ago

Ignacio Vera (@iverase) (migrated from JIRA)

Closing after the 8.2.0 release