Closed xtonik closed 8 months ago
@xtonik thanks for the changes but this code would need to be sigificantly faster in order to justify the extra complexity - at least for me.
@xtonik Is there some article or reference showing the idea behind technique? I can believe that using multiplication and shifting can be faster but.. how much? Is it worth the change?
Maybe you have links to other projects doing this as optimization? (I assume it's a well-known tehnique)
I think I am open to change, provided there's something published about technique (ideally it's a well-known technique that I wasn't familiar with). Let's drop 80% calculation as it's not worth optimizing.
It'd be good to have jmh
-based micro-benchmark as well; maybe I'll modify
https://github.com/cowtowncoder/misc-micro-benchmarks
with it.
Ok. So a quick test with misc-microbenchmarks
gives these numbers with JDK 17:
Benchmark Mode Cnt Score Error Units
DivBy1000.perfHandOptimized thrpt 5 5.022 ± 0.037 ops/s
DivBy1000.perfSimpleDivision thrpt 5 3.566 ± 0.029 ops/s
that is, +50% throughput for operation itself. With JDK 8 bit less; and for some reason I don't quite know, no speedup with JDK 21 (maybe the "unoptimized" case is actually optimized by JDK under the hood).
Whether this translates into measurable improvement for number serialization is open question; I guess it shouldn't hurt.
Also: I modified unit test a bit by moving String concatenation out of busy loop and test over full int
range takes only about 2 seconds. I'll probably tweak it a bit to have automated, always run part, and separate manually run full test.
@xtonik I think we can make this happen, and although it's hard to know how meaningful for real use improvements are, change is still relatively small and tested (I modified unit test to run through 1/7 of full int
range and leave on for CI).
But one thing before I can merge this is that if we don't have a CLA for you, we'd need one (just needs to be sent once before the first contribution). It's this one:
https://github.com/FasterXML/jackson/blob/master/contributor-agreement.pdf
and usually easiest to print, fill & sign, scan/photo; then email cla
at fasterxml dot com.
Once I get it I can proceed with last clean up and merge.
Looking forward to getting this merged, thank you in advance!
@xtonik I think we can make this happen, and although it's hard to know how meaningful for real use improvements are, change is still relatively small and tested (I modified unit test to run through 1/7 of full
int
range and leave on for CI).But one thing before I can merge this is that if we don't have a CLA for you, we'd need one (just needs to be sent once before the first contribution). It's this one:
https://github.com/FasterXML/jackson/blob/master/contributor-agreement.pdf
and usually easiest to print, fill & sign, scan/photo; then email
cla
at fasterxml dot com. Once I get it I can proceed with last clean up and merge.Looking forward to getting this merged, thank you in advance!
Ok, I will subscribe it.
Ok. So a quick test with
misc-microbenchmarks
gives these numbers with JDK 17:Benchmark Mode Cnt Score Error Units DivBy1000.perfHandOptimized thrpt 5 5.022 ± 0.037 ops/s DivBy1000.perfSimpleDivision thrpt 5 3.566 ± 0.029 ops/s
that is, +50% throughput for operation itself. With JDK 8 bit less; and for some reason I don't quite know, no speedup with JDK 21 (maybe the "unoptimized" case is actually optimized by JDK under the hood).
Whether this translates into measurable improvement for number serialization is open question; I guess it shouldn't hurt.
Also: I modified unit test a bit by moving String concatenation out of busy loop and test over full
int
range takes only about 2 seconds. I'll probably tweak it a bit to have automated, always run part, and separate manually run full test.
As you said, the compiler for new Java versions does more optimizations than older ones. The same situation (not remembering from which version) is probably for String.replace(String, String)
, when given string is one character long, then it is replaced by String.replace(char, char)
. This is what is about one of my prepared PR.
I have just sent my CLA to given email as requested.
If someone is interested and have some time, I have prepared another changes. As they are tiny, it takes couple of minutes to check if some of them is valuable to create PRs.
Division by 1000 was replaced with more performant custom method using multiply and add. I suppose that this tiny method will be inlined, otherwise the code must be copied to places where is called.
Another similar simple optimization was also done - multiplication of int by constant
0.80
in integer (exactly inlong
) domain.The validity is verified by attached tests. They are provided as a comparison to original implementation and thus they don't need to be executed again and again, so they are marked as a ignored not to cause running tests slowly - on my PC they take around 10 s.
No behavior changes were done, performance improvement only.