FasterXML / woodstox

The gold standard Stax XML API implementation. Now at Github.
Apache License 2.0

Performance tweaks #220

Open winfriedgerlach opened 6 days ago

winfriedgerlach commented 6 days ago

Woodstox already features impressive performance optimizations, to which I would like to add a few small ones. I will submit them as individual PRs, so we can discuss the changes separately.

I benchmarked with a JMH test using namespace-aware StAX parsing with very little value extraction; other use cases may show less significant improvements (but should still benefit slightly).

Approaches that did not work (maybe someone else tries with different results?):

cowtowncoder commented 5 days ago

Correct, all use of StringBuffer is accidental/historical left-over: nothing in core parser/generator is thread-safe (minus some of SymbolTable reuse which is explicitly synchronized as necessary) -- they are not meant to be used from multiple threads.

Cool stuff, will go through PRs.

winfriedgerlach commented 3 days ago

More details regarding System.arraycopy() vs. Arrays.copyOf() or .clone():

Very old comments (Joshua Bloch) say .clone() is the fastest way to copy an array. This is already challenged in some of the answers to that question. Most published benchmarks of "medium age" (i.e. from some years ago) find that System.arraycopy() is fastest.

Things seem to have changed quite a lot in the JDK recently (~beginning of 2023), so different Java versions may perform considerably differently:

Other quite new (2024) benchmarks find no significant difference between Arrays.copyOf() and System.arraycopy() (note the absence of the JVM version under test...): https://www.baeldung.com/java-system-arraycopy-arrays-copyof-performance

My own micro (and a little less micro) benchmarks showed System.arraycopy() ahead most of the time, and sometimes on the same level as clone()/Arrays.copyOf().

As Jackson currently supports every Java version from Java 8 to 23, I reckon it is safer to stick with System.arraycopy() for now when optimizing for performance, even if there is probably almost no performance difference when using the most current (=2024) JVM versions.

I definitely have to agree that the code is nicer with Arrays.copyOf() though, see https://github.com/FasterXML/woodstox/commit/cfe59fbe0b4dbadf1941406759cd6875ad689b56 .
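
For reference, a minimal sketch of the three copy idioms compared above. The class and method names are made up for illustration and are not taken from the Woodstox source; note that the JDK spells the method System.arraycopy(), all lowercase. The sketch only shows that the three variants produce equal, independent copies and says nothing about their relative speed.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of Woodstox.
final class ArrayCopyDemo {
    // System.arraycopy(): the caller allocates the target array itself.
    static char[] copyWithArraycopy(char[] src) {
        char[] dst = new char[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    // Arrays.copyOf(): allocation and copy in a single call.
    static char[] copyWithCopyOf(char[] src) {
        return Arrays.copyOf(src, src.length);
    }

    // clone(): shortest form, always copies the whole array.
    static char[] copyWithClone(char[] src) {
        return src.clone();
    }

    public static void main(String[] args) {
        char[] src = "woodstox".toCharArray();
        System.out.println(Arrays.equals(src, copyWithArraycopy(src)));
        System.out.println(Arrays.equals(src, copyWithCopyOf(src)));
        System.out.println(Arrays.equals(src, copyWithClone(src)));
    }
}
```

Only System.arraycopy() can copy into an existing array or at an offset; the other two always allocate a new array.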

winfriedgerlach commented 3 days ago

@cowtowncoder in StringVector.addString()/.addStrings() I stumbled over the array size being increased by 200%:

oldSize + (oldSize << 1)

This looked a bit odd to me, as in many other places in the source code you increase array size by 50% only, e.g. DataUtil.growArrayBy50Pct():

len + (len >> 1)

Is this intentional or maybe a typo (<< instead of >>) in StringVector?

cowtowncoder commented 3 days ago

Whoa! That sounds more like a bug indeed: unless some comment explicitly states otherwise, and I don't think there is one.

So it should be shift-right to get +50% increase.
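
To make the arithmetic concrete, here is a small sketch that evaluates the two expressions quoted above. The class and method names are hypothetical and only wrap those expressions; they are not the actual StringVector or DataUtil code.

```java
// Hypothetical demo class, not part of Woodstox.
final class GrowthDemo {
    // Expression quoted from StringVector: old + 2 * old, i.e. +200% (size triples).
    static int growWithShiftLeft(int oldSize) {
        return oldSize + (oldSize << 1);
    }

    // Expression quoted for the 50% case: len + len / 2, i.e. +50%.
    static int growWithShiftRight(int len) {
        return len + (len >> 1);
    }

    public static void main(String[] args) {
        int size = 16;
        System.out.println(growWithShiftLeft(size));  // prints 48
        System.out.println(growWithShiftRight(size)); // prints 24
    }
}
```

Starting from 16 entries, the shift-left variant jumps to 48 while the shift-right variant grows to 24, which matches the +200% vs. +50% reading.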