Closed CodingFabian closed 2 years ago
I'd be happy to merge PR if you have something that can be shown to materially improve performance.
My understanding, for what that is worth, is that actual decoding from 10-based textual representation into binary floating point number is often an order of magnitude or more slower than cost of memory allocation (and resulting) GC.
This may be why JDK does not expose parse method(s) that would accept char[]
as input.
And whereas decoding int
s and long
s manually is simple, floating-point arithmetics is REALLY complicated. If you are interested, there is one really good (even if old) paper that describes algorithm that Java implements -- I was only able to find the opposite side of printing:
https://pdfs.semanticscholar.org/9039/1d5a4b445b27cf8ab68ca10c0d37b69b39fc.pdf
but that gives you an idea of complexity. Or checkout JDK code. It's.... complicated.
I know :-) I didn't suggest it was easy. Just wanted to document that in this case the String creation is significant in terms of garbage. Which matters if you are short on memory. I did look at the implementation of the JDK, which made me wonder why they did not expose a char[] version. They use String.charAt() which incours bounds checks every single time they invoke it.
My point however is that the only case where this would seem to matter is if trying to absolutely minimize any memory allocations. I agree in that JDK could (and ideally should) have provided entry point from char[]
.
Looking into the bottlenecks of parsing GeoJSON with Jackson I've also found TextBuffer.contentsAsDouble()
in the crosshair (it's using 30 to 50% of the overall time, depending on the test). In my case it was not the allocation of strings though, but eventually calling Double.parseDouble
.
There is a library that can parse doubles significantly faster than Java own built-in. It looks a bit experimental so I hoped to just configure/subclass/manipulate Jackson's bits... got close enough, but a final
ruined my parade:
private static JsonFactory factory = new JsonFactory() {
@Override
protected IOContext _createContext(Object srcRef, boolean resourceManaged) {
return new IOContext(this._getBufferRecycler(), srcRef, resourceManaged) {
@Override
public TextBuffer constructTextBuffer() {
return new TextBuffer(this._bufferRecycler) { // bummer... final
@Override
public double contentsAsDouble() throws NumberFormatException {
// eventually call FastDoubleParser here
return super.contentsAsDouble();
}
};
}
};
}
};
Pity, parsing complex GeoJSON polygons requires reading a lot of coordinates, optimizing that step would help quite a bit.
PRs would definitely be welcome.
I am open to changes and could change that method in TextBuffer
but it isn't really a designed extension point and maybe more importantly it'd take a while to get new version in wide enough usage (ideally ought to be in 2.14.0 etc).
There are other ways to go about overriding behavior (via JsonParser
) but they are probably no less work (or better otherwise). Ideally I guess floating-point decoding would be potentially pluggable, and I'd be happy to consider PRs that refactor this aspect. It would have to go in 2.14 as well unfortunately, so more of a longer term solution no matter what (unless actually just embedding relevant code to be called by TextBuffer
which probably is the straightest way to get there).
I think #577 actually covers this; closing.
Currently TextBuffer contentsAsDouble will first create a string that is passed to Double.parseDouble().
I know floating point parsing is painful but if you are looking for optimization opportunities, this is one.