FasterXML / jackson-core

Core part of Jackson that defines Streaming API as well as basic shared abstractions
Apache License 2.0
2.26k stars 792 forks source link

Optimize TextBuffer.contentsAsDouble() #346

Closed CodingFabian closed 2 years ago

CodingFabian commented 7 years ago

Currently TextBuffer contentsAsDouble will first create a string that is passed to Double.parseDouble().

I know floating point parsing is painful but if you are looking for optimization opportunities, this is one.

screen shot 2017-01-13 at 00 45 25
cowtowncoder commented 7 years ago

I'd be happy to merge PR if you have something that can be shown to materially improve performance.

My understanding, for what that is worth, is that actual decoding from 10-based textual representation into binary floating point number is often an order of magnitude or more slower than cost of memory allocation (and resulting) GC. This may be why JDK does not expose parse method(s) that would accept char[] as input. And whereas decoding ints and longs manually is simple, floating-point arithmetics is REALLY complicated. If you are interested, there is one really good (even if old) paper that describes algorithm that Java implements -- I was only able to find the opposite side of printing:

https://pdfs.semanticscholar.org/9039/1d5a4b445b27cf8ab68ca10c0d37b69b39fc.pdf

but that gives you an idea of complexity. Or checkout JDK code. It's.... complicated.

CodingFabian commented 7 years ago

I know :-) I didn't suggest it was easy. Just wanted to document that in this case the String creation is significant in terms of garbage. Which matters if you are short on memory. I did look at the implementation of the JDK, which made me wonder why they did not expose a char[] version. They use String.charAt() which incours bounds checks every single time they invoke it.

cowtowncoder commented 7 years ago

My point however is that the only case where this would seem to matter is if trying to absolutely minimize any memory allocations. I agree in that JDK could (and ideally should) have provided entry point from char[].

aaime commented 2 years ago

Looking into the bottlenecks of parsing GeoJSON with Jackson I've also found TextBuffer.contentsAsDouble() in the crosshair (it's using 30 to 50% of the overall time, depending on the test). In my case it was not the allocation of strings though, but eventually calling Double.parseDouble.

There is a library that can parse doubles significantly faster than Java own built-in. It looks a bit experimental so I hoped to just configure/subclass/manipulate Jackson's bits... got close enough, but a final ruined my parade:

    private static JsonFactory factory = new JsonFactory() {
        @Override
        protected IOContext _createContext(Object srcRef, boolean resourceManaged) {
            return new IOContext(this._getBufferRecycler(), srcRef, resourceManaged) {
                @Override
                public TextBuffer constructTextBuffer() {
                    return new TextBuffer(this._bufferRecycler) { // bummer... final
                        @Override
                        public double contentsAsDouble() throws NumberFormatException {
                            // eventually call FastDoubleParser here
                            return super.contentsAsDouble();
                        }
                    };
                }
            };
        }
    };

Pity, parsing complex GeoJSON polygons requires reading a lot of coordinates, optimizing that step would help quite a bit.

cowtowncoder commented 2 years ago

PRs would definitely be welcome.

I am open to changes and could change that method in TextBuffer but it isn't really a designed extension point and maybe more importantly it'd take a while to get new version in wide enough usage (ideally ought to be in 2.14.0 etc).

There are other ways to go about overriding behavior (via JsonParser) but they are probably no less work (or better otherwise). Ideally I guess floating-point decoding would be potentially pluggable, and I'd be happy to consider PRs that refactor this aspect. It would have to go in 2.14 as well unfortunately, so more of a longer term solution no matter what (unless actually just embedding relevant code to be called by TextBuffer which probably is the straightest way to get there).

cowtowncoder commented 2 years ago

I think #577 actually covers this; closing.