FasterXML / jackson-core

Core part of Jackson that defines Streaming API as well as basic shared abstractions
Apache License 2.0
2.27k stars 799 forks source link

Ability to determine whether field name interning is enabled for JsonParser #1211

Open chokoswitch opened 9 months ago

chokoswitch commented 9 months ago

I have custom deserialization logic of protocol buffers and I took advantage of field names being interned to improve performance.

https://github.com/curioswitch/protobuf-jackson/blob/main/src/main/java/org/curioswitch/common/protobuf/json/ParseSupport.java#L458

Actually, at the time I didn't know it is possible to disable interning. I was able to create a failing test for it

https://github.com/curioswitch/protobuf-jackson/blob/main/src/test/java/org/curioswitch/common/protobuf/json/JacksonInteropTest.java#L20

It would be great to be able to know if the parser is interning to be able to handle this case correctly while also skipping non-reference equality check when it's fine to do so. Currently JsonParser only exposes parser features:

https://fasterxml.github.io/jackson-core/javadoc/2.8/com/fasterxml/jackson/core/JsonParser.html#getFeatureMask()

But there is no access to the factory features. Naturally JsonParser is not the right level for that but I wonder would it be possible to expose factory features from ParserBase? Or it could just be the three standard parser implementations individually exposing it. If we could check the factory features, or more specifically whether the symbol finder is interning strings, then it would be possible to skip .equals for name comparisons.

I looked at databind, and while I'm not completely sure I understand the code, I think this optimization could be applied there too

https://github.com/FasterXML/jackson-databind/blob/2.17/src/main/java/com/fasterxml/jackson/databind/deser/impl/BeanPropertyMap.java#L575

The .equals could be skipped when it's known the fields are interned.

cowtowncoder commented 9 months ago

Right, factory features are expected to only be accessible via factory. But I can see why it might be useful to know it from parser instance.

I am open to adding a mechanism but probably not by passing factory features -- rather it'd probably be by exposing symbol table in use since that I think ultimately has that information. And yes, that'd need to be at level where this can be accessed.

chokoswitch commented 9 months ago

Thanks @cowtowncoder so I guess it could be an accessor to the symbol table on the three main parser implementations. I'm a bit unsure what the exact form of the appropriate API would be, if you have a suggestion I could try to send a PR for it.

cowtowncoder commented 9 months ago

@chokoswitch I haven't looked into this but... ok, there doesn't seem to be any really good places to do this since in 2.x there is no common JSON-specific intermediate parser base class (3.0 adds one). So specific symbol table is separately passed and used by all 4 backends (ReaderBasedJsonParser etc). Further, ByteQuadsCanonicalizer and CharsToNameCanonicalizer do not implement common interfaces.

... and of course you originally mentioned this in context of Protobuf, not JSON. So it'd need to be in JsonParser.

So perhaps there'd be need for separate new introspection method (capability) similar to, say

public boolean canParseAsync();

but more like

public boolean doesInternFieldNames();

and that'd be implemented by backends (but also requires default implementation that just returns false, for backwards compatibility.