Closed Aklakan closed 1 year ago
The PR looks OK.
Another approach for extension datatypes is to implement NodeValue.
As for generally using the Node object value: the two serve different purposes.
The Node value exists for the API, which makes it very rigid in order to ensure Model API compatibility.
IMO the best change would be to not keep a value with the Node at all and to do the mapping to/from Java in the API code.
I also implemented a NodeValueJson class in addition to the RDFDatatypeJson, exactly to delay going to Node for as long as possible during SPARQL evaluation. However, during evaluation Node and NodeValue are converted back and forth when evaluating expressions and placing the results back into bindings. I am not sure how that could be handled efficiently if Node no longer held a value, unless it held the value indirectly via NodeOverNodeValue (which extends Node).
A different literal label implementation?
There are other cases for carrying information around with Nodes generally -- TDB NodeIds for example.
Change
NodeValue's _setByValue method only handles xsd datatypes; however, it eagerly materializes the lexical form even for datatypes outside the xsd namespace. This introduces a noticeable performance overhead when dealing with datatype extensions such as geometries or JSON objects that are only used as intermediary values. With my current workload of many small JSON objects, the overhead is around 5-10%.
NodeValue itself bears the following comment
The simple solution is to defer materialization of the lexical form until after having ensured that the given Node has a datatype in the xsd namespace.
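The reordering can be illustrated with a minimal, self-contained sketch. It does not use Jena's actual classes: Literal, setByValueEager, setByValueDeferred and parseXsd are hypothetical stand-ins, and the example.org datatype URI is made up. The sketch only shows the idea of checking the datatype namespace before asking for the lexical form, so that a non-xsd literal is never serialized.

```java
import java.util.function.Supplier;

public class LazyLexicalDemo {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema#";

    /** Hypothetical stand-in for a literal whose lexical form is costly to produce. */
    static final class Literal {
        final String datatypeUri;
        private final Supplier<String> lexicalFormSupplier;
        int serializations = 0; // how often the lexical form was materialized

        Literal(String datatypeUri, Supplier<String> lexicalFormSupplier) {
            this.datatypeUri = datatypeUri;
            this.lexicalFormSupplier = lexicalFormSupplier;
        }

        String getLexicalForm() {
            serializations++;
            return lexicalFormSupplier.get();
        }
    }

    /** Eager variant: materializes the lexical form before looking at the datatype. */
    static Object setByValueEager(Literal lit) {
        String lex = lit.getLexicalForm(); // paid even for non-xsd datatypes
        if (!lit.datatypeUri.startsWith(XSD_NS)) {
            return null; // not handled here
        }
        return parseXsd(lit.datatypeUri, lex);
    }

    /** Deferred variant: checks the namespace first, then materializes on demand. */
    static Object setByValueDeferred(Literal lit) {
        if (!lit.datatypeUri.startsWith(XSD_NS)) {
            return null; // lexical form never requested
        }
        return parseXsd(lit.datatypeUri, lit.getLexicalForm());
    }

    /** Toy parser covering a single xsd type; everything else stays a string. */
    static Object parseXsd(String datatypeUri, String lex) {
        if (datatypeUri.equals(XSD_NS + "integer")) {
            return Long.valueOf(lex);
        }
        return lex;
    }

    public static void main(String[] args) {
        Literal eager = new Literal("http://example.org/datatype#json", () -> "{\"k\":1}");
        Literal deferred = new Literal("http://example.org/datatype#json", () -> "{\"k\":1}");
        setByValueEager(eager);
        setByValueDeferred(deferred);
        System.out.println("eager serializations: " + eager.serializations);
        System.out.println("deferred serializations: " + deferred.serializations);
    }
}
```

In this sketch the serialization counter stays at zero for the deferred variant when the datatype is outside the xsd namespace, while xsd literals are parsed exactly as in the eager variant.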
As a question, I wonder if it is really necessary for _setByValue to always go via the lexical form for all XSD types, or whether, as a future improvement, it would be possible to reuse the LiteralLabel's Java object.
Profile without enhancement:
Profile with enhancement. Note that JsonWriter.string() no longer appears:
Are you interested in contributing a pull request for this task?
Yes