Jena already has support for a fixed canonicalization of literals in CanonicalizeLiteral. There is also other code that decides lexical form within the SPARQL expression evaluator.
These would be better called "normalizations" because there is one fixed choice.
This task is to:
Generalize the mechanism to allow different normalization choices, e.g. XSD 1.0 vs XSD 1.1, Turtle short-form syntax for numbers, or a defined system choice.
Provide one form whose result is preserved when written to a TDB2 database. TDB2 holds binary values for some datatypes and rebuilds the lexical form on retrieval, which effectively normalizes the literal. It would be beneficial to provide this as a normalization choice for data and also to include it in the test suite to pin it down.
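To make the two points above concrete, here is a minimal sketch of what selectable normalization choices could look like. It uses only the JDK and is not Jena API: `LiteralNormalizationSketch`, `LexicalNormalizer`, `DECIMAL_XSD10`, `DECIMAL_XSD11` and `INTEGER_VIA_VALUE` are hypothetical names made up for illustration. The last choice only imitates the "store a binary value, rebuild the lexical form" effect described for TDB2; it does not reflect TDB2's actual encoding.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

/** Hypothetical sketch of selectable normalization choices. Not Jena API. */
final class LiteralNormalizationSketch {

    /** One normalization choice: maps a lexical form to the preferred lexical form. */
    interface LexicalNormalizer {
        String normalize(String lexicalForm);
    }

    /** xsd:decimal in XSD 1.0 style: always a decimal point and a fraction digit, e.g. "1.0". */
    static final LexicalNormalizer DECIMAL_XSD10 = lex -> {
        String plain = plainDecimal(lex);
        return plain.contains(".") ? plain : plain + ".0";
    };

    /** xsd:decimal in XSD 1.1 style: integer values have no decimal point, e.g. "1". */
    static final LexicalNormalizer DECIMAL_XSD11 = LiteralNormalizationSketch::plainDecimal;

    /**
     * xsd:integer rebuilt from its value, imitating a store that keeps a binary
     * value and regenerates the lexical form on retrieval (an assumption for
     * illustration, not TDB2's real encoding).
     */
    static final LexicalNormalizer INTEGER_VIA_VALUE = lex -> {
        BigInteger value = new BigInteger(lex.trim()); // the "stored" value
        return value.toString();                       // the "retrieved" lexical form
    };

    /**
     * Parse as a decimal, drop trailing zeros, and print without an exponent.
     * Validation against the xsd:decimal lexical space is omitted in this sketch.
     */
    private static String plainDecimal(String lex) {
        BigDecimal value = new BigDecimal(lex.trim()).stripTrailingZeros();
        return value.toPlainString();
    }

    public static void main(String[] args) {
        String[] decimals = { "1.000", "+01.50", "100", "0.0" };
        for (String lex : decimals) {
            System.out.println(lex + " -> XSD 1.0: " + DECIMAL_XSD10.normalize(lex)
                    + ", XSD 1.1: " + DECIMAL_XSD11.normalize(lex));
        }
        System.out.println("+007 -> via value: " + INTEGER_VIA_VALUE.normalize("+007"));
    }
}
```

In this sketch the XSD 1.0 style output for decimals (for example `1.0`) also reads as a decimal in Turtle short form, whereas the XSD 1.1 style output `1` would read as an integer if written without a datatype. That difference is one reason the choice probably needs to be explicit rather than fixed.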
Version
5.0.0
Feature
Are you interested in contributing a solution yourself?
Yes