apache / jena

Apache Jena
https://jena.apache.org/
Apache License 2.0

Provide a framework for normalizing RDF terms #2557

Status: Closed (closed by afs 2 days ago)

afs commented 5 days ago

Version

5.0.0

Feature

Jena already has support for a fixed canonicalization of literals in CanonicalizeLiteral. There is also other code that decides lexical form within the SPARQL expression evaluator.

These would be better called "normalizations", because "canonicalization" implies there is one fixed choice.

This task is to:

  1. Generalize the mechanism to allow different normalization choices, e.g. XSD 1.0 vs XSD 1.1, the Turtle short-form syntax for numbers, or a defined system choice.
  2. Provide one form where the normalization is preserved when the data is written to a TDB2 database. TDB2 holds binary values for some datatypes and rebuilds the lexical form on retrieval, which effectively normalizes those literals. It would be beneficial to provide this as a normalization choice for data, and also to include it in the test suite to pin the behaviour down.
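To make the two items concrete, here is a minimal sketch of what "normalization as a choice" could look like. It is not Jena code: `NormalizationSketch`, `Literal`, `Normalizer`, `XSD10`, `XSD11` and `VALUE_ROUND_TRIP` are hypothetical names invented for illustration, and it only uses the JDK. It assumes well-formed lexical forms. The decimal case shows a real difference between the specs: XSD 1.0 requires a decimal point in the canonical form of xsd:decimal, while XSD 1.1 writes integer-valued decimals without one. The round-trip choice only models the idea of a store that keeps a value and regenerates the lexical form; it is not TDB2's actual encoding.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.function.UnaryOperator;

// Hypothetical sketch: none of these names are Jena API.
class NormalizationSketch {
    static final String XSD = "http://www.w3.org/2001/XMLSchema#";
    static final String XSD_DECIMAL = XSD + "decimal";
    static final String XSD_INTEGER = XSD + "integer";

    // Minimal stand-in for an RDF literal: lexical form plus datatype IRI.
    record Literal(String lexical, String datatype) {}

    // A normalization choice is just a function from literal to literal.
    interface Normalizer extends UnaryOperator<Literal> {}

    // Plain decimal string with no '+' sign, no leading zeros, no trailing zeros.
    // Throws NumberFormatException on ill-formed input (not handled in this sketch).
    static String plainDecimal(String lexical) {
        BigDecimal d = new BigDecimal(lexical.trim());
        if (d.signum() == 0) return "0";
        return d.stripTrailingZeros().toPlainString();
    }

    // XSD 1.0 style: the decimal point is always present ("1" becomes "1.0").
    static final Normalizer XSD10 = lit -> {
        if (!lit.datatype().equals(XSD_DECIMAL)) return lit;
        String s = plainDecimal(lit.lexical());
        return new Literal(s.contains(".") ? s : s + ".0", XSD_DECIMAL);
    };

    // XSD 1.1 style: integer-valued decimals have no decimal point ("1.0" becomes "1").
    static final Normalizer XSD11 = lit ->
        lit.datatype().equals(XSD_DECIMAL)
            ? new Literal(plainDecimal(lit.lexical()), XSD_DECIMAL)
            : lit;

    // Model of a value-based store: keep only the integer value, then
    // rebuild the lexical form from it. Other datatypes pass through unchanged.
    static final Normalizer VALUE_ROUND_TRIP = lit -> {
        if (!lit.datatype().equals(XSD_INTEGER)) return lit;
        BigInteger value = new BigInteger(lit.lexical().trim()); // "stored" value
        return new Literal(value.toString(), XSD_INTEGER);       // rebuilt on "retrieval"
    };

    public static void main(String[] args) {
        Literal dec = new Literal("+01.0", XSD_DECIMAL);
        Literal integer = new Literal("+001", XSD_INTEGER);
        System.out.println(XSD10.apply(dec).lexical());                // 1.0
        System.out.println(XSD11.apply(dec).lexical());                // 1
        System.out.println(VALUE_ROUND_TRIP.apply(integer).lexical()); // 1
    }
}
```

Because each choice is a plain function, a test suite can pin down the round-trip behaviour by asserting that storing and retrieving a literal gives the same result as applying the matching normalizer directly.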

Are you interested in contributing a solution yourself?

Yes