Open: kenwenzel opened this issue 5 months ago
Hi @kenwenzel, curious if this is still on your radar and/or what's left to do here? We're hoping to move to 5.x in a few weeks. Thanks!
@nguyenm100 I have it on my radar. While the ID scheme is finished, the type-specific conversion logic (integers, doubles, etc.) is still missing. This needs some time and careful testing. We also need to find a good way to support something like "0312"^^xsd:int vs. "312"^^xsd:int. Both literals have the same integer value but are not considered equal because their labels differ.
If we embed such a value, then we have to make sure that decoding it always yields the correct label. This means that "0312"^^xsd:int can't be embedded into an ID, while "312"^^xsd:int could be.
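One way to enforce this rule is a round-trip check: parse the label, print the value back in canonical form, and embed only if both strings match. Below is a minimal Java sketch under that assumption; canEmbedXsdInt is a hypothetical helper, not part of RDF4J, and it covers xsd:int only (doubles and other types would need their own canonical-form rules).

```java
public class CanonicalIntCheck {

    /**
     * Hypothetical helper: true only if the label of an xsd:int literal is
     * already the canonical form of its value, so decoding the embedded
     * value would reproduce the original label exactly.
     */
    static boolean canEmbedXsdInt(String label) {
        final int value;
        try {
            // Accepts leading zeros and an optional '+' or '-' sign;
            // throws for non-numeric labels and values outside the 32-bit range.
            value = Integer.parseInt(label);
        } catch (NumberFormatException e) {
            return false; // invalid lexical form: keep it as a stored value
        }
        // Round trip: value -> canonical string must equal the original label.
        return Integer.toString(value).equals(label);
    }

    public static void main(String[] args) {
        System.out.println(canEmbedXsdInt("312"));  // true
        System.out.println(canEmbedXsdInt("0312")); // false (leading zero)
        System.out.println(canEmbedXsdInt("+312")); // false (explicit sign)
        System.out.println(canEmbedXsdInt("-0"));   // false (canonical form is "0")
    }
}
```

Literals that fail the check would simply fall back to being stored and referenced by a regular ID.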
We do have literal normalisation during RDF parsing (BasicParserSettings.NORMALIZE_DATATYPE_VALUES). We don't have this on the sail level for the MemoryStore or the NativeStore, but I know that other triplestores have this feature.
You could make your embedding feature contingent on normalised data, maybe configurable at the sail level but defaulting to normalisation.
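A minimal sketch of such a sail-level option, assuming a hypothetical boolean switch that defaults to normalisation and handles only xsd:int. None of these names exist in RDF4J; the class only illustrates that, once labels are normalised on the way in, every stored xsd:int label is canonical and therefore eligible for embedding.

```java
public class SailNormalizationSketch {

    static final String XSD_INT = "http://www.w3.org/2001/XMLSchema#int";

    // Hypothetical sail-level switch; defaults to normalisation.
    private final boolean normalizeDatatypeValues;

    public SailNormalizationSketch() {
        this(true);
    }

    public SailNormalizationSketch(boolean normalizeDatatypeValues) {
        this.normalizeDatatypeValues = normalizeDatatypeValues;
    }

    /**
     * Returns the label that would be stored. With normalisation on, valid
     * xsd:int labels are rewritten to canonical form, so every stored
     * xsd:int literal becomes eligible for embedding into an ID.
     */
    public String labelToStore(String label, String datatype) {
        if (!normalizeDatatypeValues || !XSD_INT.equals(datatype)) {
            return label;
        }
        try {
            return Integer.toString(Integer.parseInt(label));
        } catch (NumberFormatException e) {
            return label; // invalid lexical form: store the label unchanged
        }
    }

    public static void main(String[] args) {
        SailNormalizationSketch normalizing = new SailNormalizationSketch();
        SailNormalizationSketch verbatim = new SailNormalizationSketch(false);
        System.out.println(normalizing.labelToStore("0312", XSD_INT)); // 312
        System.out.println(verbatim.labelToStore("0312", XSD_INT));    // 0312
    }
}
```

The trade-off is the same as with parser-level normalisation: the original label ("0312") is not preserved, which is why the behaviour should stay configurable.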
Problem description
The LmdbStore uses 64-bit IDs for values. The scheme is fixed and uses the lower two bits to encode the type of the referenced value:
To support RDF-star (#3723) and embedded values (#4774), a new scheme that is also extensible for future requirements should be developed.
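The fixed two-bit scheme described in the problem description can be sketched as follows. The concrete tag values are hypothetical and may not match the LmdbStore's actual assignment, and treating the remaining 62 bits as a sequence number is an assumption made for illustration.

```java
public class TwoBitIdScheme {

    // Hypothetical tag values for illustration; the real LmdbStore
    // assignment may differ. Two bits allow at most four codes (0..3).
    static final int TYPE_IRI = 0;
    static final int TYPE_LITERAL = 1;
    static final int TYPE_BNODE = 2;

    /** Packs a non-negative sequence number and a 2-bit type tag into one ID. */
    static long encode(long sequence, int type) {
        if (type < 0 || type > 0b11) {
            throw new IllegalArgumentException("type does not fit in 2 bits: " + type);
        }
        if (sequence < 0 || sequence > (Long.MAX_VALUE >>> 2)) {
            throw new IllegalArgumentException("sequence out of range: " + sequence);
        }
        return (sequence << 2) | type;
    }

    /** Lower two bits: the type of the referenced value. */
    static int typeOf(long id) {
        return (int) (id & 0b11);
    }

    /** Remaining 62 bits: the (assumed) sequence number. */
    static long sequenceOf(long id) {
        return id >>> 2;
    }

    public static void main(String[] args) {
        long id = encode(42, TYPE_LITERAL);
        System.out.println(id);             // 169
        System.out.println(typeOf(id));     // 1
        System.out.println(sequenceOf(id)); // 42
    }
}
```

With only two tag bits there are at most four type codes, which leaves little room to add new kinds of IDs such as triple terms or inlined literals.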
Preferred solution
The following basic scheme could be used:
Inspired by Jena, the following detailed encoding can be used:
bit 0..7:
// following inlined values
See also https://github.com/apache/jena/blob/02ecb71c7033dc09cd929474c9884045dfaa9dc1/jena-tdb2/src/main/java/org/apache/jena/tdb2/store/NodeIdType.java#L87
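A minimal sketch of a Jena-style layout, assuming the type tag occupies bits 0..7 and the remaining 56 bits carry either a pointer into the value store or an inlined signed integer. The tag values and method names are hypothetical; Jena TDB2's NodeIdType defines its own bit layout, which this sketch does not reproduce.

```java
public class InlineIdSketch {

    // Hypothetical tags in bits 0..7; a real scheme would define its own set.
    static final int TAG_STORED = 0x00;      // payload: pointer into the value store
    static final int TAG_XSD_INTEGER = 0x01; // payload: inlined signed integer

    static final int TAG_BITS = 8;
    static final long PAYLOAD_MIN = -(1L << 55);    // smallest signed 56-bit value
    static final long PAYLOAD_MAX = (1L << 55) - 1; // largest signed 56-bit value

    static boolean fitsInline(long value) {
        return value >= PAYLOAD_MIN && value <= PAYLOAD_MAX;
    }

    /** Packs a signed 56-bit payload into bits 8..63 and the tag into bits 0..7. */
    static long encode(int tag, long payload) {
        if (tag < 0 || tag > 0xFF) {
            throw new IllegalArgumentException("tag does not fit in 8 bits: " + tag);
        }
        if (!fitsInline(payload)) {
            throw new IllegalArgumentException("payload needs more than 56 bits: " + payload);
        }
        return (payload << TAG_BITS) | tag;
    }

    static int tagOf(long id) {
        return (int) (id & 0xFF);
    }

    /** Arithmetic shift (>>) restores the sign of negative payloads. */
    static long payloadOf(long id) {
        return id >> TAG_BITS;
    }

    public static void main(String[] args) {
        long id = encode(TAG_XSD_INTEGER, -312);
        System.out.println(tagOf(id));                  // 1
        System.out.println(payloadOf(id));              // -312
        System.out.println(fitsInline(Long.MAX_VALUE)); // false: store it instead
    }
}
```

Before inlining, a literal would also have to pass a canonical-label check (the "0312" vs. "312" case), and values outside the 56-bit range would fall back to a stored value referenced by TAG_STORED.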
Are you interested in contributing a solution yourself?
Yes