Open white-gecko opened 3 years ago
I presume that always setting the datatype will fix the issue I just encountered: duplicate literal entries (one implicit, one explicit) gathered from multiple files. Minimal example (RDFLib version: 5.0.0):
import sys
from rdflib import Graph
g = Graph().parse(format='ttl', data='<http://a> <http://b> ""^^<http://www.w3.org/2001/XMLSchema#string>, "" .')
g.serialize(format="ttl", destination=sys.stdout.buffer)
Output:
@prefix ns1: <http://> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ns1:a ns1:b "",
""^^xsd:string .
Expected:
@prefix ns1: <http://> .
ns1:a ns1:b "" .
For now, I just normalized all triples (dropped the explicit ^^xsd:string part with sed) in the data before parsing them with RDFLib.
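The sed-based workaround could be approximated in Python along the following lines. This is only a rough sketch, not the exact command I used: the helper name `drop_explicit_xsd_string` is made up for illustration, the `xsd:` prefix is assumed to be bound to the XML Schema namespace, and the substitution is purely textual (it does not parse Turtle).

```python
import re

# Matches an explicit xsd:string datatype suffix, written either as a full IRI
# or with the conventional "xsd:" prefix (assumed to be bound to XML Schema).
XSD_STRING_SUFFIX = re.compile(
    r'\^\^(?:<http://www\.w3\.org/2001/XMLSchema#string>|xsd:string\b)'
)


def drop_explicit_xsd_string(turtle_text: str) -> str:
    """Remove explicit ^^xsd:string annotations from Turtle source text.

    Hypothetical helper. Because it works on raw text, it would also rewrite
    a matching sequence that happens to appear inside a string literal.
    """
    return XSD_STRING_SUFFIX.sub("", turtle_text)


data = '<http://a> <http://b> ""^^<http://www.w3.org/2001/XMLSchema#string>, "" .'
normalized = drop_explicit_xsd_string(data)
print(normalized)
```

After this step both objects are written as the same plain literal, so the parser should end up with a single triple.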
I was just wondering whether there should be a configurable parameter to serialize the graph in either the implicit or the explicit form, since closing this issue will require touching those parts of the code anyway. What do you think?
https://github.com/RDFLib/rdflib/issues/2123#issuecomment-1475448693
One option to solve this is to enforce that rdflib.term.Literal always has a datatype, but then we won't be able to support RDF 1.0 anymore. I'm somewhat okay with this. I think it would be nice to be able to support both 1.0 and 1.1, but I think 1.1 support is more important.
According to RDF 1.1 (https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal), every literal has a datatype IRI, so the datatype of Literals should always be set.
If both a language and a datatype are specified, the datatype must be http://www.w3.org/1999/02/22-rdf-syntax-ns#langString.
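To make the rule concrete, here is a minimal sketch of how the datatype could be resolved under RDF 1.1. It is plain Python and not RDFLib's actual API; the function name `resolve_datatype` and its signature are assumptions made for illustration only.

```python
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


def resolve_datatype(datatype: Optional[str] = None,
                     language: Optional[str] = None) -> str:
    """Return the datatype IRI a literal would carry under RDF 1.1.

    Hypothetical helper, not part of RDFLib.
    """
    if language:
        # A language tag is only allowed together with rdf:langString.
        if datatype is not None and datatype != RDF_LANGSTRING:
            raise ValueError("a language-tagged literal must use rdf:langString")
        return RDF_LANGSTRING
    if datatype == RDF_LANGSTRING:
        # rdf:langString without a language tag is not a valid literal.
        raise ValueError("rdf:langString requires a language tag")
    # A literal with neither datatype nor language is an xsd:string.
    return datatype if datatype is not None else XSD_STRING


print(resolve_datatype())                  # implicit form
print(resolve_datatype(XSD_STRING))        # explicit form
print(resolve_datatype(language="en"))     # language-tagged literal
```

With a rule like this, the implicit and the explicit form from the example above resolve to the same datatype and would no longer be treated as two different literals.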
See also #670