Open vikramsubramanian opened 4 months ago
Summary: The Turtle file parser does not recognize the xsd prefix for RDF literals, leading to incorrect parsing of data types.
Based on the provided information, the issue is the incorrect parsing of the xsd:decimal datatype in RDF literals. The relevant code is in rdf_utils.cpp, where the addRdfLiteral function is defined. The function checks the datatype of the literal and attempts to cast it to the appropriate C++ type.
To resolve the issue:
- Ensure that the XSD prefix is correctly defined in rdf_keyword.h and matches the expected URI for XML Schema datatypes (http://www.w3.org/2001/XMLSchema#).
- In rdf_utils.cpp, within the addRdfLiteral function, verify that the type.ends_with(XSD_decimal) condition correctly identifies the xsd:decimal datatype. If necessary, adjust the string comparison to match the full URI of the xsd:decimal datatype.
- Test with xsd:decimal and xsd:double literals to ensure that they are correctly recognized and cast to the appropriate C++ types.
Here is a concise solution:
- Ensure that rdf_keyword.h contains the correct URI for XSD:
    static constexpr std::string_view XSD = "http://www.w3.org/2001/XMLSchema#";
- In rdf_utils.cpp, ensure that the addRdfLiteral function correctly identifies xsd:decimal:
    if (type.ends_with(XSD_decimal)) {
        // Handle xsd:decimal
        // ...
    }
- Test with xsd:decimal and xsd:double literals.
src/processor/operator/persistent/reader/rdf/rdf_utils.cpp
This file contains logic for parsing RDF literals and recognizing data types, which is directly related to the issue.
This file is part of the Serd library used for parsing RDF data, and it contains functions for reading literals and IRIs which may need to be modified to recognize xsd prefixes correctly.
src/include/common/keyword/rdf_keyword.h
This header defines common RDF keywords and may be relevant for ensuring the xsd prefix is recognized correctly.
src/processor/operator/persistent/reader/rdf/rdf_reader.cpp
This file handles RDF reading and may be relevant for how prefixes are handled during the parsing process.
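The suggestion above to compare against the full URI rather than a suffix can be sketched as follows. This is a minimal illustration, not the project's actual code: LiteralKind, xsdLocalName, and classifyDatatype are hypothetical names, and only the XSD constant is taken from the text. It assumes the datatype has already been expanded to a full IRI by the parser.

```cpp
#include <optional>
#include <string_view>

// Namespace IRI for XML Schema datatypes, as quoted above.
static constexpr std::string_view XSD = "http://www.w3.org/2001/XMLSchema#";

// Hypothetical classification of the datatypes discussed in this issue.
enum class LiteralKind { Decimal, Double, Other };

// Returns the local name (e.g. "decimal") if `iri` lies in the XSD namespace,
// otherwise std::nullopt.
inline std::optional<std::string_view> xsdLocalName(std::string_view iri) {
    if (iri.size() <= XSD.size() || iri.substr(0, XSD.size()) != XSD) {
        return std::nullopt;
    }
    return iri.substr(XSD.size());
}

// Classifies a fully expanded datatype IRI. A suffix-only check would also
// accept IRIs from unrelated namespaces that happen to end in "decimal".
inline LiteralKind classifyDatatype(std::string_view iri) {
    const auto local = xsdLocalName(iri);
    if (!local) {
        return LiteralKind::Other;
    }
    if (*local == "decimal") {
        return LiteralKind::Decimal;
    }
    if (*local == "double") {
        return LiteralKind::Double;
    }
    return LiteralKind::Other;
}
```

Under this sketch, an unexpanded tag such as xsd:decimal is classified as Other, which is one way the behavior reported in this issue could arise if prefixes are not resolved before the datatype check.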
I was doing some debugging to double check that the data types of RDF literals are parsed correctly. I have the following file:
Copying this into an RDFGraph will result in the following set of triples:
So the behavior is this:
xsd: < .
prefix at the top.
First, we should recognize prefix namespaces in literal datatype tags as well. Second, I think we should recognize xsd in datatype tags even if the xsd prefix declaration is completely missing. I tested this in GraphDB, and it does recognize xsd even if it's missing (it also recognizes rdf, rdfs, and owl).
If you prefer, do not recognize xsd without the prefix for now, and instead open an issue for a configuration option that supports common namespaces, including this one. If you choose this, make a note to remember the xsd prefix and to add tests for it when parsing literal datatypes.
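The two options discussed above (resolve declared prefixes in datatype tags, and optionally fall back to well-known namespaces) could look roughly like the sketch below. Everything here is hypothetical and independent of how the actual parser resolves prefixes: PrefixMap, defaultPrefixes, expandDatatype, and the useDefaults flag are illustrative names, with the flag standing in for the proposed configuration option.

```cpp
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>

// Maps a prefix label (without the colon) to its namespace IRI.
using PrefixMap = std::unordered_map<std::string, std::string>;

// Well-known namespaces used only as a fallback when a prefix is undeclared.
inline const PrefixMap& defaultPrefixes() {
    static const PrefixMap defaults = {
        {"xsd", "http://www.w3.org/2001/XMLSchema#"},
        {"rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#"},
        {"rdfs", "http://www.w3.org/2000/01/rdf-schema#"},
        {"owl", "http://www.w3.org/2002/07/owl#"},
    };
    return defaults;
}

// Expands a prefixed datatype name such as "xsd:decimal" to a full IRI.
// Declared prefixes take precedence; the defaults are consulted only when
// `useDefaults` is true. Returns std::nullopt if the prefix is unknown.
inline std::optional<std::string> expandDatatype(
    std::string_view name, const PrefixMap& declared, bool useDefaults) {
    const auto colon = name.find(':');
    if (colon == std::string_view::npos) {
        return std::nullopt;
    }
    const std::string prefix(name.substr(0, colon));
    const std::string local(name.substr(colon + 1));
    if (auto it = declared.find(prefix); it != declared.end()) {
        return it->second + local;
    }
    if (useDefaults) {
        const auto& defaults = defaultPrefixes();
        if (auto it = defaults.find(prefix); it != defaults.end()) {
            return it->second + local;
        }
    }
    return std::nullopt;
}
```

Keeping the fallback behind a flag lets the stricter behavior remain the default until the configuration question is settled, while tests can cover both the declared-prefix and missing-prefix cases.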