Open vikramsubramanian opened 4 months ago
Summary: Malformed IRI in Turtle files is not being handled as per the Turtle specification.
Based on the provided information, the issue is related to the handling of malformed IRIs and undefined prefixes in Turtle files. The relevant code snippets are from third_party/serd/src/n3.c, which contains the Turtle file parser logic. To address the issue:
- Update the read_verb function to check if the prefix is defined before attempting to read a PrefixedName. If the prefix is not defined, return an error and skip the triple.
- In the read_PrefixedName function (not provided, but inferred from context), ensure that an error is returned if the prefix part of a CURIE is not defined in the current scope.
- Update the read_IRIREF function to handle malformed IRIs correctly by returning an error when an invalid IRI is encountered.
- Ensure that the read_prefixID function correctly handles the definition of prefixes and returns an error if a prefix is malformed or redefined incorrectly.
- Update the read_triples function to skip triples with errors, such as undefined prefixes or malformed IRIs, while maintaining the ability to continue parsing subsequent triples.
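The prefix check described in the steps above could look roughly like the following. This is a minimal standalone sketch, not serd's actual code: the `Prefix` struct, the `ExpandStatus` enum, and `expand_curie` are hypothetical names introduced for illustration, and serd keeps prefixes in its own environment type rather than a flat array.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical prefix table entry (illustration only, not a serd type). */
typedef struct {
    const char* name; /* prefix without the trailing ':' */
    const char* iri;  /* namespace IRI the prefix expands to */
} Prefix;

typedef enum {
    EXPAND_OK,
    EXPAND_UNDEFINED_PREFIX,
    EXPAND_BAD_INPUT
} ExpandStatus;

/* Expand "prefix:local" into out, or report that the prefix is undefined. */
static ExpandStatus
expand_curie(const Prefix* prefixes, size_t n_prefixes,
             const char* curie, char* out, size_t out_size)
{
    const char* colon = strchr(curie, ':');
    if (!colon) {
        return EXPAND_BAD_INPUT; /* not a prefixed name at all */
    }

    const size_t prefix_len = (size_t)(colon - curie);
    for (size_t i = 0; i < n_prefixes; ++i) {
        if (strlen(prefixes[i].name) == prefix_len &&
            !strncmp(prefixes[i].name, curie, prefix_len)) {
            /* Known prefix: concatenate namespace IRI and local name. */
            const int n =
                snprintf(out, out_size, "%s%s", prefixes[i].iri, colon + 1);
            return (n >= 0 && (size_t)n < out_size) ? EXPAND_OK
                                                    : EXPAND_BAD_INPUT;
        }
    }

    /* Unknown prefix: the caller should report an error and skip the triple
       instead of treating the whole token as a full IRI. */
    return EXPAND_UNDEFINED_PREFIX;
}
```

With a table that defines only `ex`, `expand_curie` expands `ex:enemyOf` but returns `EXPAND_UNDEFINED_PREFIX` for `foo:enemyOf`, which is the case this issue asks the parser to reject.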
The code handles the parsing of IRIs and prefixed names, which is directly related to the issue of interpreting undefined prefixes as full IRIs.
This snippet includes the read_verb function, which is responsible for parsing verbs (predicates) in Turtle syntax and may be where the incorrect interpretation of undefined prefixes occurs.
The read_IRIREF function is involved in parsing full IRIs and may contain logic that needs to be adjusted to properly handle malformed IRIs according to the Turtle specification.
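For reference, the Turtle grammar defines IRIREF as `'<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'`. A character-level check of the text between the angle brackets could be sketched as below. The function name `iriref_body_is_valid` is hypothetical and this is not how read_IRIREF is written in serd; it only tests the grammar production, not full IRI syntax.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Return true if s (the text between '<' and '>') matches the body of the
   Turtle IRIREF production. Hypothetical helper for illustration. */
static bool
iriref_body_is_valid(const char* s)
{
    while (*s) {
        const unsigned char c = (unsigned char)*s;
        if (c == '\\') {
            /* UCHAR: \uXXXX or \UXXXXXXXX, nothing else may follow '\' */
            const int n_hex = (s[1] == 'u') ? 4 : (s[1] == 'U') ? 8 : 0;
            if (!n_hex) {
                return false;
            }
            for (int i = 0; i < n_hex; ++i) {
                /* A premature '\0' is not a hex digit, so this never reads
                   past the end of the string. */
                if (!isxdigit((unsigned char)s[2 + i])) {
                    return false;
                }
            }
            s += 2 + n_hex;
        } else if (c <= 0x20 || strchr("<>\"{}|^`", c)) {
            return false; /* control, space, or an excluded character */
        } else {
            ++s;
        }
    }
    return true;
}
```

Note that a body such as `foo:enemyOf` passes this check, since it is grammatically a valid IRIREF. A grammar check on IRIs therefore cannot catch an undefined prefix; that has to be detected where the prefixed name is read.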
According to the Turtle specification, you cannot do the following (you can use the W3C Validata tool to see this error):
That's because the prefix "foo" is not defined. However, we interpret this as if foo:enemyOf were written as <foo:enemyOf>, where the angle brackets indicate that the enclosed string is a full IRI and should be interpreted as such. That is, we interpret it as if the file were as follows:
Then if you query the rdfgraph for the triples you get the following:
I think we should comply with the standard and error on this line (and skip the triple if that's what we do).