eyereasoner / eye

Euler Yet another proof Engine
https://eyereasoner.github.io/eye/
MIT License

Incorrect QName Replacement for IRIs #110

Closed nebucaz closed 3 months ago

nebucaz commented 3 months ago

When using EYE to reason over our triples, we found that EYE generates prefixed names from IRIs containing characters that QName syntax does not allow. In Turtle and N3, relative and absolute IRIs are written between < and > and may include numeric escape sequences. According to RFC 3987, the "path" part of an IRI may contain characters such as "+", which are defined in "sub-delims":

absolute-IRI  = scheme ":" ihier-part [ "?" iquery ]
ihier-part    = "//" iauthority ipath-abempty / ipath-absolute / ipath-rootless / ipath-empty
ipath-abempty = *( "/" isegment )
isegment      = *ipchar
ipchar        = iunreserved / pct-encoded / sub-delims / ":" / "@"
iunreserved   = ALPHA / DIGIT / "-" / "." / "_" / "~" / ucschar
pct-encoded   = "%" HEXDIG HEXDIG
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

(Excerpt from RFC 3987)
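As a quick illustration of the excerpt, here is a minimal Python sketch that checks a single path segment against the isegment rule. It is deliberately restricted to ASCII (the ucschar part of iunreserved is left out), and the names IPCHAR_ASCII and is_ascii_isegment are made up for this example:

```python
import re

# ipchar restricted to ASCII: unreserved, pct-encoded, sub-delims, ":" and "@".
# Assumption: ucschar (the non-ASCII part of iunreserved) is omitted for brevity.
IPCHAR_ASCII = r"(?:[A-Za-z0-9\-._~]|%[0-9A-Fa-f]{2}|[!$&'()*+,;=]|[:@])"

# isegment = *ipchar, so the empty segment is also accepted.
ISEGMENT_ASCII = re.compile(IPCHAR_ASCII + "*")


def is_ascii_isegment(segment: str) -> bool:
    """Return True if the segment consists only of ASCII ipchar characters."""
    return ISEGMENT_ASCII.fullmatch(segment) is not None


print(is_ascii_isegment("plus+is+not+allowed+in+qnames"))  # True: "+" is a sub-delim
print(is_ascii_isegment("a b"))  # False: a raw space is not an ipchar
```

So the path segment used in the example below is a perfectly valid IRI segment.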

EYE replaces the absolute IRI in the given example with a prefixed name as per the W3C RDF 1.1 Turtle recommendation (https://www.w3.org/TR/turtle/#sec-iri). The QName syntax is also mentioned in the Namespaces section of the W3C Team Submission for Notation 3 (https://www.w3.org/TeamSubmission/n3/).

However, the N3 document lacks a formal definition of QName. The definition can be found in RDF 1.1 Turtle (https://www.w3.org/TR/turtle/#sec-iri):

Prefixed names are a superset of XML QNames. They differ in that the local part of prefixed names may include: leading digits, e.g. leg:3032571 or isbn13:9780136019701; non leading colons, e.g. og:video:height; reserved character escape sequences, e.g. wgs:lat\-long
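The "reserved character escape sequences" in that Turtle excerpt mean that a character such as "+" may still appear in a Turtle 1.1 local name if it is preceded by a backslash. The sketch below shows what such escaping could look like. It is only an illustration: the helper name escape_local_name is hypothetical, it does not validate the remaining characters, and whether a given N3 tool accepts these escapes is an assumption that would need checking.

```python
# Characters that Turtle 1.1 lets a local name carry as a backslash escape
# (the PN_LOCAL_ESC production). "_" is left out here because it never
# needs escaping; "-" and "." are escaped unconditionally for simplicity.
ESCAPABLE = set("~.-!$&'()*+,;=/?#@%")


def escape_local_name(local: str) -> str:
    """Backslash-escape every escapable character; leave all others untouched."""
    return "".join("\\" + ch if ch in ESCAPABLE else ch for ch in local)


print(escape_local_name("plus+is+not+allowed+in+qnames"))
print(escape_local_name("lat-long"))
```

Writing the full IRI between < and > remains the simplest way to avoid the problem, which is also what the expected result further down uses.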

This refers to the use of QNames in XML, originally introduced by XML Namespaces. They were defined for element and attribute names to concisely identify a {URI, local-name} pair (https://www.w3.org/2001/tag/doc/qnameids), and are specified in Namespaces in XML 1.0 (Third Edition) (https://www.w3.org/TR/REC-xml-names/):

QName         ::= PrefixedName | UnprefixedName
PrefixedName  ::= Prefix ':' LocalPart
UnprefixedName ::= LocalPart
Prefix        ::= NCName
LocalPart     ::= NCName
NCName        ::= Name - (Char* ':' Char*)
Name          ::= NameStartChar (NameChar)*
NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
NameChar      ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
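To make the grammar concrete, here is a small Python sketch of the NCName rule, restricted to ASCII. The non-ASCII ranges of NameStartChar and NameChar are left out as a simplification, and the names NCNAME_ASCII and is_ascii_ncname are invented for this example:

```python
import re

# ASCII-only approximation of NCName: a Name that contains no colon.
# First character: letter or "_"; following characters: letters, digits, "_", "-", ".".
# Assumption: the non-ASCII ranges of the XML productions are omitted.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_ascii_ncname(local_part: str) -> bool:
    """Return True if local_part matches the simplified NCName pattern."""
    return NCNAME_ASCII.fullmatch(local_part) is not None


print(is_ascii_ncname("s"))  # True
print(is_ascii_ncname("plus+is+not+allowed+in+qnames"))  # False: "+" is not a NameChar
```

Under this rule "+" can never be part of a LocalPart, whichever position it appears in.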

Steps to reproduce

File: test.n3:

@prefix : <https://example.org/>.

:s :p :o .
<https://example.org/plus+is+not+allowed+in+qnames> :p :o  .

Expected Result

@prefix : <https://example.org/>. 

:s :p :o .
<https://example.org/plus+is+not+allowed+in+qnames> :p :o .

Actual result

$ eye --nope --quiet --pass test.n3

yields

@prefix : <https://example.org/>. 

:s :p :o .
:plus+is+not+allowed+in+qnames :p :o .

This demonstrates the issue where the IRI https://example.org/plus+is+not+allowed+in+qnames is incorrectly replaced with a QName, despite containing characters not allowed in QName syntax.
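For illustration, the behaviour we would expect from a serializer can be sketched in a few lines of Python. This is not EYE's code and makes no claim about how EYE works internally; the function abbreviate and the pattern PN_LOCAL_ASCII are hypothetical, and the pattern is an ASCII-only approximation of Turtle's PN_LOCAL without escape sequences:

```python
import re
from typing import Dict

# Simplified local-name check: starts with a letter, digit, "_" or ":",
# may contain "-" and "." in the middle, and must not end with ".".
# Assumption: non-ASCII characters and escape sequences are not handled.
PN_LOCAL_ASCII = re.compile(r"[A-Za-z0-9_:](?:[A-Za-z0-9_:.\-]*[A-Za-z0-9_:\-])?")


def abbreviate(iri: str, prefixes: Dict[str, str]) -> str:
    """Return prefix:local when the local part is safe, otherwise <iri>."""
    for prefix, namespace in prefixes.items():
        if iri.startswith(namespace):
            local = iri[len(namespace):]
            if PN_LOCAL_ASCII.fullmatch(local):
                return f"{prefix}:{local}"
    # Fall back to the full IRI whenever no prefix yields a valid local part.
    return f"<{iri}>"


prefixes = {"": "https://example.org/"}
print(abbreviate("https://example.org/s", prefixes))
# :s
print(abbreviate("https://example.org/plus+is+not+allowed+in+qnames", prefixes))
# <https://example.org/plus+is+not+allowed+in+qnames>
```

The point of the sketch is the fallback: an IRI is abbreviated only when the part after the namespace is a valid local name.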

Python Test Script

Additionally, a Python script using RDFLib to parse the result confirms the issue:

$ eye --nope --quiet --pass test.n3 > test_result.n3

from rdflib import Graph

graph = Graph()
graph.parse("test_result.n3", format="turtle")
print(graph.serialize(format="turtle"))

yields:

rdflib.plugins.parsers.notation3.BadSyntax: at line 4 of <>:
Bad syntax (expected '.' or '}' or ']' at end of statement) at ^ in:
"b'@prefix : https://example.org/.\r\n\r\n:s :p :o.\r\n:plus'^b'+is+not+alowed+in+qnames :p :o.\r\n'"

This error further indicates that the generated QName is invalid due to the presence of characters not allowed in QName syntax.

Conclusion

Request for Clarification

If we are using EYE incorrectly, we would greatly appreciate any guidance on how to properly handle IRIs containing characters not allowed in QName syntax. Please let us know if there are specific configurations or approaches we should use to avoid this issue.

Please investigate and address this issue.

josd commented 3 months ago

Thank you very much for the clear explanation and for the example.

EYE is now fixed so that it does not use sub-delims in a QName. The output is now:

@prefix : <https://example.org/>.

:s :p :o.
<https://example.org/plus+is+not+allowed+in+qnames> :p :o.

nebucaz commented 3 months ago

Thank you!