RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License

fails to serialize graphs containing some IRIs to xml #2828

Closed trapperkeeper closed 2 months ago

trapperkeeper commented 2 months ago

The XML serializer appears to have a problem with IRIs of the form:

scheme://domain/path/id

when the 'id' portion is numeric (or perhaps it fails whenever the 'id' starts with a digit?).

For example, https://endlessforms.info/gdo/0000240

When serializing a graph with IRIs like the above, I get these errors:

ValueError: Can't split 'https://endlessforms.info/gdo/0000240'
ValueError: This graph cannot be serialized to a strict format because there is no valid way to shorten https://endlessforms.info/gdo/0000240

If I change my output format to Turtle, it works. But other parts of my pipeline expect XML, and I'd rather not keep switching between formats.

trapperkeeper commented 2 months ago

Just to be sure, I ran the offending IRIs through a URI validator, and it reported them as valid.

trapperkeeper commented 2 months ago

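I think I see the issue. While those are valid IRIs, RDF/XML writes each property as an XML element name, so a property IRI has to be shortened to a prefixed form like gdo:0000240, and that particular form is not legal XML. The "local part" of a prefixed XML QName must start with a letter or an underscore, never a digit. So it is the XML specification, not the IRI itself, that causes these to fail. Turtle is not affected because it can write the full IRI in angle brackets.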
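
For anyone who wants to check their own IRIs before serializing, here is a minimal sketch of the rule described above. It is not rdflib's actual code: `split_iri` is a hypothetical helper, the character classes are an ASCII-only approximation of the XML NCName rules, and `GDO_0000240` is a made-up identifier used only to show a form that would pass.

```python
import re

# ASCII-only approximation of the XML NCName character classes (assumption:
# the real rules also admit many non-ASCII letters, which are ignored here).
NAME_START = re.compile(r"[A-Za-z_]")
NAME_CHAR = re.compile(r"[A-Za-z0-9._-]")


def split_iri(iri):
    """Return (namespace, local_name) where local_name could serve as the
    local part of an XML QName, or None if no such split exists."""
    # Walk back from the end over characters that may appear in a local name.
    i = len(iri)
    while i > 0 and NAME_CHAR.fullmatch(iri[i - 1]):
        i -= 1
    # Inside that trailing run, the local name must begin at a letter or "_".
    for j in range(i, len(iri)):
        if NAME_START.fullmatch(iri[j]):
            return iri[:j], iri[j:]
    return None


# The IRI from this issue: the trailing run is all digits, so no split works.
print(split_iri("https://endlessforms.info/gdo/0000240"))

# A hypothetical identifier that starts with a letter splits cleanly.
print(split_iri("https://endlessforms.info/gdo/GDO_0000240"))
```

If the helper returns None for any property IRI in a graph, that graph probably cannot be written as RDF/XML without renaming the property or choosing another output format.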