RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License

Incorrect serialization of Infinite in XML #1420

Open JeremiasThun opened 2 years ago

JeremiasThun commented 2 years ago

I want to serialize an infinite value as an XSD.float. I am using Python 2.7. Here are my unsuccessful attempts, which I believe should have worked:

from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import XSD

NS = Namespace("http://example.com/")

g = Graph()

attempt1 = Literal(float('inf'), datatype=XSD.float) # using a Python representation of an infinite float
g.add((NS.a, NS.b, attempt1))

attempt2 = Literal("Infinite", datatype=XSD.float) # trying a string
g.add((NS.c, NS.d, attempt2))

attempt3 = Literal("INF", datatype=XSD.float) # directly inserting the desired output
g.add((NS.e, NS.f, attempt3))

print g.serialize()

It returns:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
   xmlns:ns1="http://example.com/"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>
  <rdf:Description rdf:about="http://example.com/a">
    <ns1:b rdf:datatype="http://www.w3.org/2001/XMLSchema#float">inf</ns1:b>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.com/e">
    <ns1:f rdf:datatype="http://www.w3.org/2001/XMLSchema#float">inf</ns1:f>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.com/c">
    <ns1:d rdf:datatype="http://www.w3.org/2001/XMLSchema#float">Infinite</ns1:d>
  </rdf:Description>
</rdf:RDF>

rdflib seems to write infinite values as inf (and passes the string "Infinite" through unchanged), whereas XSD specifies that positive infinity should be written INF: http://books.xmlschemata.org/relaxng/ch19-77095.html
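
For reference, the lexical forms XSD expects for the special float values can be sketched in plain Python. This is only an illustration of the expected spelling, not RDFLib's implementation; `xsd_float_lexical` is a hypothetical helper name.

```python
import math


def xsd_float_lexical(value):
    """Return an XSD-style lexical form for a Python float (hypothetical helper)."""
    if math.isnan(value):
        return "NaN"
    if math.isinf(value):
        # XSD spells infinity in upper case, with a leading minus for the negative case.
        return "INF" if value > 0 else "-INF"
    # Finite values: Python's repr is already a valid xsd:float lexical form.
    return repr(float(value))
```

Under this sketch, `float('inf')` maps to `INF` and `float('-inf')` to `-INF`, which is the spelling the RDF/XML output above is missing.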

nicholascar commented 2 years ago

Infinity seems to be correctly handled in RDFlib 6.0.0+ with the following:

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import XSD

NS = Namespace("http://example.com/")
g = Graph()

attempt1 = Literal(float('inf'), datatype=XSD.float)  # using a Python representation of an infinite float
g.add((NS.a, NS.b, attempt1))

attempt2 = Literal("Infinite", datatype=XSD.float)  # trying a string
g.add((NS.c, NS.d, attempt2))

attempt3 = Literal("INF", datatype=XSD.float)  # directly inserting the desired output
g.add((NS.e, NS.f, attempt3))

print(g.serialize())

This gives:

@prefix ns1: <http://example.com/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ns1:a ns1:b "INF"^^xsd:float .

ns1:c ns1:d "Infinite"^^xsd:float .

ns1:e ns1:f "INF"^^xsd:float .

/.../venv/lib/python3.9/site-packages/rdflib/term.py:1318: UserWarning: Serializing weird numerical rdflib.term.Literal('Infinite', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#float'))
  warnings.warn("Serializing weird numerical %r" % self)

So that seems correct: it handles float("inf") and outputs "INF"^^xsd:float. It does the same for "INF", but it emits a UserWarning for "Infinite" and leaves that string unchanged.

Is this the functionality you are expecting?

I presume that since you're using Python 2.7 you are using rdflib <= 5.0.0?

JeremiasThun commented 2 years ago

Thanks! Yes, that was the functionality I was expecting and yes, I am using rdflib 5.0.0. Is there any workaround for Python 2.7 other than writing a script to edit the serialization output directly?

nicholascar commented 2 years ago

I can confirm that each serializer behaves the same way in RDFlib 5.0.0 (Python 2.7) and 6.0.1 (Python 3.6+): the RDF/XML output and the Turtle output do not change between the two versions.

So the issue is within the XML serializer.

It does look like a bug in that the Turtle serializer doesn't seem to have the issue and never has had it.

I've not seen the issue raised before, so I presume no one else has either noticed it or been willing to fix it. Someone might, of course, see this post and tackle the issue!

So your options are to wait for someone to volunteer a fix, to delve into the RDF/XML serializer and fix the issue yourself (thus benefiting all versions of RDFlib), or, as you say, to post-process the XML in some non-RDFlib way.
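
The post-processing option could look roughly like the following. It is a minimal sketch using only the standard library, assuming the RDF/XML layout shown earlier in this thread (typed literals as element text with an `rdf:datatype` attribute); `fix_special_floats` is a hypothetical helper, not part of RDFlib.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD_NS = "http://www.w3.org/2001/XMLSchema#"
DATATYPE_ATTR = "{%s}datatype" % RDF_NS
FLOAT_TYPES = {XSD_NS + "float", XSD_NS + "double"}
# Lower-case spellings produced by Python's float formatting, mapped to XSD forms.
FIXES = {"inf": "INF", "+inf": "INF", "-inf": "-INF", "nan": "NaN"}


def fix_special_floats(rdf_xml):
    """Rewrite 'inf', '-inf' and 'nan' in xsd:float / xsd:double literals.

    Hypothetical helper: accepts RDF/XML as str or bytes, returns UTF-8 bytes.
    """
    root = ET.fromstring(rdf_xml)
    for elem in root.iter():
        # Only touch elements explicitly typed as xsd:float or xsd:double.
        if elem.get(DATATYPE_ATTR) in FLOAT_TYPES and elem.text:
            elem.text = FIXES.get(elem.text.strip(), elem.text)
    return ET.tostring(root, encoding="utf-8")
```

It would be applied to the serializer output, for example `fix_special_floats(g.serialize(format="xml"))`. Literals of other datatypes are left alone, and ElementTree may rename namespace prefixes (for example `ns1` to `ns0`), which does not change the RDF.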

JeremiasThun commented 2 years ago

You're right, the Turtle serializer handles infinite values correctly, even in my version. I think I can just switch to Turtle and everything is fine :) Thanks!

nicholascar commented 2 years ago

Yeah, use that Turtle! I'm sad that the RDF/XML is incorrect but Turtle is my RDF serialization of choice.

nicholascar commented 2 years ago

We should leave this ticket open, as it still tracks the error in the XML serializer.