RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License

RDF Literal "1"^^xsd:boolean should _not_ coerce to True #847

Closed ashleysommer closed 5 years ago

ashleysommer commented 5 years ago

I encountered this issue while working on pySHACL. Specifically, this bug causes a failure in one of the tests in the standard data-shapes-test-suite, uniqueLang-002-shapes.ttl. This test relies on "1"^^xsd:boolean being an invalid Literal: when this Literal is tested for equality against RDF True, the result should be not equal.

A simple code recreation:

import rdflib
from rdflib.namespace import XSD
fail_bool = rdflib.Literal("1", datatype=XSD.boolean)
true_bool = rdflib.Literal("true", datatype=XSD.boolean)
print("value: {} , datatype: {} ".format(
    str(fail_bool._value), str(fail_bool.datatype)))
try:
    assert not (fail_bool == true_bool),\
        "\"1\" should not equal \"true\""
except AssertionError as a:
    print("assertion fail: \n{}".format(str(a)))

This is a more complete example: https://gist.github.com/ashleysommer/87f0b9660a71de380889f98745af2f74

I've tracked the problem down to this line in the XSDToPython map: https://github.com/RDFLib/rdflib/blob/5fa18be1231a5e4dfc86ec28f2f754158c6f6f0b/rdflib/term.py#L1484

lambda i: i.lower() in ['1', 'true']

should be changed to

lambda i: i.lower() == 'true'
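To make the difference between the two lambdas concrete, here is a small sketch that runs both on a few lexical forms. The names current_convert, proposed_convert, and compare are made up for illustration; only the two lambda bodies are taken from the text above, and nothing here touches rdflib itself.

```python
# The two conversion functions, copied from the issue text.
# The variable names are hypothetical and exist only for this comparison.
current_convert = lambda i: i.lower() in ['1', 'true']
proposed_convert = lambda i: i.lower() == 'true'


def compare(lexical_forms):
    """Return {lexical form: (current result, proposed result)}."""
    return {form: (current_convert(form), proposed_convert(form))
            for form in lexical_forms}


results = compare(["true", "1", "false", "0", "True"])
for form, (before, after) in results.items():
    print("{!r}: current={}, proposed={}".format(form, before, after))
```

Of the forms tried here, only "1" changes: it converts to True under the current lambda and to False under the proposed one.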

gromgull commented 5 years ago

Thanks @ashleysommer! Could you make a PR? I'll merge it.

ashleysommer commented 5 years ago

Ok. I will today.

white-gecko commented 4 years ago

I think this is wrong. As pointed out in #913, the XML Schema Definition Language (XSD) defines the lexical mapping of boolean as:

booleanRep ::= 'true' | 'false' | '1' | '0'

This is also mentioned in https://github.com/w3c/data-shapes/issues/98. As I understand it, that issue concerns a specific case in SHACL validation, not the general question of how to deal with xsd:boolean. RDFLib should stick to the standard here.

PR #856 (commit f54759915339f7eaa0688177d92dc9b137f77b2b) should be reverted, and a fix according to #913 should be introduced.
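For illustration, a conversion that follows the booleanRep grammar quoted above could look roughly like the sketch below. This is not the fix from #913 and not rdflib code; parse_xsd_boolean and its lookup table are hypothetical names. It assumes the grammar is matched case-sensitively, so "True" is rejected, and it ignores whitespace handling.

```python
# Hypothetical helper, not part of rdflib: one entry per alternative in
# booleanRep ::= 'true' | 'false' | '1' | '0'
_BOOLEAN_LEXICAL_TO_VALUE = {
    "true": True,
    "1": True,
    "false": False,
    "0": False,
}


def parse_xsd_boolean(lexical):
    """Map an xsd:boolean lexical form to a Python bool.

    Raises ValueError for any string outside the four forms above.
    """
    try:
        return _BOOLEAN_LEXICAL_TO_VALUE[lexical]
    except KeyError:
        raise ValueError(
            "not in the xsd:boolean lexical space: {!r}".format(lexical)
        ) from None
```

Raising on unknown input is a design choice for this sketch. It keeps forms like "yes" from being silently converted to False.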

ashleysommer commented 4 years ago

@white-gecko Ok, I'm going to have to look further into this.

I initially wrote this issue (and PR #856) from the perspective of the python SHACL implementation.

There are spec-driven test files used to implement unit tests for all SHACL implementations. One of these tests relies on the backing RDF implementation treating the literal "1"^^xsd:boolean as an "Invalid Literal", so that comparing it for equality with the valid literal "true"^^xsd:boolean returns False (not equal).

rdflib doesn't have any concept of an "Invalid Literal" (which might be something to think about as a feature down the track). So the only way to get that SHACL unit test to pass with rdflib's rudimentary Literal handling was to patch the parsing of "1" with datatype=xsd:boolean to give a Python value of False.

Note, I think there's some confusion here about the distinction between a typed literal and a typeless literal.

For example, for typeless literals all of these are valid:

Whereas typed literals are like the following:

@white-gecko That XSD spec you linked is specifically for the XML world; I'm not sure all of the rules in there apply to the RDF world. I think the use of xsd in RDF (even when serialized in rdf+xml format) is a subset of the XML-xsd spec.

This document is specific to RDF: https://www.w3.org/TR/swbp-xsch-datatypes/#boolean. It also seems to agree with you: according to it, "1"^^xsd:boolean is not an invalid literal.

white-gecko commented 4 years ago

Thank you for the pointer to W3C Working Group Note XML Schema Datatypes in RDF and OWL. It states explicitly:

Boolean is a datatype with value space {true,false}, lexical space {"true", "false","1","0"} and lexical-to-value mapping {"true"→true, "false"→false, "1"→true, "0"→false}. "true"^^xsd:boolean is a typed literal, while "true" is a plain literal.
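A toy model of the quoted definition may help separate the lexical form from the value. SimpleLiteral and its fields are invented for this sketch and do not mirror rdflib's Literal class; the mapping table is copied from the quote above, and a datatype of None stands in for a plain literal.

```python
from dataclasses import dataclass
from typing import Optional

XSD_BOOLEAN = "http://www.w3.org/2001/XMLSchema#boolean"

# Lexical-to-value mapping as given in the quoted Working Group Note.
LEXICAL_TO_VALUE = {"true": True, "false": False, "1": True, "0": False}


@dataclass(frozen=True)
class SimpleLiteral:
    """Toy literal: a lexical form plus an optional datatype IRI."""
    lexical: str
    datatype: Optional[str] = None  # None models a plain literal

    def value(self):
        # Typed boolean literals are mapped into the value space;
        # anything else is returned as its lexical form in this sketch.
        if self.datatype == XSD_BOOLEAN:
            return LEXICAL_TO_VALUE[self.lexical]
        return self.lexical


one = SimpleLiteral("1", XSD_BOOLEAN)
true_typed = SimpleLiteral("true", XSD_BOOLEAN)
true_plain = SimpleLiteral("true")

print(one.value() == true_typed.value())  # same value
print(one == true_typed)                  # different lexical forms
print(true_plain.value())                 # plain literal stays a string
```

In this toy model the two typed literals share a value while remaining distinct as (lexical form, datatype) pairs, and the plain literal "true" is just a string.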

Regarding parsing and serialization: I think we should keep the RDFLib data model abstract from the serialization formats. We should not mix the interpretation of Turtle abbreviations with the interpretation of lexical values.

More comments in #913.