RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License

Issues with negative dates #2829

Open Superraptor opened 1 month ago

Superraptor commented 1 month ago

I'm currently trying to parse an RDF file in TTL exported from a Wikibase using its dumpRdf.php feature.

The Wikibase includes some ISO-8601 dates that are BCE, such as "-0028-08-10T00:00:00Z"^^xsd:dateTime. When processing these, RDFLib spits out the following error:

Failed to convert Literal lexical form to value. Datatype=http://www.w3.org/2001/XMLSchema#dateTime, Converter=<function parse_datetime at 0x0000020671E09C60>
Traceback (most recent call last):
  File "C:\Users\Username\anaconda3\envs\py311\Lib\site-packages\rdflib\term.py", line 2084, in _castLexicalToPython
    return conv_func(lexical)  # type: ignore[arg-type]
           ^^^^^^^^^^^^^^^^^^
  File "C:\Users\Username\anaconda3\envs\py311\Lib\site-packages\isodate\isodatetime.py", line 55, in parse_datetime
    tmpdate = parse_date(datestring)
              ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Username\anaconda3\envs\py311\Lib\site-packages\isodate\isodates.py", line 203, in parse_date
    raise ISO8601Error('Unrecognised ISO 8601 date format: %r' % datestring)
isodate.isoerror.ISO8601Error: Unrecognised ISO 8601 date format: '-0028-08-10'

Is there any recommended way to deal with this? Thanks so much!

nicholascar commented 1 month ago

Oh dear, that's a good find and I've got no solution for you, sorry. Someone will have to look into the Python date parser. So it's either a limitation in Python's general date parsing (unlikely) or an issue with the way RDFLib is using the Python date parsing (more likely). But I don't know that part of the library, sorry.

Superraptor commented 1 month ago

@nicholascar thanks for the response! If anything is a nightmare in any programming language/package, it's date/time. Looks like we might need someone familiar with the deep magics for this one!

nicholascar commented 1 month ago

I just tried to reproduce the error but can't:

from rdflib import Graph, Literal, URIRef
from rdflib.namespace import XSD, PROV

d_neg = Literal("-0028-08-10T00:00:00Z", datatype=XSD.dateTimeStamp)

g = Graph()
g.add((
    URIRef("http://example.com"),
    PROV.startedAtTime,
    d_neg
))
print(d_neg.toPython())
print(d_neg.n3())
print(g.serialize(format="longturtle"))
print(g.serialize(format="json-ld"))

This correctly prints out:

-0028-08-10T00:00:00Z
"-0028-08-10T00:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTimeStamp>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

<http://example.com>
    prov:startedAtTime "-0028-08-10T00:00:00Z"^^xsd:dateTimeStamp ;
.

[
  {
    "@id": "http://example.com",
    "http://www.w3.org/ns/prov#startedAtTime": [
      {
        "@type": "http://www.w3.org/2001/XMLSchema#dateTimeStamp",
        "@value": "-0028-08-10T00:00:00Z"
      }
    ]
  }
]

Can you supply code that triggers the error so I can take a look at it in more depth?

Superraptor commented 1 month ago

Sorry for taking so long!

Here's an example portion of the TTL that was causing the issue (sorry, Wikibase Turtle is... a bit nested):

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix wikibase: <http://wikiba.se/ontology#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix schema: <http://schema.org/> .
@prefix cc: <http://creativecommons.org/ns#> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix wd: <http://b73b8432f4ff/entity/> .
@prefix data: <http://b73b8432f4ff/wiki/Special:EntityData/> .
@prefix s: <http://b73b8432f4ff/entity/statement/> .
@prefix ref: <http://b73b8432f4ff/reference/> .
@prefix v: <http://b73b8432f4ff/value/> .
@prefix wdt: <http://b73b8432f4ff/prop/direct/> .
@prefix wdtn: <http://b73b8432f4ff/prop/direct-normalized/> .
@prefix p: <http://b73b8432f4ff/prop/> .
@prefix ps: <http://b73b8432f4ff/prop/statement/> .
@prefix psv: <http://b73b8432f4ff/prop/statement/value/> .
@prefix psn: <http://b73b8432f4ff/prop/statement/value-normalized/> .
@prefix pq: <http://b73b8432f4ff/prop/qualifier/> .
@prefix pqv: <http://b73b8432f4ff/prop/qualifier/value/> .
@prefix pqn: <http://b73b8432f4ff/prop/qualifier/value-normalized/> .
@prefix pr: <http://b73b8432f4ff/prop/reference/> .
@prefix prv: <http://b73b8432f4ff/prop/reference/value/> .
@prefix prn: <http://b73b8432f4ff/prop/reference/value-normalized/> .
@prefix wdno: <http://b73b8432f4ff/prop/novalue/> .

wikibase:Dump a schema:Dataset,
        owl:Ontology ;
    cc:license <http://creativecommons.org/publicdomain/zero/1.0/> ;
    schema:softwareVersion "1.0.0" ;
    schema:dateModified "2024-08-07T14:23:13Z"^^xsd:dateTime ;
    owl:imports <http://wikiba.se/ontology-1.0.owl> .

data:Q11314 a schema:Dataset ;
    schema:about wd:Q11314 ;
    schema:version "32272"^^xsd:integer ;
    schema:dateModified "2023-11-09T14:26:22Z"^^xsd:dateTime ;
    wikibase:statements "8"^^xsd:integer ;
    wikibase:sitelinks "0"^^xsd:integer ;
    wikibase:identifiers "5"^^xsd:integer .

wd:Q11314 a wikibase:Item ;
    wdt:P82 wd:Q2225 ;
    wdt:P3 "Q1398" ;
    wdt:P123 "8194433" ;
    wdt:P122 "0000000430695667" ;
    wdt:P108 "n79014062" ;
    wdt:P107 "PA6801-PA6961" ;
    wdt:P141 "-0069-10-13T00:00:00Z"^^xsd:dateTime ;
    wdt:P142 "-0018-09-19T00:00:00Z"^^xsd:dateTime ;
    p:P82 s:Q11314-de166e33-42a7-5204-c10f-6a277fdfe081 .

s:Q11314-de166e33-42a7-5204-c10f-6a277fdfe081 a wikibase:Statement,
        wikibase:BestRank ;
    wikibase:rank wikibase:NormalRank ;
    ps:P82 wd:Q2225 ;
    pq:P57 "48" .

wd:Q11314 p:P3 s:Q11314-2fc1a84a-428a-d78a-19bd-27c0d1d4edaf .

s:Q11314-2fc1a84a-428a-d78a-19bd-27c0d1d4edaf a wikibase:Statement,
        wikibase:BestRank ;
    wikibase:rank wikibase:NormalRank ;
    ps:P3 "Q1398" .

wd:Q11314 p:P123 s:Q11314-0997c22e-4549-b52f-3c9c-9a26571e07a0 .

s:Q11314-0997c22e-4549-b52f-3c9c-9a26571e07a0 a wikibase:Statement,
        wikibase:BestRank ;
    wikibase:rank wikibase:NormalRank ;
    ps:P123 "8194433" .

wd:Q11314 p:P122 s:Q11314-9831b198-4891-efee-42f5-8a77bef1bac0 .

s:Q11314-9831b198-4891-efee-42f5-8a77bef1bac0 a wikibase:Statement,
        wikibase:BestRank ;
    wikibase:rank wikibase:NormalRank ;
    ps:P122 "0000000430695667" .

wd:Q11314 p:P108 s:Q11314-d401b682-4b7a-4bce-a9bb-fe2c6c425539 .

s:Q11314-d401b682-4b7a-4bce-a9bb-fe2c6c425539 a wikibase:Statement,
        wikibase:BestRank ;
    wikibase:rank wikibase:NormalRank ;
    ps:P108 "n79014062" .

wd:Q11314 p:P107 s:Q11314-fa6964d3-4acc-4fe9-718c-025a64bb0aed .

s:Q11314-fa6964d3-4acc-4fe9-718c-025a64bb0aed a wikibase:Statement,
        wikibase:BestRank ;
    wikibase:rank wikibase:NormalRank ;
    ps:P107 "PA6801-PA6961" .

wd:Q11314 p:P141 s:Q11314-1929b796-4cbb-57fb-c952-4fdb537110a2 .

s:Q11314-1929b796-4cbb-57fb-c952-4fdb537110a2 a wikibase:Statement,
        wikibase:BestRank ;
    wikibase:rank wikibase:NormalRank ;
    ps:P141 "-0069-10-13T00:00:00Z"^^xsd:dateTime ;
    psv:P141 v:9956a3176c50e5f372b2805522b9f235 ;
    prov:wasDerivedFrom ref:07354354b93c0850a770a6e5ac4c2595f1292a8b .

wd:Q11314 p:P142 s:Q11314-92587d20-49ef-1683-95f8-3f8e331166f9 .

s:Q11314-92587d20-49ef-1683-95f8-3f8e331166f9 a wikibase:Statement,
        wikibase:BestRank ;
    wikibase:rank wikibase:NormalRank ;
    ps:P142 "-0018-09-19T00:00:00Z"^^xsd:dateTime ;
    psv:P142 v:834791bd6aa770755041b4306c4fa39a ;
    prov:wasDerivedFrom ref:07354354b93c0850a770a6e5ac4c2595f1292a8b .

wd:Q11314 rdfs:label "Virgil"@en ;
    skos:prefLabel "Virgil"@en ;
    schema:name "Virgil"@en ;
    skos:altLabel "Virgil (Ancient Roman poet of the Augustan period)"@en,
        "Virgil (Ancient Roman poet, 70-19 BCE)"@en,
        "Virgil, 70 B.C.-19 B.C."@en .

v:9956a3176c50e5f372b2805522b9f235 a wikibase:TimeValue ;
    wikibase:timeValue "-0069-10-13T00:00:00Z"^^xsd:dateTime ;
    wikibase:timePrecision "11"^^xsd:integer ;
    wikibase:timeTimezone "0"^^xsd:integer ;
    wikibase:timeCalendarModel <http://www.wikidata.org/entity/Q1985786> .

v:834791bd6aa770755041b4306c4fa39a a wikibase:TimeValue ;
    wikibase:timeValue "-0018-09-19T00:00:00Z"^^xsd:dateTime ;
    wikibase:timePrecision "11"^^xsd:integer ;
    wikibase:timeTimezone "0"^^xsd:integer ;
    wikibase:timeCalendarModel <http://www.wikidata.org/entity/Q1985786> .

The time value causing the issue is wikibase:timeValue "-0018-09-19T00:00:00Z"^^xsd:dateTime ;.

The code itself is a bit long (it's been a while since I've tested this case so I need to dig through it), but it essentially deconstructed the file into triples and loaded them into the graph. I'll do more checking on this this week and get back to you as soon as I can!

ageorgou commented 1 week ago

Sorry to jump in or state something obvious but I happened to be looking into this a bit. In case it helps:

Negative years (BCE dates) are not supported in either the Python standard library's datetime or in isodate, which rdflib uses. This issue has been reported before (#2210, #2321 at least) but my guess is that the patchy support for BCE dates across Python libraries must make it hard to address.
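
The standard-library half of this is easy to check: datetime.MINYEAR is 1, so a year of -28 cannot be represented. As a minimal sketch of one possible workaround (not something rdflib provides), the lexical form can be split into plain integers instead of being converted to a datetime. The names XsdDateTimeParts and parse_xsd_datetime_parts are hypothetical, and the regex only covers the Z-suffixed, whole-second form seen in this thread.

```python
import datetime
import re
from typing import NamedTuple

# datetime cannot represent years below MINYEAR, so BCE years are rejected.
print(datetime.MINYEAR)
try:
    datetime.datetime(-28, 8, 10)
except ValueError as exc:
    print("datetime rejects year -28:", exc)


class XsdDateTimeParts(NamedTuple):
    """Hypothetical container for a dateTime split into integer fields."""
    year: int
    month: int
    day: int
    hour: int
    minute: int
    second: int


# Optional minus sign, at least four year digits, then fixed-width fields.
# Assumption: only the "...Z" form without fractional seconds is handled.
_LEXICAL = re.compile(r"(-?\d{4,})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})Z")


def parse_xsd_datetime_parts(lexical: str) -> XsdDateTimeParts:
    """Split a form like '-0028-08-10T00:00:00Z' without using datetime."""
    match = _LEXICAL.fullmatch(lexical)
    if match is None:
        raise ValueError(f"unsupported dateTime lexical form: {lexical!r}")
    return XsdDateTimeParts(*(int(group) for group in match.groups()))


print(parse_xsd_datetime_parts("-0028-08-10T00:00:00Z"))
```

The sketch deliberately leaves the year as an uninterpreted integer: which BCE year -0028 denotes depends on whether the data follows XSD 1.0 (no year 0000) or XSD 1.1 (0000 is 1 BCE).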