RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License

`transitive_subjects` and `transitive_objects` return the starting node as first element #2599

Open RinkeHoekstra opened 1 year ago

RinkeHoekstra commented 1 year ago

Both `transitive_subjects` and `transitive_objects` return the starting node as the first element. This is unintuitive, and it does not match the behavior described in the docstring.

A bit more detail from the example by @jjon in #1303:

>>> pprint(list(cg.transitive_subjects(RDF.type, pome.Person)))
[rdflib.term.URIRef('http://prosopOnto.medieval.england/2006/04/pome#Person'),
 rdflib.term.URIRef('http://example.com/thisgraph#Hugh_Despenser'),
 rdflib.term.URIRef('http://example.com/thisgraph#Audley_Henry_de'),
 rdflib.term.URIRef('http://example.com/thisgraph#Thomas_earl_of_Warwick_d_1242'),
.
.
. etc.
]

The transitive_subjects method yields pome:Person even though that's not a subject of a triple with rdf:type as predicate and pome:Person as object.
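To make the reported behavior concrete, here is a minimal pure-Python sketch. It is a simplified stand-in and not rdflib's code: `transitive_subjects_model`, the `TRIPLES` list, and the prefixed string names are all hypothetical, and the sketch assumes the traversal yields the node it is called on before recursing, which is consistent with the output above.

```python
from typing import Iterator, Optional, Sequence, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as plain strings


def transitive_subjects_model(
    triples: Sequence[Triple],
    predicate: str,
    obj: str,
    seen: Optional[Set[str]] = None,
) -> Iterator[str]:
    """Hypothetical model of the reported traversal, not rdflib's implementation."""
    if seen is None:
        seen = set()
    if obj in seen:
        return  # already visited, so cycles terminate
    seen.add(obj)
    # Assumption: the node the traversal is called on is yielded first.
    # This is why the starting node shows up in the result.
    yield obj
    for s, p, o in triples:
        if p == predicate and o == obj:
            yield from transitive_subjects_model(triples, predicate, s, seen)


# Made-up data loosely following the example above.
TRIPLES = [
    ("ex:Hugh_Despenser", "rdf:type", "pome:Person"),
    ("ex:Audley_Henry_de", "rdf:type", "pome:Person"),
]

result = list(transitive_subjects_model(TRIPLES, "rdf:type", "pome:Person"))
print(result)
# → ['pome:Person', 'ex:Hugh_Despenser', 'ex:Audley_Henry_de']
```

In this model `pome:Person` comes first although no triple has it as a subject with `rdf:type` as predicate, which is the same surprise as in the real output.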

In issue #1303 @white-gecko suggests that you can "just skip the first element when working with the list", but this means that every caller of these methods has to remember to skip the first element.
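Until the library behavior changes, a caller-side workaround might look like the sketch below. The helper name `without_start` is hypothetical and not part of rdflib. It filters by value instead of dropping the first element by position, on the assumption that this should keep working whether or not a future release stops yielding the starting node.

```python
from typing import Hashable, Iterable, Iterator, TypeVar

T = TypeVar("T", bound=Hashable)


def without_start(nodes: Iterable[T], start: T) -> Iterator[T]:
    """Yield every node except the starting node (hypothetical helper)."""
    return (node for node in nodes if node != start)


# Stand-in data; with rdflib the first argument would be something like
# cg.transitive_subjects(RDF.type, pome.Person) and the second pome.Person.
current_behavior = ["pome:Person", "ex:Hugh_Despenser", "ex:Audley_Henry_de"]
after_a_fix = ["ex:Hugh_Despenser", "ex:Audley_Henry_de"]

print(list(without_start(current_behavior, "pome:Person")))
print(list(without_start(after_a_fix, "pome:Person")))
```

Both calls give the same two subjects. The trade-off is that a starting node reachable from itself through a cycle would also be filtered out, which may or may not be what a given caller wants.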

Suggested fixes:

https://github.com/RDFLib/rdflib/blob/e09ce43f2844d0b0f96ec5b976015901f9268873/rdflib/graph.py#L1141-L1181

ashleysommer commented 4 months ago

Hi @RinkeHoekstra, I've looked over the issue in #1303 and I disagree with @white-gecko's suggestion to simply skip the first element. See my new comment on that issue: https://github.com/RDFLib/rdflib/issues/1303#issuecomment-2246724830

I will have a fix for this as part of the upcoming RDFLib 7.1 release, though if it is seen as a breaking change we will delay it to the v8.0 release later this year.