RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License
2.17k stars 558 forks source link

Fix Liskov substitution principle violations in sub-classes of Graph #2022

Open aucampia opened 2 years ago

aucampia commented 2 years ago

We have many liskov substitution principle violations in the Graph class hierarchy, ideally this should be fixed in the next major version.

ghost commented 2 years ago

I'd be grateful if you could reference an example, just to ground my model.

aucampia commented 2 years ago

One example:

If someone calls graph.add(triple=...) then it will work if graph is Graph, but not if it is a ConjunctiveGraph as the argument name for add is different in ConjunctiveGraph, and there someone will have to call graph.add(triple_or_quad=...). There are more cases like this, mypy does warn about them if the methods have typing.

Two other examples:

https://github.com/RDFLib/rdflib/blob/65ccae5f3302d868a90fc7177d078b76ce50db40/rdflib/graph.py#L2197-L2212

https://github.com/RDFLib/rdflib/blob/65ccae5f3302d868a90fc7177d078b76ce50db40/rdflib/graph.py#L2214-L2219

aucampia commented 2 years ago

Okay, thanks for the specifics. Am I correct in my understanding that you want Dataset to not inherit from Graph?

That would be one option and possibly the best, possibly we can have some shared base class for Graph and Dataset though, and there are various workarounds we can take, but in general all inheritance should adhere to the Liskov substitution principle.

This is not the most serious issue, but just something we should keep in mind.

ghost commented 2 years ago

in general all inheritance should adhere to the Liskov substitution principle

If it's appropriate for the context, which is probably the case here and if it doesn't negatively impact the API either in terms of performance or usability¹

... possibly we can have some shared base class for Graph and Dataset though

That would be Store, I guess :smile:

¹ Object Oriented Programming is an expensive disaster which must end

aucampia commented 2 years ago

in general all inheritance should adhere to the Liskov substitution principle

If it's appropriate for the context, which is probably the case here and if it doesn't negatively impact the API either in terms of performance or usability¹

It is fairly ingrained in python though, isinstance checks "if the object argument is an instance of the classinfo argument, or of a (direct, indirect, or virtual) subclass thereof". If classes do not adhere to LSP then isinstance is not useful and type checking has to be done with identity, which is also against PEP-8's recommendation which states that "Object type comparisons should always use isinstance() instead of comparing types directly" [ref].

ghost commented 2 years ago

A-ha, you started me thinking (sometimes it happens :smile:) and this profoundly impacts the Dataset re-work in that some of problems I've been wrestling with in the implementation actually originate from the LSP violations you have identified.

As I've observed in the Discussion, Dataset as a sub-class of Graph makes no sense in the domain model, let alone being the source of so many LSP violations and elsewhere, LSP violations are clearly signalled - such as the new Exceptions raised for ReadOnlyGraphAggregate, a subclass of Graph.

ATM, I'm looking at a MixinClass solution, that seems to reflect the design semantics better and, as an approach, seems to be quite amenable to static typing