RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License

Why does "|=" behave differently on graphs than on sets? #1188

Open urbanmatthias opened 4 years ago

urbanmatthias commented 4 years ago

Hi,

I have a question concerning the |= operator. It seems to me that it behaves differently with rdflib graphs than it does with sets. While |= performs an in-place union when used with sets, rdflib creates a new Graph when used with Graphs. Is this on purpose?

See this minimal example:

Python 3.8.5 (default, Aug  5 2020, 08:36:46) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a = set()
>>> b = a
>>> a |= {"elem"}
>>> a is b
True
>>> a, b
({'elem'}, {'elem'})
>>> import rdflib
RDFLib Version: 5.0.0
>>> a = rdflib.Graph()
>>> b = a
>>> c = rdflib.Graph()
>>> c.add((rdflib.URIRef("s"), rdflib.URIRef("p"), rdflib.URIRef("o"))
... )
>>> a |= c
>>> a is b
False
>>> list(a), list(b)
([(rdflib.term.URIRef('s'), rdflib.term.URIRef('p'), rdflib.term.URIRef('o'))], [])

ashleysommer commented 4 years ago

Hi @urbanmatthias. Yep, you're right. The union operator does work differently on a Graph than it does on a set, and it does look like it's that way on purpose. I don't know which is more "correct" here. I think the idea behind creating a new graph on this operation is to avoid polluting an existing graph. Or the graph a might be read-only, so the most consistent and reliable way of completing the union would be to create a new graph and union into that.

Note: I found in my testing that a += c does do what you'd expect a |= c to do. But I think that is wrong too, because += should add a single triple or a list of triples, whereas |= should union the graphs as it does for a set.

@nicholascar @white-gecko
Do you guys have any opinion on this? My thoughts for changes in RDFLib v6.0.0 are:

FlorianLudwig commented 4 years ago

I agree with @ashleysommer's suggestions to change this in v6.

Some more context from the python stdlib:

The <operator>= operators, like += or |=, are called "in-place" operators in Python, and for mutable objects (like sets) they change the left-hand object. I don't think the Python convention is that "in place" means the left-hand object MUST be mutated (so the current implementation is not wrong), only that it CAN or SHOULD be (for performance reasons).
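
To make that convention concrete, here is a minimal sketch with two hypothetical classes (`Rebinding` and `InPlace` are illustrative names, not RDFLib classes). A class that defines only `__or__` still accepts `|=`, but Python falls back to `__or__` and rebinds the name to the new object. A class that also defines `__ior__` and returns `self` mutates the left-hand object instead.

```python
class Rebinding:
    """Hypothetical container that defines only __or__ (no __ior__)."""

    def __init__(self, items=()):
        self.items = set(items)

    def __or__(self, other):
        # Always build a new object; both operands are left untouched.
        return type(self)(self.items | other.items)


class InPlace(Rebinding):
    """Same container, plus an __ior__ that mutates and returns self."""

    def __ior__(self, other):
        self.items |= other.items
        return self


a = Rebinding()
a_alias = a
a |= Rebinding({"elem"})   # falls back to __or__, then rebinds the name `a`

b = InPlace()
b_alias = b
b |= InPlace({"elem"})     # calls __ior__, mutating the shared object

print(a is a_alias, a_alias.items)   # False set()
print(b is b_alias, b_alias.items)   # True {'elem'}
```

The first case matches what the issue reports for graphs in 5.0.0 (the alias stays empty); the second matches what sets do.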

> I think the idea behind creating a new graph on this operation is to avoid polluting an existing graph.

As in-place operators do "pollute" objects of the standard mutable types, I don't think this behaviour is what people expect. If needed, a = a + c can still be used.

> Or the graph a might be read-only, so the most consistent and reliable way of completing the union would be to create a new graph and union into that.

Python does create new objects when in-place operators are used on immutable objects, like tuples:

>>> a = (1, 2)
>>> a + (3, 4)
(1, 2, 3, 4)
>>> a += (3, 4)
>>> a
(1, 2, 3, 4)
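
The transcript above shows only the resulting value. A small sketch using an alias (the variable names are illustrative) makes the difference between mutating and rebinding visible for a list and a tuple:

```python
mutable = [1, 2]
mutable_alias = mutable
mutable += [3, 4]        # list defines __iadd__: the existing list is extended

immutable = (1, 2)
immutable_alias = immutable
immutable += (3, 4)      # tuple has no __iadd__: a new tuple is bound to the name

print(mutable is mutable_alias, mutable_alias)        # True [1, 2, 3, 4]
print(immutable is immutable_alias, immutable_alias)  # False (1, 2)
```
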

white-gecko commented 4 years ago

I think changing this for v6 would be a good idea. I would expect the in-place operators to actually work in place. Actually, I do not understand what the difference between += and |= on graphs could be. I would expect both to behave in the same way, whether both operands are graphs or the left is a graph and the right is a triple. Is there a difference between += and |= for sets?

FlorianLudwig commented 4 years ago

@white-gecko sets do not support +=
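
A quick check of this, as a minimal sketch (variable names are illustrative): `+=` on a set raises `TypeError`, while `|=` and `update` both modify the set in place.

```python
s = {"a"}
alias = s

try:
    s += {"b"}               # set defines neither __add__ nor __iadd__
    plus_supported = True
except TypeError:
    plus_supported = False

s |= {"b"}                   # in-place union
s.update({"c"})              # method form of the same operation

print(plus_supported)            # False
print(s is alias, sorted(s))     # True ['a', 'b', 'c']
```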

jbmchuck commented 4 years ago

Updating |= to perform an in-place union would be nice. Going by set's semantics, I believe that operation is an update rather than a union.

I'd like it if rdflib could keep the current |= behavior, but as | and/or Graph.union. This would mirror the behavior of set and would give a migration path for code relying on |='s current behavior.
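
For reference, this is how set splits the two operations, sketched with plain sets of tuples standing in for graphs of triples (an assumption for illustration only, not RDFLib code). Code that relies on getting a new object could switch from `a |= c` to `a = a | c`:

```python
# Tuples stand in for triples and sets stand in for graphs (illustration only).
a = {("s1", "p1", "o1")}
c = {("s2", "p2", "o2")}
alias = a

# `|` and `.union()` both return a new set and leave `a` untouched.
assert (a | c) == a.union(c)
assert a == {("s1", "p1", "o1")}

# Migration path for code that wants a new object: spell it out explicitly.
a = a | c
print(a is alias, len(alias))    # False 1

# `|=` (like `.update()`) modifies the left-hand set in place.
b = alias
b |= c
print(b is alias, len(alias))    # True 2
```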