RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License
2.16k stars 555 forks source link

Dataset.graph should not allow adding random graphs to the store #698

Open gromgull opened 7 years ago

gromgull commented 7 years ago

A combination of this:

https://github.com/RDFLib/rdflib/blob/master/rdflib/graph.py#L1356

and this:

https://github.com/RDFLib/rdflib/blob/master/rdflib/graph.py#L1628-L1630

means you can pass an existing graph into the dataset. This will then be added directly. But there is no guarantee this new graph has the same store, and the triples will not be copied over.

This is chaos, but wasn't flagged earlier since it's chaos all the way down :) #167

gromgull commented 7 years ago

I realised due to #679

mgberg commented 4 years ago

To clarify, if I were to run

import rdflib
from rdflib.namespace import FOAF
foaf = rdflib.Graph(identifier = FOAF)
foaf.parse("http://xmlns.com/foaf/spec/index.rdf", format="xml")
ds = rdflib.Dataset()
ds.add_graph(foaf)

am I correct in assuming that expected behavior is that len(foaf) should equal len(ds) (when in fact len(ds) is 0)?

ghost commented 2 years ago

To clarify, if I were to run ... am I correct in assuming that expected behavior is that len(foaf) should equal len(ds) (when in fact len(ds) is 0)?

You're holding it slightly wrong, basically you get your working Graphs from the Dataset rather than adding to it.

import pytest
from rdflib import (
    Dataset,
    FOAF,
    Graph,
)

def test_intuitive_approach_698_comment_577776652():

    foaf = Graph(identifier=FOAF)
    foaf.parse("http://xmlns.com/foaf/spec/index.rdf", format="xml")

    ds = Dataset()
    assert len(list(ds.graphs())) == 1

    ds.add_graph(foaf)
    assert len(list(ds.graphs())) == 2

    dsfoaf = ds.graph(foaf.identifier)
    assert foaf == dsfoaf  # identifier equality

    assert len(dsfoaf) == 0

    assert len(foaf) == 631

def test_the_actualité_698_comment_577776652():

    ds = Dataset()
    assert len(list(ds.graphs())) == 1  # Only the default graph exists at this point

    foaf = ds.add_graph(FOAF)  # Create empty, identified graph in ds, return it [1]
    assert type(foaf) is Graph
    assert len(list(ds.graphs())) == 2  # ds now has two graphs

    # Operations on foaf == operations on the ds graph

    foaf.parse("http://xmlns.com/foaf/spec/index.rdf", format="xml")

    assert len(foaf) == 631

    # ds still has handle on the graph, so we can delete the referent

    del foaf

    with pytest.raises(UnboundLocalError):
        assert foaf is None

    # Get the graph from ds
    foaf = ds.graph(FOAF)

    assert len(foaf) == 631

HTH.

__len__ is a bit awkward in the context of a Dataset, it could be number of triples in all graphs or it could be the number of triples in the default graph depending on the settings. With ConjunctiveGraph, the default_union setting is switchable with g.default_union = True (for example) or, in SPARQL, rdflib_sparql_module.SPARQL_DEFAULT_GRAPH_UNION but if you back it with an instance of fuseki implementing a SPARQLUpdate store, then it's one or the other, settable in the config and can't be changed on the fly.

[1] https://github.com/RDFLib/rdflib/blob/6ed2ef48ed38679bcdafe7cae250a2ef4b315e7b/rdflib/graph.py#L1881