Letractively / rdflib

Automatically exported from code.google.com/p/rdflib

slice implementation for graphs #202

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Python supports quite powerful "slice" mechanisms for collection-like things. 
Inspired by the use in scipy, I've tried to come up with a way to slice graphs. 

The branch "slice" (http://code.google.com/p/rdflib/source/browse/?name=slice)
includes an implementation and a test. 

The discussion below is also in the test file.

Slicing in Python supports:
 * slicing a range, e.g. elements 2-5, optionally with a step
 * slicing in more than one dimension, separated by commas

Normal lists only let you use ranges or single items.

scipy lets you slice multidimensional arrays like this: 
{{{
array[(2,5),10:20]
}}}  
returns columns 10 to 19 of the rows with index 2 and 5 (the stop of a range is exclusive).

In Python slice syntax you can put a tuple inside a range, but not a range inside a tuple, i.e.
{{{
a[(0,1):2]
}}}
is OK, although what it means is not defined for scipy.

{{{
a[(0:1),2]
}}} 
is NOT ok. 
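
To make the syntax rules above concrete, here is a small sketch showing what Python actually passes to `__getitem__` in each case. `Probe` is a hypothetical helper invented for this illustration; it is not part of rdflib or scipy.

```python
class Probe:
    """Hypothetical helper: returns whatever key Python passes to __getitem__."""

    def __getitem__(self, key):
        return key


p = Probe()

# A comma builds a tuple; a colon builds a slice object.
print(p[(2, 5), 10:20])   # ((2, 5), slice(10, 20, None))

# A tuple may be used as one part of a slice...
print(p[(0, 1):2])        # slice((0, 1), 2, None)

# ...but a slice cannot appear inside parentheses: that is a syntax error.
try:
    compile("p[(0:1), 2]", "<example>", "eval")
    parenthesised_slice_ok = True
except SyntaxError:
    parenthesised_slice_ok = False
print(parenthesised_slice_ok)  # False
```

So the parser accepts `a[(0,1):2]` and simply hands the object a slice whose start is a tuple; giving that a meaning is up to the class.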

In theory, a graph could be seen as a 3-dimensional array of booleans,
i.e. one dimension each for subject, predicate and object, with each bool
saying whether that triple is contained in the graph.

So we could use one slice dimension for each triple element. However, this
leaves range-slices unused, since there is no concept of order
for rdflib nodes (or rather, there is lexical order, but it's not very useful).

Better is perhaps to pervert the slice object
and use start, stop, step as subject, predicate, object.

This leaves several dimensions (i.e. several slice objects separated by commas)
free for other uses, and also tuples in place of start, stop, or step...
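
A minimal sketch of the idea, assuming a toy graph that stores plain string triples in a set. `TinyGraph` and its data are hypothetical and only illustrate reusing start, stop, step as subject, predicate, object; this is not the code from the slice branch.

```python
class TinyGraph:
    """Hypothetical stand-in for a graph: a set of (s, p, o) string tuples."""

    def __init__(self, triples):
        self._triples = set(triples)

    def triples(self, pattern):
        """Yield triples matching the pattern; None acts as a wildcard."""
        for triple in sorted(self._triples):
            if all(want is None or want == got
                   for want, got in zip(pattern, triple)):
                yield triple

    def __getitem__(self, key):
        if not isinstance(key, slice):
            raise TypeError("expected g[subject:predicate:object]")
        # start, stop, step are reused as subject, predicate, object.
        return self.triples((key.start, key.stop, key.step))


g = TinyGraph([
    ("bob", "likes", "pizza"),
    ("bob", "hates", "cheese"),
    ("bill", "likes", "cheese"),
])

print(list(g["bob":"likes"]))  # [('bob', 'likes', 'pizza')]
print(list(g[::"cheese"]))     # [('bill', 'likes', 'cheese'), ('bob', 'hates', 'cheese')]
```

An omitted slice part arrives as `None`, which maps naturally onto the wildcard convention of a triple pattern.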

Functions that would be interesting would be:
 * disjunction - matching either of the patterns given
 * conjunction - matching all of the patterns given
 * property-paths - going further in the graph

Gut feeling tells me that conjunction is least useful, 
i.e. neither of these strike me as very useful:
[(bob,bill):likes] - everything bob AND bill likes
[bob:(likes,hates)] - everything bob likes AND hates
[::(pizza,cheese)] - everything about pizza AND cheese

but the disjunction case does seem useful:
[resource:(SKOS.prefLabel,RDFS.label)] -  
    give me either of the two label properties
[:RDF.type:(RDFS.Class,OWL.Class)] - 
    give me all RDFS classes or OWL classes
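
The disjunction case could be sketched roughly as follows, assuming terms are plain strings so that a tuple can unambiguously mean "any of these". The helpers `matches` and `select` and the sample data are hypothetical, and the built-in `slice(...)` is called directly to stand in for the `g[...]` notation.

```python
def matches(want, got):
    """None is a wildcard, a tuple means 'any of these', else exact match."""
    if want is None:
        return True
    if isinstance(want, tuple):
        return got in want
    return want == got


def select(triples, key):
    """Yield full triples matching a slice whose parts may be tuples."""
    pattern = (key.start, key.stop, key.step)
    for triple in triples:
        if all(matches(want, got) for want, got in zip(pattern, triple)):
            yield triple


data = [
    ("doc", "skos:prefLabel", "Document"),
    ("doc", "rdfs:label", "A document"),
    ("doc", "rdf:type", "owl:Class"),
    ("person", "rdf:type", "rdfs:Class"),
    ("bob", "rdf:type", "person"),
]

# Equivalent of g[doc:(SKOS.prefLabel, RDFS.label)]
labels = list(select(data, slice("doc", ("skos:prefLabel", "rdfs:label"))))

# Equivalent of g[:RDF.type:(RDFS.Class, OWL.Class)]
classes = list(select(data, slice(None, "rdf:type", ("rdfs:Class", "owl:Class"))))

print(labels)   # the two label triples of "doc"
print(classes)  # the owl:Class and rdfs:Class typing triples
```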

I think having paths would be very nice, i.e.:
g[resource:RDF.type,:RDFS.label] -> get me all labels of the types of this thing

I have implemented disjunction and paths.
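
One possible reading of the path notation is sketched below: each comma-separated slice is one hop, and the objects matched by one hop become the subjects of the next. The names `KeyProbe` and `follow`, the string terms, and the exact semantics are assumptions made for this illustration, not the behaviour of the slice branch.

```python
class KeyProbe:
    """Hypothetical helper that returns the subscript key unchanged."""

    def __getitem__(self, key):
        return key


def follow(triples, hops):
    """Follow a chain of slice patterns; objects of one hop feed the next.

    Returns the full triples matched by the final hop.
    """
    subjects = None  # None means "any subject" for the first hop
    matched = []
    for hop in hops:
        if hop.start is not None:
            subjects = {hop.start}
        matched = [
            (s, p, o) for (s, p, o) in triples
            if (subjects is None or s in subjects)
            and (hop.stop is None or p == hop.stop)
            and (hop.step is None or o == hop.step)
        ]
        subjects = {o for (_, _, o) in matched}
    return matched


data = [
    ("bob", "rdf:type", "Person"),
    ("bob", "rdf:type", "Employee"),
    ("Person", "rdfs:label", "A person"),
    ("Employee", "rdfs:label", "An employee"),
    ("Robot", "rdfs:label", "A robot"),
]

# Same shape as g[resource:RDF.type, :RDFS.label]: a tuple of two slices.
key = KeyProbe()["bob":"rdf:type", :"rdfs:label"]
type_labels = follow(data, key)
print(type_labels)
# [('Person', 'rdfs:label', 'A person'), ('Employee', 'rdfs:label', 'An employee')]
```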

One problem with using slices and :: notation for the s,p,o part is that 
this does not generalize to ConjunctiveGraphs, as slices can only have 3 parts. 
However, maybe one does not want to mix and match contexts very often, so 
having a simple __getitem__ which is the same as get_graph on ConjunctiveGraphs
is probably enough:
{{{
cg[mycontext][:RDF.type,:RDFS.label] 
}}}
This is not implemented atm.
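
Since this part is not implemented, the following is only a sketch of how the two-step lookup might behave. `TinyGraph` and `TinyConjunctiveGraph` are hypothetical toy classes, `get_graph` is named after the method mentioned above rather than taken from a verified rdflib API, and a single slice is used in place of the path in the example.

```python
class TinyGraph:
    """Hypothetical per-context graph holding (s, p, o) string tuples."""

    def __init__(self):
        self.triples = []

    def add(self, triple):
        self.triples.append(triple)

    def __getitem__(self, key):
        # start, stop, step are reused as subject, predicate, object.
        pattern = (key.start, key.stop, key.step)
        return (t for t in self.triples
                if all(want is None or want == got
                       for want, got in zip(pattern, t)))


class TinyConjunctiveGraph:
    """Hypothetical container mapping a context name to its own graph."""

    def __init__(self):
        self._graphs = {}

    def get_graph(self, context):
        return self._graphs.setdefault(context, TinyGraph())

    def __getitem__(self, context):
        # Plain indexing selects the context; slicing happens on the result.
        return self.get_graph(context)


cg = TinyConjunctiveGraph()
cg.get_graph("ctx:a").add(("bob", "rdf:type", "Person"))
cg.get_graph("ctx:b").add(("bill", "rdf:type", "Person"))

in_a = list(cg["ctx:a"][:"rdf:type"])
print(in_a)  # [('bob', 'rdf:type', 'Person')]
```

Keeping the context lookup as a separate subscript sidesteps the three-part limit of slices, at the cost of not being able to query across contexts in one expression.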

In the test file, this discussion is followed by some examples that should make it much clearer.

All operations return generators over full triples.
One could try to be clever and mirror subject_predicates
and related functions, returning only tuples of the parts that were not given,
but I think this would be too confusing.

Original issue reported on code.google.com by gromgull on 15 Jan 2012 at 2:42

GoogleCodeExporter commented 8 years ago
(And thanks google code for not letting me edit the issue description.)

I obviously like what I've suggested, but I can think of a handful of reasons 
against it:
 * It doesn't yield the most readable code, and the unclear operator precedence of : and , does not help, e.g. g[me,:RDF.type,RDFS.label] vs. g[me,:(RDF.type,RDFS.label)]. One means "give me the type AND label of all things the resource 'me' is linked to"; the other means "give me the type, then the label of that type".
 * Python in general is moving AWAY from special syntax, like % for string formatting disappearing in py3. 
 * The pythonic commandment that there should be one way, and only one way, to do it: this is another way of doing much the same as triples, subject_predicates, etc.
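
The precedence point in the first item can be checked directly. The sketch below only shows how Python itself splits the two subscripts, not what the slice branch does with them; `Probe` and the string terms are hypothetical stand-ins.

```python
class Probe:
    """Hypothetical helper: returns whatever key Python passes to __getitem__."""

    def __getitem__(self, key):
        return key


g = Probe()
me, rdf_type, rdfs_label = "me", "rdf:type", "rdfs:label"

# The comma separates items first; each colon only binds within its own item.
flat = g[me, :rdf_type, rdfs_label]
grouped = g[me, :(rdf_type, rdfs_label)]

print(flat)     # ('me', slice(None, 'rdf:type', None), 'rdfs:label')
print(grouped)  # ('me', slice(None, ('rdf:type', 'rdfs:label'), None))
```

The first form yields three separate items and the second yields two, which is easy to miss when reading the code.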

Original comment by gromgull on 15 Jan 2012 at 2:47

GoogleCodeExporter commented 8 years ago
I suggest that, if we do this, it's limited to quite a simple interface, and 
people are forced to explicitly call methods for more complex tasks. I would 
avoid trying to make some sort of mini-language from slices, because it would 
be very hard to read.

At a glance, without having tried to use it, the idea of interpreting a single 
slice object as subj:pred:obj is quite attractive. I would limit it to 
accepting a single slice, and allow tuples meaning only 'disjunction' (I'd 
describe it as 'union', following the terminology for sets).

Pythonic-ness: It's a minor abuse of slice notation, but we're still retrieving 
a subset of items from a collection.

Side note: % for string formatting is still there in py3. It was officially 
'deprecated', but I don't think anyone's too keen to remove it.

One-way-to-do-it: I suggest the docs describe it as a convenience feature, for 
interactive work or quick scripts, and encourage people to use explicit method 
calls in software that is to be maintained.

Original comment by tak...@gmail.com on 15 Jan 2012 at 3:29