Open Mec-iS opened 1 year ago
This is a good question.
One caveat is that the name subgraph may be conflating two important features for our library:
For the algebraic objects, there are three possible transforms, as shown in https://derwen.ai/s/kcgh#37
Different library integrations and applications will need to mix & match different cases of these.
The term subgraph has a meaning in W3C, using labels to denote subsets of triples: https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-mt/index.html#notation Labeled property graphs have related features, ergo the "label" notation. In either case, this definition has become rather archaic: it's too explicit, often seriously constrained in practice (both SPARQL and Cypher had odd limitations), and not quite what's needed in a world where ML applications are widespread.
A more contemporary definition – and what's intended here – is that some repeatable process can be used to identify regions of interest within a relatively larger overall graph. It's important to note that a subgraph could be produced by several competing or even conflicting means other than explicit labels or other annotations: declarative (queries), empirical (Graph ML), algorithmic (connected components), probabilistic (PSL), topological, counterfactuals, etc.
In our SubGraphMatrix example we use a SPARQL query, then construct the subgraph from the query result set. The results from applying a SHACL rule set or from PSL analysis could be other forms of subgraphs. Motifs from GNNs are another closely related notion.
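To make the "query result set to subgraph" step concrete, here is a minimal sketch in plain Python. It is not kglab's implementation and does not use rdflib: the `rows` list is an assumed stand-in for the (subject, object) bindings a SPARQL query might return, and `build_adjacency` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple

# (subject, object) pair, standing in for one SPARQL result binding
Row = Tuple[str, str]


def build_adjacency(rows: List[Row]) -> Tuple[Dict[str, int], List[List[int]]]:
    """Assign each node an integer id, then build a dense adjacency matrix."""
    index: Dict[str, int] = {}
    for subj, obj in rows:
        for node in (subj, obj):
            # first-seen order gives repeatable ids for the same result set
            index.setdefault(node, len(index))

    size = len(index)
    matrix = [[0] * size for _ in range(size)]
    for subj, obj in rows:
        matrix[index[subj]][index[obj]] = 1  # directed edge subj -> obj
    return index, matrix


# hypothetical result set, made up for illustration
rows = [("ex:alice", "ex:bob"), ("ex:bob", "ex:carol"), ("ex:alice", "ex:carol")]
index, matrix = build_adjacency(rows)
```

The node-to-integer mapping is the part that matters: it is what lets graph algorithms and linear algebra share the same subgraph, whichever process produced the rows.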
So one might think of industry use cases for KGs, where there's some very large graph, but then particular data objects which are subgraphs that have been constructed from some common set of definitions. These data objects might be repeated many times, for example a Bill of Materials within customer data.
We've had some lively discourse among researchers who are actively pursuing research in this area, and applications that would fit. FWIW, I started out with a reinforcement learning demo for topological categories.
I'd definitely loop in: @maparent @jmueller5 @mbesta @jelisf @paoespinozarias @neobernad
Thinking about subgraphs has certainly evolved much since these library components were named in late 2020, with many thanks to @jmueller5 as the prime force of nature for pragmatic ideas about leveraging subgraphs!
Here's a summary of different possible subgraph construction approaches we've encountered, so far:
explicit
algorithmic
declarative
algebraic (algebird)
empirical
topological
probabilistic
misc
The core idea is we must be able to blend any of the above.
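One way to picture "blending" is to treat every construction approach as a function from a graph to a subset of its triples, so that approaches compose with set operations. The sketch below assumes that framing; the names `Selector`, `by_predicate`, `reachable_from`, `both`, and `either` are hypothetical and not part of kglab.

```python
from typing import Callable, FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]
Selector = Callable[[FrozenSet[Triple]], Set[Triple]]


def by_predicate(pred: str) -> Selector:
    """Declarative-style selector: keep triples with the given predicate."""
    return lambda graph: {t for t in graph if t[1] == pred}


def reachable_from(start: str) -> Selector:
    """Algorithmic-style selector: keep triples reachable from a start node."""
    def select(graph: FrozenSet[Triple]) -> Set[Triple]:
        seen, frontier, keep = {start}, [start], set()
        while frontier:
            node = frontier.pop()
            for t in graph:
                if t[0] == node:
                    keep.add(t)
                    if t[2] not in seen:
                        seen.add(t[2])
                        frontier.append(t[2])
        return keep
    return select


def both(a: Selector, b: Selector) -> Selector:
    """Blend two selectors by intersection."""
    return lambda graph: a(graph) & b(graph)


def either(a: Selector, b: Selector) -> Selector:
    """Blend two selectors by union."""
    return lambda graph: a(graph) | b(graph)


graph = frozenset({
    ("a", "knows", "b"),
    ("b", "knows", "c"),
    ("c", "likes", "d"),
    ("x", "knows", "y"),
})
blended = both(by_predicate("knows"), reachable_from("a"))(graph)
```

Here `blended` holds only the "knows" triples reachable from `a`. A probabilistic or empirical selector would plug in the same way, as long as it returns a subset of the input triples.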
On the one hand I want to be careful not to introduce misnomers (e.g., my conflation of "transform" vs. "subset" operations).
On the other hand, we should not optimize this area to be too specific to a given instance (e.g., SPARQL => matrix).
And (speaking as a person with formal math background who loves functional programming) we should not let the Linear Algebra camp dictate definitions ;) Decades of that got us into the current mess! I would much rather follow the brilliant lessons from projects such as algebird
Thanks for summing up the scope so well, this will keep the discussion on the right footing.
In our SubGraphMatrix example we use a SPARQL query, then construct the subgraph from the query result set. The results from applying a SHACL rule set or from PSL analysis could be other forms of subgraphs. Motifs from GNNs are another closely related notion.
Ok so it would be better to have pluggable classes that inherit somehow from SubgraphMatrix to add some polymorphism in the argument taken, like for example:
kglab.subg.SPARQLmatrix (or maybe better kglab.subg.from_SPARQL)
kglab.subg.SHACLmatrix
kglab.subg.MLmatrix
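A rough sketch of what such pluggable classes could look like, in plain Python. All names here (`SubgraphBase`, `PredicateSubgraph`, `LabelSubgraph`, `count_edges`) are hypothetical stand-ins, not existing kglab classes, and a simple predicate filter plays the role that a real SPARQL query would.

```python
from abc import ABC, abstractmethod
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


class SubgraphBase(ABC):
    """Hypothetical common base: each subclass selects triples its own way."""

    def __init__(self, triples: List[Triple]) -> None:
        self.source = triples

    @abstractmethod
    def select(self) -> List[Triple]:
        """Return the triples that make up this subgraph."""


class PredicateSubgraph(SubgraphBase):
    """Stand-in for a query-driven subclass such as the proposed SPARQLmatrix."""

    def __init__(self, triples: List[Triple], predicate: str) -> None:
        super().__init__(triples)
        self.predicate = predicate

    def select(self) -> List[Triple]:
        return [t for t in self.source if t[1] == self.predicate]


class LabelSubgraph(SubgraphBase):
    """Stand-in for an explicit subclass driven by a set of labeled nodes."""

    def __init__(self, triples: List[Triple], nodes: Set[str]) -> None:
        super().__init__(triples)
        self.nodes = nodes

    def select(self) -> List[Triple]:
        return [t for t in self.source if t[0] in self.nodes and t[2] in self.nodes]


def count_edges(subgraph: SubgraphBase) -> int:
    """Downstream code depends only on the base class."""
    return len(subgraph.select())


triples = [("a", "knows", "b"), ("b", "likes", "c"), ("a", "knows", "c")]
by_query = PredicateSubgraph(triples, "knows")
by_label = LabelSubgraph(triples, {"a", "b"})
```

Downstream code such as `count_edges` never needs to know which construction approach produced the subgraph, which is the polymorphism being asked for.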
And (speaking as a person with formal math background who loves functional programming) we should not let the Linear Algebra camp dictate definitions ;) Decades of that got us into the current mess! I would much rather follow the brilliant lessons from projects such as algebird
I support this approach ~but then we should drop the Matrix for a generic Subgraph that would be an alias for KnowledgeGraph so to have a "recursive" representation.~
My main question was about having a clear entrypoint to all these functionalities via an instantiation of a generic class (like DataFrame in pandas; for example GFrame, SubGraphFrame, or anything that keeps the semantic relevance), so as to have:
# kglab.<Frame>
class <Frame>:
    # attributes are underscore-prefixed so they do not shadow the accessor methods
    _graph: KnowledgeGraph = ...
    _subgraph: KnowledgeGraph = ...  # or a more relevant alias like `SubGraph`
    _subgraphmtx: SubgraphMatrix = ...
    def __init__(...):
        # creates the knowledge graph
        self._graph = KnowledgeGraph(...)
        ...
    def graph(self) -> KnowledgeGraph:
        return self._graph
    def subgraph(self) -> KnowledgeGraph:
        return self._subgraph
    ...
    def _get_subg_linear(self, query, ...) -> SubgraphMatrix:
        matrix = None
        if is_sparql(query):
            matrix = SPARQLmatrix(...)  # or `subg.from_SPARQL`
            self._subgraphmtx = matrix  # keep the latest matrix on the frame
        elif is_shacl(query):
            ...
        return matrix
This would keep the current workflow using KnowledgeGraph while also providing a more consistent experience for users who want to work with subgraphs without caring too much about the reference graph: a more "operational" entrypoint based on the access patterns of other well-established Python libraries.
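For anyone who wants to try the idea out, here is a small runnable sketch of such an entrypoint in plain Python. It is only an illustration under assumptions: `GraphFrame` is one of the candidate names floated in this thread, not an existing kglab class, lists of triples stand in for KnowledgeGraph, and a registry of builder functions replaces the `is_sparql` / `is_shacl` checks.

```python
from typing import Callable, Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Builder = Callable[[List[Triple], str], List[Triple]]


class GraphFrame:  # hypothetical name, see the naming discussion in this thread
    """Single entrypoint holding the reference graph and the latest subgraph."""

    def __init__(self, triples: List[Triple]) -> None:
        self._graph = list(triples)
        self._subgraph: Optional[List[Triple]] = None
        self._builders: Dict[str, Builder] = {}

    def register(self, kind: str, builder: Builder) -> None:
        """Plug in a construction approach under a short name."""
        self._builders[kind] = builder

    @property
    def graph(self) -> List[Triple]:
        return self._graph

    @property
    def subgraph(self) -> Optional[List[Triple]]:
        return self._subgraph

    def select(self, kind: str, query: str) -> List[Triple]:
        """Dispatch to the registered builder and remember its result."""
        if kind not in self._builders:
            raise ValueError(f"no subgraph builder registered for {kind!r}")
        self._subgraph = self._builders[kind](self._graph, query)
        return self._subgraph


def by_predicate(triples: List[Triple], query: str) -> List[Triple]:
    """Toy builder: the 'query' is just a predicate name."""
    return [t for t in triples if t[1] == query]


frame = GraphFrame([("a", "knows", "b"), ("b", "likes", "c")])
frame.register("predicate", by_predicate)
result = frame.select("predicate", "knows")
```

A registry keeps the frame open to new approaches (SHACL, ML, and so on) without growing the if/elif chain, at the cost of the caller naming the approach explicitly.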
Anyway, if subg.py has relevance as you pointed out (taking from the W3C definition), it would be better to rename it to subgraph.py
EDITED
Excellent plan!
Having a Frame class works well. Connotations of the word "Frame" (more general than "DataFrame" which is a table) fit well here.
And I really agree with what you pointed out about the name "matrix", that does lead to confusion when people don't have exposure to algebraic graph theory.
How about if we used naming conventions similar to NumPy?
Both naming conventions are clear, but vector/matrix/tensor are generally understood concepts within all of computer science, while "ND" without any context may be somewhat confusing. That said, the latter is shorter to write... haha
Hello,
Thanks for sharing - I'm working on closely related stuff these days (will send a link once it's out).
Best, Maciej
Maciej Besta https://people.inf.ethz.ch/bestam Dept. of Computer Science ETH Zürich Universitätsstrasse 6 Zurich-8092, Switzerland
From: Paco Nathan @.*** Sent: Monday, September 19, 2022 7:29 PM To: DerwenAI/kglab Cc: Besta Maciej; Mention Subject: Re: [DerwenAI/kglab] Renaming SubgraphMatrix (Issue #273)
SubGraphMatrix (and, in perspective, SubGraphTensor) is the reference class for graph algebra and network analysis. Would it be better to rename the subg.py module and the classes therein to encompass a more general approach? For example:
subg.py -> ?
SubGraphMatrix: keeping the fact that a SPARQL query is needed (hence the subgraph naming), it would be better from a data scientist's point of view to have this class follow some more popular convention, for example GraphFrame or DataGraph or NetFrame (just throw a die with the right naming permutations; I thought about these names: graph, frame, net, datagram, data, table, ...)
SubGraphTensor -> as above, for future applications
cc: @ceteri @tomaarsen