aplbrain / grand

Your favorite Python graph libraries, scalable and interoperable. Graph databases in memory, and familiar graph APIs for cloud databases.
Apache License 2.0

Question: edge attributes #35

Closed MikeB2019x closed 1 year ago

MikeB2019x commented 2 years ago

My situation is that I use networkx and have access to a postgres db. I find networkx to be quite slow and thought of using some of the alternatives esp. networkit. The challenge I have is with attributes ie. networkit seems to allow only a single numerical 'weight' for edge attributes. My graphs need pretty rich edge attributes and networkx accommodates those. So:

  1. would grand allow me to use networkx syntax/features and edge attribute functionality with networkit eg. filter edges on rich attribute set but retain algorithms running at higher speeds?

  2. you mention grand interacting with dynamo db. I'm not sure I understand that. Is grand using the db to store the graph structure and if so, could it do that with a postgres db? Note: I had a look at this and it seems like this is what I had in mind but when I read your readme.

j6k4m8 commented 2 years ago

Hi @MikeB2019x!

It sounds like Grand should be great for your use-case.

Indeed, Grand will handle attributes even if Networkit can't support them natively; Grand will offload network operations to Networkit, and then will add the attributes back in as you ask for them. (In other words, using Grand like NetworkX should solve your problem without you having to think about it.)

There is a bit of overhead associated with my attribute manager, which runs as a layer on top of Networkit; depending on your use-case, this should be pretty unnoticeable, but can be larger for things like edge-attribute queries.
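
To make the attribute-layer idea concrete, here is a minimal sketch in plain Python. It is not Grand's actual implementation: `WeightOnlyStore` is a hypothetical stand-in for a Networkit-style graph that knows only integer node IDs and one numeric weight per edge, and `AttributeOverlay` is a hypothetical wrapper that keeps the rich attributes in ordinary dictionaries and joins them back in on request.

```python
class WeightOnlyStore:
    """Hypothetical stand-in for a fast graph library that only stores
    integer node IDs and a single numeric weight per edge."""

    def __init__(self):
        self._weights = {}  # (u_id, v_id) -> weight
        self._node_count = 0

    def add_node(self):
        node_id = self._node_count
        self._node_count += 1
        return node_id

    def add_edge(self, u_id, v_id, weight=1.0):
        self._weights[(u_id, v_id)] = weight

    def edges(self):
        return list(self._weights)


class AttributeOverlay:
    """Hypothetical wrapper: structure lives in the store, attributes here."""

    def __init__(self, store):
        self._store = store
        self._ids = {}         # user-facing name -> integer ID in the store
        self._names = {}       # integer ID -> user-facing name
        self._edge_attrs = {}  # (u_id, v_id) -> dict of attributes

    def _id_for(self, name):
        # Create the node in the store the first time a name is seen.
        if name not in self._ids:
            node_id = self._store.add_node()
            self._ids[name] = node_id
            self._names[node_id] = name
        return self._ids[name]

    def add_edge(self, u, v, **attrs):
        key = (self._id_for(u), self._id_for(v))
        self._store.add_edge(*key)
        self._edge_attrs[key] = dict(attrs)

    def edges(self, data=False):
        # Structure comes from the store; attributes are joined back in
        # only when the caller asks for them.
        result = []
        for u_id, v_id in self._store.edges():
            u, v = self._names[u_id], self._names[v_id]
            if data:
                result.append((u, v, self._edge_attrs[(u_id, v_id)]))
            else:
                result.append((u, v))
        return result


overlay = AttributeOverlay(WeightOnlyStore())
overlay.add_edge("A", "B", foo="bar", baz="luhrmann")
overlay.add_edge("B", "C", foo="qux")

# In this sketch, filtering on a rich attribute reads the overlay's
# dictionaries rather than the fast store.
bar_edges = [(u, v) for u, v, d in overlay.edges(data=True) if d.get("foo") == "bar"]
```

In a layout like this, structural algorithms could run against the store alone, while attribute filters pay for the extra dictionary lookups.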

I am just jotting things down from my phone, so I can't run this... But you should be able to do something like this:

import grand
from grand.backends import NetworkitBackend

G = grand.Graph(backend=NetworkitBackend())

# G.nx is not really a networkx graph, but we can treat it like one:
G.nx.add_edge("A", "B", foo="bar", baz="luhrmann")
G.nx.edges(data=True)

# Can still get the secret underlying Networkit backend:
G.backend._nk_graph

In the case of postgres, we will probably need to write a new Backend to support this optimally, but you may be able to get away with the existing SQLBackend:

https://github.com/aplbrain/grand/wiki/Backends#sqlbackend

j6k4m8 commented 2 years ago

By the way, I am also very happy to answer questions about DotMotif as well :)

MikeB2019x commented 2 years ago

Hi Jordan. Thanks for the super-quick response! I've been trolling around your various repos and I couldn't have found them at a better time. Outstanding quality of documentation btw, you use the 'wiki' feature which seems to be underused.

I have many questions but am still reading through docs so just point me in the right direction if the answer exists somewhere:

  1. I'm unclear about the 'backends'. My impression is that I can have only one but if that is the case then if I have say, the sql backend imported, how do I specify the underlying graph tool? Is the default 'networkx'?
  2. if I use the sql backend do I have to pre-construct a db to a particular schema or can I use an existing one eg. node table (id, attr), edge table (src, tgt, attr)

Note that I'm not sure I can contribute much in the way of coding but I am happy to contribute to documentation.

Cheers, Mike

MikeB2019x commented 2 years ago

Just reading through the backend code for the sql backend which answers my second question. Interestingly it is similar to what I have been doing.

j6k4m8 commented 2 years ago

I'm super super glad to hear that, and thank you for your kind words :)

> My impression is that I can have only one [backend]

Yes, that is correct — the data "live" in the backend, so to switch between backends, you need to either move the data between them or have a copy of the data in both.

> but if that is the case then if I have say, the sql backend imported, how do I specify the underlying graph tool? Is the default 'networkx'?

The default is NetworkX if you create a graph without specifying a backend:

from grand import Graph

g = Graph()

This is the same as:

from grand import Graph
from grand.backends import NetworkXBackend

g = Graph(backend=NetworkXBackend())

But you can also use a different backend, like this:

import grand
from grand.backends import NetworkitBackend

g = grand.Graph(backend=NetworkitBackend())

In which case the data "live" in Networkit.

Backends are a separate idea from "dialects," which are how you talk to the data. ALL dialects are available on ALL graphs, no matter what their backend is. You can see a full list of dialects here.

For ANY of the graphs detailed above, you can talk to them as though they are NetworkX networkx.Graph objects by using the nx suffix:

g.nx # pretends to be a networkx graph
g.nx.add_edge("node-1", "node-2")

Or you can talk to them as though they were an igraph.Graph object:

g.igraph.vs

Grand handles the "rewriting" of these familiar operators into the language that the backend actually speaks. As far as you (as the user) are concerned, you are actually speaking to NetworkX, not Grand.
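
The backend/dialect split can be pictured with a small plain-Python sketch. Everything here is hypothetical and is not Grand's real code: `DictBackend` stands in for wherever the data live, and `NxLikeDialect` and `IgraphLikeDialect` are two thin views that translate familiar-looking calls into the same backend methods.

```python
class DictBackend:
    """Hypothetical backend: the one place where the data actually live."""

    def __init__(self):
        self._nodes = {}  # node ID -> attribute dict
        self._edges = {}  # (u, v) -> attribute dict

    def add_node(self, node_id, attrs):
        self._nodes.setdefault(node_id, {}).update(attrs)

    def add_edge(self, u, v, attrs):
        self.add_node(u, {})
        self.add_node(v, {})
        self._edges[(u, v)] = dict(attrs)

    def all_nodes(self):
        return list(self._nodes)

    def all_edges(self):
        return list(self._edges)


class NxLikeDialect:
    """Hypothetical NetworkX-flavoured view over any backend."""

    def __init__(self, backend):
        self._backend = backend

    def add_edge(self, u, v, **attrs):
        self._backend.add_edge(u, v, attrs)

    def nodes(self):
        return self._backend.all_nodes()


class IgraphLikeDialect:
    """Hypothetical igraph-flavoured view over the same backend."""

    def __init__(self, backend):
        self._backend = backend

    @property
    def vs(self):
        # igraph exposes vertices as a sequence called `vs`.
        return self._backend.all_nodes()


class SketchGraph:
    """Hypothetical container: one backend, every dialect."""

    def __init__(self, backend):
        self.backend = backend
        self.nx = NxLikeDialect(backend)
        self.igraph = IgraphLikeDialect(backend)


g = SketchGraph(DictBackend())
g.nx.add_edge("node-1", "node-2")

# Both dialects read the same underlying data.
nodes_via_nx = g.nx.nodes()
nodes_via_igraph = g.igraph.vs
```

Swapping `DictBackend` for another class with the same methods would change where the data live without changing how either dialect is used.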

> if I use the sql backend do I have to pre-construct a db to a particular schema or can I use an existing one eg. node table (id, attr), edge table (src, tgt, attr)

You do not have to pre-construct a database; in fact, you don't even need one to exist. Here's how I would create a SQLite graph with a few edges in it:

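import grand
from grand.backends import SQLBackend

g = grand.Graph(backend=SQLBackend(db_url="sqlite:///my-file.db"))

# Add a few edges through the NetworkX dialect:
g.nx.add_edge("A", "B", foo="bar")
g.nx.add_edge("B", "C")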

You could also (I haven't done this before! I think it should work, though!) create a postgres connection like this:

g = grand.Graph(backend=SQLBackend(db_url="postgresql://jordan:mypassword@localhost/mydatabase"))

If you already have two database tables in your db (one for nodes and one for edges), you can tell Grand to connect to them like this:

g = grand.Graph(
    backend=SQLBackend(
        db_url="postgresql://jordan:mypassword@localhost/mydatabase",
        node_table_name="my_nodes",
        edge_table_name="my_edges",
        edge_table_source_column="src",
        edge_table_target_column="tgt",
        primary_key="id",
    )
)

This will look for a table called my_nodes and use the column id as the unique key for nodes (all other columns will be considered attributes). It will look for a table called my_edges and assume that the columns called src and tgt are source and target IDs into the nodes table; and all other columns will be treated like attributes.
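
As a rough illustration of that mapping, the sketch below uses the standard-library `sqlite3` module directly rather than Grand, so none of it is Grand's own code. It builds two tables shaped like the ones described and reads them back as nodes and edges with attribute dictionaries. The `label` and `weight` columns and the `read_nodes` / `read_edges` helpers are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be read by column name

# Tables shaped like the existing ones described above; the attribute
# columns (label, weight) are invented for this example.
conn.execute("CREATE TABLE my_nodes (id TEXT PRIMARY KEY, label TEXT)")
conn.execute("CREATE TABLE my_edges (src TEXT, tgt TEXT, weight REAL)")
conn.executemany("INSERT INTO my_nodes VALUES (?, ?)", [("a", "Alpha"), ("b", "Beta")])
conn.execute("INSERT INTO my_edges VALUES (?, ?, ?)", ("a", "b", 0.5))


def read_nodes(conn, table="my_nodes", primary_key="id"):
    """Return {node_id: attrs}; every non-key column becomes an attribute.

    The table name is interpolated into the SQL, so it must come from
    trusted code, never from user input.
    """
    nodes = {}
    for row in conn.execute(f"SELECT * FROM {table}"):
        record = dict(row)
        nodes[record.pop(primary_key)] = record
    return nodes


def read_edges(conn, table="my_edges", source="src", target="tgt"):
    """Return [(u, v, attrs)]; every non-endpoint column becomes an attribute."""
    edges = []
    for row in conn.execute(f"SELECT * FROM {table}"):
        record = dict(row)
        edges.append((record.pop(source), record.pop(target), record))
    return edges


nodes = read_nodes(conn)
edges = read_edges(conn)
```

With these two rows, node `a` carries `label` as its only attribute and the single edge from `a` to `b` carries `weight`.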

This starts to get a bit untested; I've done all of these things before, but I am curious to hear your experiences, especially if you wind up using a non-sqlite database!

Contributions to documentation as you discover things would be AMAZING (even just issues saying 'this is under-documented' are helpful!); there are SO many interesting corners in this project that it's hard to tell what docs would be useful and used by people and which would just be extra work for me, without an audience.

Some further reading: #19 talks about connecting to an existing database, with some commentary as well.

MikeB2019x commented 2 years ago

I've been able to connect to the db, use existing tables, create node/edge tables. To speed up scaling some tests I tried to read/write a .graphml file eg. G.nx.read_graphml("my.graphml") resulting in 'NetworkXDialect' object has no attribute 'write_graphml'. Is that functionality foreseen?

j6k4m8 commented 2 years ago

I think read_graphml and write_graphml live at the networkx module level, not on a graph object; I've never tried this, but I think the code for this would be:

import networkx as nx
from grand import Graph

g = Graph(...)

nx.write_graphml(g.nx, "my.graphml")

I would not be surprised if this works! But then again... I would not be surprised if this doesn't work :)

One alternative would be to move the edges and nodes over to a "real" networkx object. For very large graphs this could take a while, but it might suit your purposes here:

import networkx as nx
from grand import Graph

g = Graph(...)
real_g = nx.DiGraph()

# Copy nodes and edges, unpacking each attribute dict as keyword arguments:
for node, attrs in g.nx.nodes(data=True):
    real_g.add_node(node, **attrs)

for u, v, attrs in g.nx.edges(data=True):
    real_g.add_edge(u, v, **attrs)

nx.write_graphml(real_g, "my.graphml")

j6k4m8 commented 2 years ago

How goes it, @MikeB2019x? Would it be helpful to hop on a screenshare sometime?

MikeB2019x commented 2 years ago

@j6k4m8 screenshare not required at the moment but I may take you up on that in the future. So trying to write out a graphml as suggested throws an error (stack trace below). If I compare a networkx graph's attributes and those of G.nx, you'll see: [...,'edges', 'get_edge_data','graph','graph_attr_dict_factory','has_edge','has_node'...] for the former compared to [...'edges','get_edge_data','graph_attr_dict_factory','has_edge','has_node',...] for the latter. That is, the 'graph' attribute isn't present in G.nx. I'm guessing that's intentional?
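
A comparison like the one described can be scripted by diffing the public names on the two objects. The sketch below uses two small stand-in classes (hypothetical, not NetworkX or Grand objects) so that it runs anywhere; with real objects you would pass a `networkx.Graph()` and `G.nx` instead. The helper name `missing_public_attributes` is also made up for this example.

```python
def missing_public_attributes(reference, candidate):
    """Return the sorted public names found on `reference` but not on `candidate`."""
    reference_names = {name for name in dir(reference) if not name.startswith("_")}
    candidate_names = {name for name in dir(candidate) if not name.startswith("_")}
    return sorted(reference_names - candidate_names)


class ReferenceGraph:
    """Stand-in for the object whose interface is expected."""

    def __init__(self):
        self.graph = {}  # graph-level attribute dict

    def edges(self):
        return []

    def has_node(self, node):
        return False


class LookalikeGraph:
    """Stand-in for an object that imitates most of that interface."""

    def edges(self):
        return []

    def has_node(self, node):
        return False


missing = missing_public_attributes(ReferenceGraph(), LookalikeGraph())
```

Here `missing` contains only `graph`, mirroring the gap reported above.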

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [35], in <module>
      1 graphml_file_name = 'graphtools.graphml'
----> 3 nx.write_graphml(G.nx, graphml_file_name)

File <class 'networkx.utils.decorators.argmap'> compilation 17:5, in argmap_write_graphml_lxml_13(G, path, encoding, prettyprint, infer_numeric_types, named_key_ids, edge_id_from_attribute)
      3 from contextlib import contextmanager
      4 from pathlib import Path
----> 5 import warnings
      7 import networkx as nx
      8 from networkx.utils import create_random_state, create_py_random_state

File ~/opt/miniconda3/envs/graph_analytics/lib/python3.8/site-packages/networkx/readwrite/graphml.py:171, in write_graphml_lxml(G, path, encoding, prettyprint, infer_numeric_types, named_key_ids, edge_id_from_attribute)
    160 except ImportError:
    161     return write_graphml_xml(
    162         G,
    163         path,
   (...)
    168         edge_id_from_attribute,
    169     )
--> 171 writer = GraphMLWriterLxml(
    172     path,
    173     graph=G,
    174     encoding=encoding,
    175     prettyprint=prettyprint,
    176     infer_numeric_types=infer_numeric_types,
    177     named_key_ids=named_key_ids,
    178     edge_id_from_attribute=edge_id_from_attribute,
    179 )
    180 writer.dump()

File ~/opt/miniconda3/envs/graph_analytics/lib/python3.8/site-packages/networkx/readwrite/graphml.py:729, in GraphMLWriterLxml.__init__(self, path, graph, encoding, prettyprint, infer_numeric_types, named_key_ids, edge_id_from_attribute)
    726 self.attribute_types = defaultdict(set)
    728 if graph is not None:
--> 729     self.add_graph_element(graph)

File ~/opt/miniconda3/envs/graph_analytics/lib/python3.8/site-packages/networkx/readwrite/graphml.py:740, in GraphMLWriterLxml.add_graph_element(self, G)
    737 else:
    738     default_edge_type = "undirected"
--> 740 graphid = G.graph.pop("id", None)
    741 if graphid is None:
    742     graph_element = self._xml.element("graph", edgedefault=default_edge_type)

AttributeError: 'NetworkXDialect' object has no attribute 'graph'

j6k4m8 commented 2 years ago

Interesting. Do you mind if I migrate this to a new issue to address? This would be a good capability for us to have in the Grand library; thank you for bringing it up!

j6k4m8 commented 2 years ago

@MikeB2019x — what is the status of this issue? Happy to discuss graph export separately in #39 if that's helpful; want to make sure edge attributes are working for you now!