aplbrain / grand

Your favorite Python graph libraries, scalable and interoperable. Graph databases in memory, and familiar graph APIs for cloud databases.
Apache License 2.0
80 stars 6 forks source link

NetworkXDialect does not work correctly with networkx.DiGraph #44

Open khoale88 opened 1 year ago

khoale88 commented 1 year ago

Hi @j6k4m8,

There is an issue with grand.Graph and grand.dialects.NetworkXDialect.

Since NetworkXDialect is inherited from networkx.Graph, there happen to be discrepancies between grand.dialects.NetworkXDialect and networkx.Digraph which is popagated back to grand.Graph. One of them is the networkx.Graph.edges returns EdgeView while networkx.Digraph.edges returns OutEdgeView.

Below is one of the test to replicate the issue

def test_nx_edges(self):
        G = Graph(directed=True).nx
        H = nx.DiGraph()
        G.add_edge("1", "2")
        G.add_edge("2", "1")   # <<< this won't work with EdgeView for G
        G.add_edge("1", "3")
        H.add_edge("1", "2")
        H.add_edge("2", "1")   # <<< OutEdgeView returns this for H
        H.add_edge("1", "3")
        self.assertEqual(dict(G.edges), dict(H.edges))
        self.assertEqual(dict(G.edges()), dict(H.edges()))
        self.assertEqual(list(G.edges["1", "2"]), list(H.edges["1", "2"]))

The result is

    def test_nx_edges(self):
        G = Graph(directed=True).nx
        H = nx.DiGraph()
        # H = nx.Graph()
        G.add_edge("1", "2")
        G.add_edge("2", "1")
        G.add_edge("1", "3")
        H.add_edge("1", "2")
        H.add_edge("2", "1")
        H.add_edge("1", "3")
>       self.assertEqual(dict(G.edges), dict(H.edges))
E       AssertionError: {('1', '2'): {}, ('1', '3'): {}} != {('1', '2'): {}, ('1', '3'): {}, ('2', '1'): {}}
E       - {('1', '2'): {}, ('1', '3'): {}}
E       + {('1', '2'): {}, ('1', '3'): {}, ('2', '1'): {}}
E       ?                              ++++++++++++++++
j6k4m8 commented 1 year ago

Ah β€” this is a good find β€” sorry that you bumped into this! I won't have time to address this until at least next week, but I will make this a top priority for the repo! Thank you for reporting it!! (And nice to see you over here on this repo too :) )

khoale88 commented 1 year ago

Well, my projects are big. The performance with sql backend appears to be one not-so-serious problem.

Is it a good idea to have separate classes for Graph and DiGraph just like nx has? I'm not sure about compatibility regarding other dialects though.

FYI, grandiso and grand-cypher are super useful but they also perform unwell if a graph has tens of thousands of nodes and connected edges. Sometimes, it's hundreds of times faster to iterate through the graph than using isomorphism. I think if grandiso can fail a candidate even earlier when it detects node/edge attribute mismatch, it would drastically improve. Maybe starting with proving the interestingness.

j6k4m8 commented 1 year ago

I think splitting Graph/DiGraph makes sense (and KIND OF got started here); every library tends to have split representations between the two.

FYI, grandiso and grand-cypher are super useful but they also perform unwell if a graph has tens of thousands of nodes and connected edges. Sometimes, it's hundreds of times faster to iterate through the graph than using cypher/grandiso. I think if grandiso can terminate even earlier when it detects node/edge attribute mismatch, it would drastically improve.

😬 yikes! sorry to hear that β€” what sorts of mismatches are you seeing that it's not breaking early from? grandiso DOES short-circuit a match when there's an attribute mismatch (code) but it's not perfect. Probably each consumer of grandiso β€” ESPECIALLY grandcypher β€” should reimplement its own is_node_attr_match callable with shortcircuit smarts...

khoale88 commented 1 year ago

well, I'm not so good with graph theory. You are the master :D.

khoale88 commented 1 year ago

I think splitting Graph/DiGraph makes sense (and KIND OF got started here); every library tends to have split representations between the two.

not sure if I can work a bit on it this weekend. But as we do the split, it is a non-backward compatible change.

khoale88 commented 1 year ago

Probably each consumer of grandiso β€” ESPECIALLY grandcypher β€” should reimplement its own is_node_attr_match callable with shortcircuit smarts...

You are right, I improve by having another is_node_structual_match to handle DiGraph. It considers both in_degree and out_degree separately. The result is 55% faster!

j6k4m8 commented 1 year ago

Wow, that's a huge improvement!!!

j6k4m8 commented 4 months ago

btw @khoale88,

not sure if I can work a bit on it this weekend. But as we do the split, it is a non-backward compatible change.

I think we're still okay to make breaking changes like this since we're pre-v1.0 for now β€” happy to chat more about this if you have ideas on how to do this well :)

khoale88 commented 4 months ago

It has been a while... I can recall something by looking at my incomplete works 10 months ago.

BUT, there should be some unsolved challenges preventing me from finishing my work that I can't remember, unfortunately :(.