aplbrain / grandiso-networkx

Performant, pure-Python subgraph isomorphism and monomorphism search (aka "motif search")
Apache License 2.0
54 stars 10 forks source link

Does this library work with MultiDiGraph subgraph isomorphism search? #32

Open enjoysmath opened 2 years ago

enjoysmath commented 2 years ago

Hi,

I'm wondering if in networkx or this library I'm limited to DiGraph subgraph isomorphism search?

I need multi.

Thanks.

enjoysmath commented 2 years ago

I see networkx has a MultiDigraphMatcher, it's just not in their documentation. Does this library do that? You should be explicit about such limitations. :)

j6k4m8 commented 2 years ago

Hi @enjoysmath! It's on my roadmap (and I'm not actually quite sure what would happen if you tried it now) — I haven't explicitly added support for multigraphs though.

In the meantime, something like DotMotif does support multigraph-host search, directed and undirected, because it handles multiedges in a postprocessing step. What is your current use-case?

[EDIT] Looks like we currently throw an error on multigraphs, but it seems like not a terribly difficult thing to fix. I will bump this up on my priority if this is a useful feature to you!

enjoysmath commented 2 years ago

My use case application screenshot is attached.

Basically these things in math called "commutative diagrams" typically range from between 1 to 50 nodes and maybe 100 edges max. So that will be the subgraph isomorphism query.

The database graph is a bunch of these CD's compiled into a single mother graph of disconnected graphs. So the user can search for an applicable logical diagram rule based upon a graph query. Then hit apply and magic happens.

Anyway, I don't understand how your library can be so small (in code size) yet outperform Networkx.... Please explain.

On Tue, Sep 20, 2022 at 11:36 AM Jordan Matelsky @.***> wrote:

Hi @enjoysmath https://github.com/enjoysmath! It's on my roadmap (and I'm not actually quite sure what would happen if you tried it now) — I haven't explicitly added support for multigraphs though.

In the meantime, something like DotMotif https://github.com/aplbrain/dotmotif/ does support multigraph-host search, directed and undirected, because it handles multiedges in a postprocessing step. What is your current use-case?

[EDIT] Looks like we currently throw an error on multigraphs, but it seems like not a terribly difficult thing to fix. I will bump this up on my priority if this is a useful feature to you!

— Reply to this email directly, view it on GitHub https://github.com/aplbrain/grandiso-networkx/issues/32#issuecomment-1252759973, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMIF54ABG3QJAKPDY2VARDV7H7ZTANCNFSM6AAAAAAQRJBU64 . You are receiving this because you were mentioned.Message ID: @.***>

j6k4m8 commented 2 years ago

That sounds very cool — doesn't look like the attachment uploaded though!

I think that the subgraph isomorphism work in networkx is also relatively small in terms of LOC; the majority of our performance benefit comes from (I think) the data structure that we use to queue candidate motifs, as well as the lower overhead of NOT using classes in our implementation. Our implementation is also a queue-based data-structure instead of a tree-based recursion, and we exit early if attributes don't match... Other than that, I'm not exactly sure what the differences are besides the runtime :)

j6k4m8 commented 2 years ago

As a quick aside on software architecture — it might be worthwhile making MANY small CD queries, rather than one query on a giant database graph; at least in the VF2 and grandiso implementations, subgraph search is exponential in terms of number of edges, so lots of small graphs are faster than one big one, if they share any vertices at all.

And then you can parallelize those individual searches without too much pain. Just a thought! (Happy to discuss this more if you like!) It might also resolve the need to support multigraphs? Though I'm well out of my depth on that part :)

enjoysmath commented 2 years ago

Jordan,

I know that, but because of disconnectivity, the run times should come out about the same. If I were just going to iterate through all my CD's then that defeats the purpose of using networkx or your library.

This is fine for a first release. I have it working btw. Even diagrams visually listed in a search results display.

Do CD's interest you ;) ?

Basically, they become extremely important mnemonics for working with higher-level abstract nonsense math. I find them to be extremely beautiful and diagram chasing proofs are just the most shocking thing I've ever seen.

This tool will allow you to do some math with CD's, but if you want LaTeX (TikzCD code) there will later be an export to Quiver CD editor: q.uiver.app which does no math, but is the standard now for best CD editor.

I twice tried using Neo4j to store a database of CD's and a Django frontend (have videos of it searching and working on YT), but the database response time is just waayyy to long, so I figured I should do things locally on the user's machine. And data gets shared through downloadable project files.

EnjoysMath

On Tue, Sep 20, 2022 at 12:42 PM Jordan Matelsky @.***> wrote:

As a quick aside on software architecture — it might be worthwhile making MANY small CD queries, rather than one query on a giant database graph; at least in the VF2 and grandiso implementations, subgraph search is exponential in terms of number of edges, so lots of small graphs are faster than one big one, if they share any vertices at all.

And then you can parallelize those individual searches without too much pain. Just a thought! (Happy to discuss this more if you like!) It might also resolve the need to support multigraphs? Though I'm well out of my depth on that part :)

— Reply to this email directly, view it on GitHub https://github.com/aplbrain/grandiso-networkx/issues/32#issuecomment-1252824202, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMIF56FXHHBXGNWMOFHHK3V7IHRHANCNFSM6AAAAAAQRJBU64 . You are receiving this because you were mentioned.Message ID: @.***>

enjoysmath commented 2 years ago

Not to mention I find webdev about 10x harder than desktop app dev. I guess I'm stuck in my sandbox, but at least my app will be fast (if I were to write it in C++, lol).

On Tue, Sep 20, 2022 at 12:50 PM Luna Tuna @.***> wrote:

Jordan,

I know that, but because of disconnectivity, the run times should come out about the same. If I were just going to iterate through all my CD's then that defeats the purpose of using networkx or your library.

This is fine for a first release. I have it working btw. Even diagrams visually listed in a search results display.

Do CD's interest you ;) ?

Basically, they become extremely important mnemonics for working with higher-level abstract nonsense math. I find them to be extremely beautiful and diagram chasing proofs are just the most shocking thing I've ever seen.

This tool will allow you to do some math with CD's, but if you want LaTeX (TikzCD code) there will later be an export to Quiver CD editor: q.uiver.app which does no math, but is the standard now for best CD editor.

I twice tried using Neo4j to store a database of CD's and a Django frontend (have videos of it searching and working on YT), but the database response time is just waayyy to long, so I figured I should do things locally on the user's machine. And data gets shared through downloadable project files.

EnjoysMath

On Tue, Sep 20, 2022 at 12:42 PM Jordan Matelsky @.***> wrote:

As a quick aside on software architecture — it might be worthwhile making MANY small CD queries, rather than one query on a giant database graph; at least in the VF2 and grandiso implementations, subgraph search is exponential in terms of number of edges, so lots of small graphs are faster than one big one, if they share any vertices at all.

And then you can parallelize those individual searches without too much pain. Just a thought! (Happy to discuss this more if you like!) It might also resolve the need to support multigraphs? Though I'm well out of my depth on that part :)

— Reply to this email directly, view it on GitHub https://github.com/aplbrain/grandiso-networkx/issues/32#issuecomment-1252824202, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMIF56FXHHBXGNWMOFHHK3V7IHRHANCNFSM6AAAAAAQRJBU64 . You are receiving this because you were mentioned.Message ID: @.***>

enjoysmath commented 2 years ago

That is interesting, I'll stick with networkx until your library works with MultiDiGraphs. Ping me when it does.

Here is an infinite supply of screenshots, my friend :)

https://www.youtube.com/channel/UCkwSzrP8pr2uN4Mu04ktipg

Ignore the Django/Neo4j videos. I still have that code up, but I'm focusing on this desktop app.

j6k4m8 commented 2 years ago

I'm definitely interested, and I'd love to figure out a way to make this code useful for you. (PS: Is a Rust implementation interesting, or just C?)

I know that, but because of disconnectivity, the run times should come out about the same.

Gotcha — in that case, probably even a little better to put them in the same graph to avoid the overhead of the queries themselves!

Neo4j is definitely super heavy per-query, you're right that it's a non-starter for you here. Bummer!!

Do you have any resources where I could read up on CDs? I didn't see any multiedges in the videos that I took a look at so far (except for possibly here), but maybe I'm not looking in the right place because I don't quite understand the underlying math..?

Or are they purely a property of the larger query graph? i.e., do you need multiedge support in both host and motif?

enjoysmath commented 2 years ago

In order to get around the restriction of only having node to node arrows. When I convert my graph to networkx, I convert each arrow to its own node, which holds the arrow's attributes, and the only edges in the graph are the "input" edge and the "output" edge coming from this "arrow node". These edges are blank (no attributes). See how that works?

So if in supporting the above features you have to use some lower level graph library that only supports node-to-node arrows, then you can do like the above as a wrapper layer.

On Tue, Sep 20, 2022 at 1:19 PM Luna Tuna @.***> wrote:

Google equalizer diagram in category theory.

https://www.google.com/search?q=equalizer+diagram+category+theory&rlz=1C1RXQR_enUS1001US1001&hl=en&tbm=isch&sxsrf=ALiCzsYpxNxu_PeS4cWU6va_hTdbuJJxjQ:1663704586336&source=lnms&sa=X&ved=2ahUKEwi5v4THlqT6AhVvC0QIHcbQC6kQ_AUoAXoECAEQAw&biw=2133&bih=1041&dpr=0.9

Basically, if you have two parallel arrows between two nodes and they're in the same direction and the "diagram commutes", then they're actually equal arrows. Though I am dealing with CD's mainly, some diagrams are not commutative meaning between some two nodes A, B there exists separate paths of arrows P, Q such that they don't compose to the same arrow (are not equal).

Also in order to support general graph editing I will need MultiDiGraph support. It would be lame if I made this CD editor, but it didn't support drawing basic graphs too! Since CD's are essentially just DAGs. Though later, I will allow any Multi-graphs using a another definition of commutative diagram, which will involve DFA's / regexes. But usually a CD is defined as a functor D from a poset P into another category C, or D : P -> C. Because P is a poset category, there is one and only one arrow between any two objects and so a CD (you're right) is usually defined to also have a unique edge.

But I'm allowing non-CD's as well.

Also a graph library that supports both node to arrow, arrow to node, and arrow to arrow connections as well as the usual node to node connections, would be ideal for me because in math sometimes the arrows in a diagram are between arrows! Look up composition of natural morphisms.

Anyway, if you wanted to learn more Category Theory / Homological Algebra / Arrow Math / Commutative Diagram Chasing Proofs. I could recommend some books. Hopefully though my tool will be something you use in the future to help you learn these concepts. You could even gamify theorems with it eventually, but that's not high priority right now.

EnjoysMath

On Tue, Sep 20, 2022 at 1:04 PM Jordan Matelsky @.***> wrote:

I'm definitely interested, and I'd love to figure out a way to make this code useful for you. (PS: Is a Rust implementation interesting, or just C?)

I know that, but because of disconnectivity, the run times should come out about the same.

Gotcha — in that case, probably even a little better to put them in the same graph to avoid the overhead of the queries themselves!

Neo4j is definitely super heavy per-query, you're right that it's a non-starter for you here. Bummer!!

Do you have any resources where I could read up on CDs? I didn't see any multiedges in the videos that I took a look at so far (except for possibly here https://youtu.be/TSTdyRDvSss?t=139), but maybe I'm not looking in the right place because I don't quite understand the underlying math..?

Or are they purely a property of the larger query graph? i.e., do you need multiedge support in both host and motif?

— Reply to this email directly, view it on GitHub https://github.com/aplbrain/grandiso-networkx/issues/32#issuecomment-1252847400, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMIF56RDPHD3FYV4DVI5B3V7IKFRANCNFSM6AAAAAAQRJBU64 . You are receiving this because you were mentioned.Message ID: @.***>

enjoysmath commented 2 years ago

At one point I was supporting even nested nodes and it would mean something mathematically, but 1) we never see that in math literature, and 2) it's extremely tricky to debug the graphics editor code for that. This way is much better! However, any future library you make should support nesting of graphs. Again you can do that by designating one of the above blank edges with a keyword "parent pointer" attribute.

On Tue, Sep 20, 2022 at 1:21 PM Luna Tuna @.***> wrote:

In order to get around the restriction of only having node to node arrows. When I convert my graph to networkx, I convert each arrow to its own node, which holds the arrow's attributes, and the only edges in the graph are the "input" edge and the "output" edge coming from this "arrow node". These edges are blank (no attributes). See how that works?

So if in supporting the above features you have to use some lower level graph library that only supports node-to-node arrows, then you can do like the above as a wrapper layer.

On Tue, Sep 20, 2022 at 1:19 PM Luna Tuna @.***> wrote:

Google equalizer diagram in category theory.

https://www.google.com/search?q=equalizer+diagram+category+theory&rlz=1C1RXQR_enUS1001US1001&hl=en&tbm=isch&sxsrf=ALiCzsYpxNxu_PeS4cWU6va_hTdbuJJxjQ:1663704586336&source=lnms&sa=X&ved=2ahUKEwi5v4THlqT6AhVvC0QIHcbQC6kQ_AUoAXoECAEQAw&biw=2133&bih=1041&dpr=0.9

Basically, if you have two parallel arrows between two nodes and they're in the same direction and the "diagram commutes", then they're actually equal arrows. Though I am dealing with CD's mainly, some diagrams are not commutative meaning between some two nodes A, B there exists separate paths of arrows P, Q such that they don't compose to the same arrow (are not equal).

Also in order to support general graph editing I will need MultiDiGraph support. It would be lame if I made this CD editor, but it didn't support drawing basic graphs too! Since CD's are essentially just DAGs. Though later, I will allow any Multi-graphs using a another definition of commutative diagram, which will involve DFA's / regexes. But usually a CD is defined as a functor D from a poset P into another category C, or D : P -> C. Because P is a poset category, there is one and only one arrow between any two objects and so a CD (you're right) is usually defined to also have a unique edge.

But I'm allowing non-CD's as well.

Also a graph library that supports both node to arrow, arrow to node, and arrow to arrow connections as well as the usual node to node connections, would be ideal for me because in math sometimes the arrows in a diagram are between arrows! Look up composition of natural morphisms.

Anyway, if you wanted to learn more Category Theory / Homological Algebra / Arrow Math / Commutative Diagram Chasing Proofs. I could recommend some books. Hopefully though my tool will be something you use in the future to help you learn these concepts. You could even gamify theorems with it eventually, but that's not high priority right now.

EnjoysMath

On Tue, Sep 20, 2022 at 1:04 PM Jordan Matelsky @.***> wrote:

I'm definitely interested, and I'd love to figure out a way to make this code useful for you. (PS: Is a Rust implementation interesting, or just C?)

I know that, but because of disconnectivity, the run times should come out about the same.

Gotcha — in that case, probably even a little better to put them in the same graph to avoid the overhead of the queries themselves!

Neo4j is definitely super heavy per-query, you're right that it's a non-starter for you here. Bummer!!

Do you have any resources where I could read up on CDs? I didn't see any multiedges in the videos that I took a look at so far (except for possibly here https://youtu.be/TSTdyRDvSss?t=139), but maybe I'm not looking in the right place because I don't quite understand the underlying math..?

Or are they purely a property of the larger query graph? i.e., do you need multiedge support in both host and motif?

— Reply to this email directly, view it on GitHub https://github.com/aplbrain/grandiso-networkx/issues/32#issuecomment-1252847400, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMIF56RDPHD3FYV4DVI5B3V7IKFRANCNFSM6AAAAAAQRJBU64 . You are receiving this because you were mentioned.Message ID: @.***>

j6k4m8 commented 2 years ago

Aha, I see — is this a requirement for other operations in networkx? (At least as far as grandiso is concerned, you can add edge attributes that are the same as node attributes)

Would it be helpful to do a zoom call or chat on slack / email or something? I'd love to make this useful for CD work :)

enjoysmath commented 2 years ago

Jordan,

Zoom would be fun. I could show you Abstract Spacecraft.

I'm not sure about your question. Networkx doesn't support it - they're just plain node-to-node graphs. But if you want to support mathematical software in general, you'll have to google around and maybe ask on math.stackexchange.com for what the most general graph-like visual would be. I'm pretty sure it's just node-nesting and supporting the other connectivity types though.

EnjoysMath

On Tue, Sep 20, 2022 at 1:24 PM Jordan Matelsky @.***> wrote:

Aha, I see — is this a requirement for other operations in networkx? (At least as far as grandiso is concerned, you can add edge attributes that are the same as node attributes)

Would it be helpful to do a zoom call or chat on slack / email or something? I'd love to make this useful for CD work :)

— Reply to this email directly, view it on GitHub https://github.com/aplbrain/grandiso-networkx/issues/32#issuecomment-1252872428, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMIF56ORMVZ6A2MMKUP7KDV7IMP7ANCNFSM6AAAAAAQRJBU64 . You are receiving this because you were mentioned.Message ID: @.***>

j6k4m8 commented 2 years ago

Oops sorry — timing error :) I meant the edge attributes! I'll shoot you an email once my qualifying exam wraps up; let's chat!!

enjoysmath commented 2 years ago

https://us05web.zoom.us/j/86571448757?pwd=MEJFbkNacWc0UEhqcUlHQmFxY3Y2Zz09

On Tue, Sep 20, 2022 at 1:28 PM Jordan Matelsky @.***> wrote:

Oops sorry — timing error :) I meant the edge attributes! I'll shoot you an email once my qualifying exam wraps up; let's chat!!

— Reply to this email directly, view it on GitHub https://github.com/aplbrain/grandiso-networkx/issues/32#issuecomment-1252876042, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMIF55FFWAA2MPOILG7A3DV7IM5LANCNFSM6AAAAAAQRJBU64 . You are receiving this because you were mentioned.Message ID: @.***>

enjoysmath commented 1 year ago

Google equalizer diagram in category theory.

https://www.google.com/search?q=equalizer+diagram+category+theory&rlz=1C1RXQR_enUS1001US1001&hl=en&tbm=isch&sxsrf=ALiCzsYpxNxu_PeS4cWU6va_hTdbuJJxjQ:1663704586336&source=lnms&sa=X&ved=2ahUKEwi5v4THlqT6AhVvC0QIHcbQC6kQ_AUoAXoECAEQAw&biw=2133&bih=1041&dpr=0.9

Basically, if you have two parallel arrows between two nodes and they're in the same direction and the "diagram commutes", then they're actually equal arrows. Though I am dealing with CD's mainly, some diagrams are not commutative meaning between some two nodes A, B there exists separate paths of arrows P, Q such that they don't compose to the same arrow (are not equal).

Also in order to support general graph editing I will need MultiDiGraph support. It would be lame if I made this CD editor, but it didn't support drawing basic graphs too! Since CD's are essentially just DAGs. Though later, I will allow any Multi-graphs using a another definition of commutative diagram, which will involve DFA's / regexes. But usually a CD is defined as a functor D from a poset P into another category C, or D : P -> C. Because P is a poset category, there is one and only one arrow between any two objects and so a CD (you're right) is usually defined to also have a unique edge.

But I'm allowing non-CD's as well.

Also a graph library that supports both node to arrow, arrow to node, and arrow to arrow connections as well as the usual node to node connections, would be ideal for me because in math sometimes the arrows in a diagram are between arrows! Look up composition of natural morphisms.

Anyway, if you wanted to learn more Category Theory / Homological Algebra / Arrow Math / Commutative Diagram Chasing Proofs. I could recommend some books. Hopefully though my tool will be something you use in the future to help you learn these concepts. You could even gamify theorems with it eventually, but that's not high priority right now.

EnjoysMath

On Tue, Sep 20, 2022 at 1:04 PM Jordan Matelsky @.***> wrote:

I'm definitely interested, and I'd love to figure out a way to make this code useful for you. (PS: Is a Rust implementation interesting, or just C?)

I know that, but because of disconnectivity, the run times should come out about the same.

Gotcha — in that case, probably even a little better to put them in the same graph to avoid the overhead of the queries themselves!

Neo4j is definitely super heavy per-query, you're right that it's a non-starter for you here. Bummer!!

Do you have any resources where I could read up on CDs? I didn't see any multiedges in the videos that I took a look at so far (except for possibly here https://youtu.be/TSTdyRDvSss?t=139), but maybe I'm not looking in the right place because I don't quite understand the underlying math..?

Or are they purely a property of the larger query graph? i.e., do you need multiedge support in both host and motif?

— Reply to this email directly, view it on GitHub https://github.com/aplbrain/grandiso-networkx/issues/32#issuecomment-1252847400, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMIF56RDPHD3FYV4DVI5B3V7IKFRANCNFSM6AAAAAAQRJBU64 . You are receiving this because you were mentioned.Message ID: @.***>

yashpatel1994 commented 1 year ago

Hi @j6k4m8,

I was wondering whether you had any time to work on the issue of finding motifs in MultiDiGraph instances. From the discussion above, I do see that you recommended dotmotif as an alternative. Unfortunately, it does not suit my use case because both my motif and host are networkx objects.

Thanks!

Best, Zakuta

j6k4m8 commented 1 year ago

Hi @Zakuta! I haven't had a chance to dive into this implementation yet, though I appreciate you bumping this issue since it'll help me prioritize next steps!